From Compute to Congress: Setting the Global Standard for AI Security

TOC
This is some text inside of a div block.

‍The ideas presented in this post were developed for and won first place at Critical Effect DC 2025, where I presented an AI security roadmap to Congressional staffers. The event, hosted by ICS Village in partnership with the Institute for Security and Technology, Crowell LLP, and the National Security Institute, reinforced the urgency of establishing American leadership in AI evaluation standards.

Security practitioners are currently fighting an asymmetric battle: defending against nation-state AI-powered attacks with enterprise-level resources and authorities. When foreign actors systematically target our critical infrastructure through coordinated cyber campaigns—from power grids to water systems—we treat these as isolated incidents rather than recognizing them as sustained strategic attacks on American sovereignty.

If we are serious about leading the global race for AI dominance, we must stop playing defense and start setting the global standard for AI security. This means giving our security professionals the advanced tools and institutional backing they need—not just to respond to threats, but to establish American leadership in AI evaluation and deployment.

American AI systems must outperform global competition

The foundation of any superior AI system lies in the quality of its training data and evaluation processes. While our competitors rush to deploy untested AI systems, America has the opportunity to establish technological dominance through rigorous standards that will be the pacing challenge for our competitors.

Currently, most AI models are outperforming existing benchmarks, which gives us a false sense of security. As models continue to reach near-perfect scores of these benchmarks, such as Massive Multitask Language Understanding (MMLU) and Grade School Math 8K (GSM8K), we have reached what is known as benchmark saturation. Most tests and evaluations are ineffective at measuring model performance because they don’t leave any room for improvement.

AI evaluation standards need to be robust enough to keep pace with rapid technological advancements and to account for where AI systems might be vulnerable to exploitation under adversarial and operational conditions. This begins with data quality assessments. When AI systems are targeted by data poisoning or model evasion attacks, they expose vulnerabilities that hostile nations can exploit to undermine American interests.

Establishing American standards through red teaming

To achieve evaluation standards that account for data quality, America should lead through AI red teaming—where security experts systematically attack and stress-test AI systems to identify every possible weakness before deployment.

This isn't just about finding vulnerabilities; it's about proving American AI systems can withstand any attack foreign adversaries might deploy.

AI is dual use. Adversaries are leveraging it to deploy attacks at scale while security practitioners are integrating AI to strengthen cyber defenses. Now more than ever, it is paramount to understand how vulnerable our systems are to AI-driven attacks—and to do so, we must ensure our red teams have the knowledge and tools to build an effective defense posture.

Dreadnode recently released AIRTBench, an AI red teaming benchmark for evaluating language models’ ability to autonomously discover and exploit AI and machine learning security vulnerabilities. AIRTBench demonstrates a crucial reality: AI systems can be weaponized to attack each other. This capability in the wrong hands represents a fundamental shift in the threat landscape.

Developing benchmarks in close coordination with red teaming efforts requires creative, adversarial testing that reveals AI resilience against the kinds of sophisticated attacks our opponents are likely to use. We need evaluation methods that go beyond what other nations are doing by establishing rigorous testing protocols and setting an expectation for how we measure AI security.

The TEST AI Act: Making America the global standard

The TEST AI Act of 2025, introduced by the Senate, represents a decisive step toward American AI dominance. By directing NIST to create rigorous testing standards, this legislation positions the United States to set the global benchmark that other nations will be required to follow.

The Department of Energy's National Laboratories already operate seven AI testing facilities that integrate red teaming—giving America a head start in advanced AI evaluation. The TEST AI Act builds on this existing advantage to establish comprehensive testing standards that will become the international gold standard.

But standards and testing facilities alone are not enough. America must leverage its market power to ensure the resulting standards have global impact.

A collaborative path forward

The complex nature of AI development requires broad collaboration across the entire ecosystem of researchers, developers, and security professionals.

The TEST AI Act should establish two complementary initiatives: a National AI Data Quality Assurance Consortium and an AI-Enabled Red Team Task Force. These groups would bring together experts from various organizations—including AI companies, research institutions, and government agencies—to develop and refine evaluation methods.

For example, these teams could focus on detecting problems like dataset contamination or hidden biases that could lead to inaccurate outputs. Both challenges directly contribute to AI failures, including the generation of false information and degraded system performance.

Leveraging federal procurement for American advantage

The U.S. federal government is one of the world's largest technology purchasers, spending over $100 billion annually. By integrating AI evaluation standards into procurement requirements, the federal government has an opportunity to ensure that federal agencies and their critical infrastructure partners can only invest in AI systems that meet rigorous American security standards.

Just as NIST's Secure Software Development Framework and Energy Star requirements have shaped industry standards, a procurement-driven approach for American AI standards will transform evaluation methods from recommendations into market incentives for “Secure by Design” technology.

At minimum, federal contracts should require that AI systems demonstrate superior performance on American-designed evaluations and pass our red team assessments. Procurement-driven standards create instant market pressure, bypassing lengthy regulatory debates while positioning the U.S. to maintain its lead in the global race for AI dominance.

Mobilizing America's AI security ecosystem

The next three to five years are critical for America's AI future. The Test AI Act establishes a foundation, but the legislative language that follows must be handled with care. Tasking NIST to set evaluation standards is a monumental task and that will require support across AI companies, research institutions, and government agencies. While such collaboration is complex, the stakes demand nothing less than this level of unified effort.

By leveraging federal procurement power and creating operational task forces under the TEST AI Act—the National AI Data Quality Assurance Consortium and an AI-Enabled Red Team Task Force—America can turn this challenge into competitive advantage. These groups would develop the evaluation methods needed to detect dataset contamination, hidden biases, and other vulnerabilities that foreign adversaries might exploit.

Through decisive leadership in evaluation standards and strategic use of procurement power, America will establish AI security frameworks that serve our national interests. This isn't just about building better AI systems—it's about ensuring American dominance in the technology that will define the next century.