Trump plan to test AI models has a problem—US security teams were gutted by DOGE

Published: June 3, 2026 at 02:11 PM EDT


Source: Ars Technica

Observability and Transparency Challenges

Nguyen warned that the effectiveness of safety testing will likely depend on whether AI firms are fully transparent and treat the process as a “genuine collaboration.” He noted:

“Underneath the definitional problem sits an observability problem. The government cannot assess what it cannot see, and frontier capabilities are visible only to the labs that build them.”

Rapidly Evolving Threat Landscape

Ferren highlighted the narrow window for establishing robust cyber defenses against new AI models:

“The window for erecting proper cyber defenses to new AI models may also close quickly,” Ferren said, adding that even a well‑designed government program may struggle to properly vet frontier models in such a short timeframe.

He added that pre‑deployment testing has inherent limits, citing findings from Google’s threat intelligence team that state‑aligned actors are using frontier models to automate cyberattacks. Researchers have also demonstrated that “Mythos‑style vulnerability reasoning can be reproduced with open‑weight systems.”

Incentives and Practical Constraints

While AI developers might voluntarily submit to testing, financial motivations could lead them to seek a rubber stamp rather than cooperate fully with the government on comprehensive testing of frontier capabilities.

“It will likely prove difficult to develop models that are incapable of malicious hacking yet remain commercially compelling,” Ferren said.

Potential Short‑Term Benefits and Long‑Term Uncertainty

Nguyen concluded that the executive order (EO) “may yield short‑term cybersecurity benefits,” but the “long‑term effect” remains “unclear.”

Recommendations for Ongoing Evaluation

Nguyen suggested the EO incorporate several key steps:

Classified cyber benchmarking
Voluntary prerelease evaluation
Coordinated vulnerability scanning

These measures, he argued, are essential for the national security community to “continuously evaluate systems that are probabilistic rather than deterministic, autonomous rather than directed, and whose capabilities change with every update.”

Need for Adaptive Safety Testing

The safety testing regime must evolve as quickly as the technology itself; otherwise, assessments risk targeting “yesterday’s risks.” Nguyen emphasized that the process hinges on an honest exchange between stakeholders possessing deep technical expertise and confidential national security insights. This collaborative approach is the only way to ensure the U.S. focuses on protecting the public from the most credible and consequential AI risks, rather than offering merely “performative reassurances.”

