Anthropic and OpenAI just exposed SAST's structural blind spot with free tools
Source: VentureBeat
OpenAI & Anthropic Release Reasoning‑Based Vulnerability Scanners
Key takeaway: Both labs have shown that traditional static application security testing (SAST) tools are structurally blind to entire vulnerability classes. The enterprise security stack is now caught in the middle.
Why This Matters
- Competitive pressure: Anthropic and OpenAI together represent a private‑market valuation of > $1.1 trillion. Their rivalry will push detection quality forward faster than any single vendor could achieve alone.
- No replacement, but a shift: Neither Claude Code Security nor Codex Security replaces the existing security stack, but both permanently change the procurement math.
- Current pricing: Both scanners are free for enterprise customers (as of this writing).
What you need before the board asks:
- A head‑to‑head comparison of the two scanners.
- Seven concrete actions for your security team (see the list at the end of this note).
How Anthropic and OpenAI Reached the Same Conclusion from Different Architectures
Anthropic – Claude Code Security
| Date | Event |
|---|---|
| Feb 5 | Anthropic published zero‑day research alongside Claude Opus 4.6. The model uncovered > 500 high‑severity, previously unknown vulnerabilities in production open‑source codebases that had survived decades of expert review and millions of hours of fuzzing. |
| Feb 20 | Claude Code Security launched as a limited research preview (Enterprise & Team customers, free expedited access for open‑source maintainers). |
| Key finding | In the CGIF library, Claude reasoned about the LZW compression algorithm and discovered a heap buffer overflow that coverage‑guided fuzzing missed even with 100 % code coverage. |
| Quote | “We built Claude Code Security to make defensive capabilities more widely available,” – Gabby Curtis, Anthropic communications lead (VentureBeat). |
OpenAI – Codex Security
| Date | Event |
|---|---|
| 2025 (private beta) | Aardvark, an internal GPT‑5‑powered tool, entered beta. |
| Mar 6 | OpenAI launched Codex Security (evolved from Aardvark). |
| Beta results | Scanned > 1.2 M commits across external repos, surfacing 792 critical and 10,561 high‑severity findings. |
| Vulnerabilities disclosed | OpenSSH, GnuTLS, GOGS, Thorium, libssh, PHP, Chromium → 14 assigned CVEs. |
| Quality improvements | False‑positive rate fell > 50 %; over‑reported severity dropped > 90 % during beta. |
| Quote | “The competitive scanner race compresses the window for everyone,” – Merritt Baer, CSO, Enkrypt AI (VentureBeat). |
Independent Validation
- Checkmarx Zero tested Claude Code Security on a production‑grade codebase:
  - 8 vulnerabilities flagged → 2 true positives.
  - Moderately complex obfuscation can defeat the scanner, suggesting a lower detection ceiling than headline numbers.
- No third‑party audit: Neither vendor has submitted detection claims to an independent auditor. Treat reported numbers as indicative, not audited.
Implications for Security Leaders
- Pattern‑matching SAST has a hard ceiling – LLM reasoning pushes detection beyond it.
- Dual‑use risk: If these labs can find the bugs, adversaries with API access can too.
- Zero‑day mindset: Open‑source vulnerabilities surfaced by reasoning models should be treated as near‑zero‑day discoveries, not backlog items.
- Prioritization shift: Focus on exploitability in the runtime context rather than CVSS scores alone (Baer’s advice).
- SBOM visibility is essential to instantly know where a vulnerable component runs.
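The last two points can be combined into a small triage sketch: scale each finding's severity by where, and how exposed, the vulnerable component actually runs. Everything below is illustrative — the field names, exposure weights, and the `RUNTIME_EXPOSURE` map are assumptions standing in for real SBOM and scanner output, not part of either tool.

```python
# Illustrative triage: rank findings by exploitability in context,
# not by CVSS alone. All names and weights here are hypothetical.

from dataclasses import dataclass

@dataclass
class Finding:
    component: str   # package or library the finding lives in
    cvss: float      # base score, 0.0-10.0

# Hypothetical SBOM-derived map: which runtime assets use each component,
# and how exposed those assets are (0 = internal only, 1 = internet-facing).
RUNTIME_EXPOSURE = {
    "libssh":  {"assets": ["bastion-host"], "exposure": 1.0},
    "cgif":    {"assets": ["thumbnail-worker"], "exposure": 0.2},
    "leftpad": {"assets": [], "exposure": 0.0},  # in the SBOM, never deployed
}

def contextual_priority(f: Finding) -> float:
    """Scale CVSS by runtime exposure; components that never run score 0."""
    ctx = RUNTIME_EXPOSURE.get(f.component, {"assets": [], "exposure": 0.0})
    if not ctx["assets"]:
        return 0.0
    return f.cvss * ctx["exposure"]

findings = [
    Finding("cgif", 9.8),     # critical on paper, barely exposed
    Finding("libssh", 7.5),   # lower CVSS, but internet-facing
    Finding("leftpad", 9.1),  # not deployed at all
]
for f in sorted(findings, key=contextual_priority, reverse=True):
    print(f.component, round(contextual_priority(f), 2))
```

Note how the ordering inverts the raw CVSS ranking: the internet‑facing 7.5 outranks the barely exposed 9.8, which is exactly Baer's point about context.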
What Vendor Responses Prove
| Vendor | Position | Key Points |
|---|---|---|
| Snyk | Acknowledges the breakthrough but stresses remediation at scale as the bottleneck. | • AI‑generated code is 2.74× more likely to introduce vulnerabilities (Veracode 2025 GenAI Code Security Report). • Finding bugs is easy; fixing them without breaking anything is hard. |
| Cycode (CTO Ronen Slavin) | Recognizes the technical advance but warns about the probabilistic nature of AI models. | • Needs consistent, reproducible, audit‑grade results. • Scanning inside an IDE is useful but does not replace infrastructure‑level security (governance, pipeline integrity, runtime behavior). |
| Merritt Baer (Enkrypt AI) | “If code‑reasoning scanners from major AI labs are effectively free to enterprise customers, then static code scanning commoditizes overnight.” | • Emphasizes the compressed window between discovery and exploitation. • Urges security teams to shorten discovery → triage → patch cycles. |
Seven Actions for Your Security Team
Seven Actions for Your Security Team
1. Run a pilot of both Claude Code Security and Codex Security on a representative subset of your codebase.
2. Benchmark detection against your existing SAST tools (true‑positive rate, false‑positive rate, coverage).
3. Validate findings with an independent manual review or a third‑party audit service.
4. Integrate SBOM tooling to map discovered vulnerabilities to runtime assets instantly.
5. Prioritize patches by exploitability in context (runtime exposure, asset criticality) rather than CVSS alone.
6. Establish a remediation workflow that can handle high‑velocity findings without breaking CI/CD pipelines (e.g., staged rollouts, canary releases).
7. Monitor vendor roadmaps and track false‑positive trends as the models evolve; adjust procurement strategy accordingly.
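The benchmarking action above reduces to set arithmetic once findings are normalized to a common key. A minimal scoring harness, assuming a hypothetical `(file, vulnerability-class)` key — each tool's real output would need its own normalization step first:

```python
# Benchmark a scanner's findings against a manually labeled ground truth.
# Findings are normalized to (file, vulnerability-class) tuples; this key
# format is an assumption, not either scanner's actual output schema.

def benchmark(scanner_findings: set, ground_truth: set) -> dict:
    tp = scanner_findings & ground_truth     # confirmed real issues
    fp = scanner_findings - ground_truth     # noise
    fn = ground_truth - scanner_findings     # blind spots
    return {
        "true_positive_rate": len(tp) / len(ground_truth) if ground_truth else 0.0,
        "false_positives": len(fp),
        "missed": sorted(fn),
    }

ground_truth = {("auth.c", "buffer-overflow"), ("session.c", "use-after-free")}
new_scanner  = {("auth.c", "buffer-overflow"), ("util.c", "sql-injection")}

print(benchmark(new_scanner, ground_truth))
```

The `missed` list is the number the board will ask about: vulnerabilities your validated ground truth contains that the candidate tool never surfaced.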
Bottom Line
- Reasoning‑based scanners are a game‑changer – they expose vulnerability classes that traditional SAST cannot see.
- Free, high‑quality tools from two of the world’s most valuable private AI labs will rapidly become a baseline expectation.
- Your organization must adapt: pilot, validate, prioritize, and automate remediation now, before the board asks which scanner you’re piloting and why.
Budget outlook (VentureBeat): over the next 12 months, Baer expects budgets to shift toward three areas:
- Runtime and exploitability layers – runtime protection and attack‑path analysis.
- AI governance and model security – guardrails, prompt‑injection defenses, and agent oversight.
- Remediation automation – “The net effect is that AppSec spending probably doesn’t shrink, but the center of gravity shifts away from traditional SAST licenses and toward tooling that shortens remediation cycles,” Baer said.
Seven things to do before your next board meeting
1. Run both scanners against a representative code‑base subset
- Compare Claude Code Security and Codex Security findings against your existing SAST output.
- Start with a single representative repository, not the entire codebase.
- Both tools are in research preview with access constraints that make full‑estate scanning premature.
- The delta is your blind‑spot inventory.
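Once both tools' findings are normalized to a common key, the delta above is ordinary set arithmetic. The `(file, line-range, CWE)` key and the sample findings below are assumptions; real output from either tool would need its own normalization step:

```python
# Blind-spot inventory: findings one scanner reports that the other misses.
# Keys are normalized (file, line-range, CWE) tuples -- an assumed format.

scanner_a = {("parser.c", "120-134", "CWE-787"), ("auth.c", "88-91", "CWE-416")}
scanner_b = {("parser.c", "120-134", "CWE-787"), ("net.c", "45-52", "CWE-190")}

only_a = scanner_a - scanner_b   # A's edge over B
only_b = scanner_b - scanner_a   # B's edge over A
agreed = scanner_a & scanner_b   # highest-confidence findings

# Diff the union against your incumbent SAST output the same way to get
# the blind-spot inventory for the procurement conversation.
print(f"only A: {len(only_a)}, only B: {len(only_b)}, both: {len(agreed)}")
```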
2. Build the governance framework before the pilot, not after
- Treat either tool like a new data processor for your crown‑jewels (source code).
- Baer’s governance model includes:
  - A formal data‑processing agreement with clear statements on training exclusion, data retention, and sub‑processor use.
  - A segmented submission pipeline so only the repos you intend to scan are transmitted.
  - An internal classification policy that distinguishes code that can leave your boundary from code that cannot.
- VentureBeat’s interviews with >40 CISOs found that formal governance frameworks for reasoning‑based scanning tools barely exist.
- Blind‑spot: derived IP – can model providers retain embeddings or reasoning traces, and are those artifacts your IP?
- Data residency: code historically wasn’t regulated like customer data, but it’s increasingly subject to export‑control and national‑security review.
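Baer's segmented submission pipeline can start as a simple policy gate in CI: nothing is transmitted unless its repo is explicitly classified as exportable. A minimal sketch — the classification labels and repo names are hypothetical examples, not a standard:

```python
# Policy gate for a segmented submission pipeline: refuse to submit any
# repo not explicitly cleared to leave the boundary. All labels and repo
# names below are hypothetical.

CLASSIFICATION = {
    "web-frontend":  "exportable",     # cleared for external scanning
    "billing-core":  "internal-only",  # crown-jewel code: never leaves
    "infra-scripts": "exportable",
}

def submittable(repos: list[str]) -> list[str]:
    """Return only repos cleared for external scanning.

    Unknown repos default to internal-only, so the gate fails closed.
    """
    return [
        repo for repo in repos
        if CLASSIFICATION.get(repo, "internal-only") == "exportable"
    ]

print(submittable(["web-frontend", "billing-core", "new-unclassified-repo"]))
# -> ['web-frontend']
```

Failing closed on unclassified repos is the important design choice: the classification policy must be written before the pilot, or the default quietly becomes "everything is exportable".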
3. Map what neither tool covers
- Software composition analysis
- Container scanning
- Infrastructure‑as‑code scanning
- DAST
- Runtime detection & response
Claude Code Security and Codex Security operate at the code‑reasoning layer; your existing stack handles everything else. That stack’s pricing power is what shifted.
4. Quantify the dual‑use exposure
- Every zero‑day from Anthropic and OpenAI lives in an open‑source project that enterprise apps depend on.
- Labs disclose and patch responsibly, but the window between discovery and your adoption of those patches is exactly where attackers operate.
- AI‑security startup AISLE independently discovered all 12 zero‑day vulnerabilities in OpenSSL’s January 2026 security patch, including a stack‑buffer overflow (CVE‑2025‑15467) that is potentially remotely exploitable without valid key material.
- Assume adversaries are running the same models against the same codebases.
5. Prepare the board comparison before they ask
| Feature | Claude Code Security | Codex Security |
|---|---|---|
| Reasoning | Contextual code reasoning, data‑flow tracing, multi‑stage self‑verification | Project‑specific threat model, validation in sandboxed environments |
| Status | Research preview; human approval required before patching | Research preview; human approval required before patching |
| Board‑level framing | “Pattern‑matching SAST solved a different generation of problems; it was designed to detect known anti‑patterns.” | “Reasoning models can evaluate multi‑file logic, state transitions, and developer intent—where many modern bugs live.” |
- The board needs a side‑by‑side analysis, not a single‑vendor pitch.
- Baer’s board‑ready summary: “We bought the right tools for the threats of the last decade; the technology just advanced.”
6. Track the competitive cycle
- Both companies are heading toward IPOs; enterprise security wins drive the growth narrative.
- When one scanner misses a blind spot, it lands on the other lab’s feature roadmap within weeks.
- Both labs ship model updates on monthly cycles, outrunning any single vendor’s release calendar.
- Baer: “Different models reason differently, and the delta between them can reveal bugs neither tool alone would consistently catch. In the short term, using both isn’t redundancy. It’s defense through diversity of reasoning systems.”
7. Set a 30‑day pilot window
- Before February 20, this test did not exist.
- Run Claude Code Security and Codex Security against the same codebase and let the delta drive the procurement conversation with empirical data instead of vendor marketing.
- 30 days gives you that data.
Note: Fourteen days separated the Anthropic and OpenAI releases. The gap between the next releases will be shorter, and attackers are watching the same calendar.