We Tested Agentic AI Against 525 Real Attacks. Here's What We Found.

Published: 1 month ago (March 13, 2026 at 12:14 AM EDT)

4 min read

Source: Dev.to

Source: Dev.to

Introduction

We ran the numbers. The threat is real.

For the past several months, we’ve been building and validating Cerberus — an open‑source runtime security harness for agentic AI systems. It is designed around a specific threat model we call the Lethal Trifecta: the simultaneous convergence, within a single AI execution turn, of privileged data access, untrusted content injection, and an outbound exfiltration path.

We just finished our first formal validation run: 525 attack trials across three major AI providers. Below are the key findings.

Attack Success Rates

Full injection compliance – agent fully redirected to attacker’s address

Model	Success Rate	95 % CI	Causation Score
GPT‑4o‑mini	90.3 %	84.8 % – 93.9 %	0.811
Gemini 2.5 Flash	82.4 %	75.9 % – 87.5 %	0.702
Claude Sonnet	6.7 %	3.8 % – 11.5 %	0.207

Control group: 0/30 exfiltrations across all providers (clean baseline).
Statistical significance: Fisher’s exact test, OpenAI p — “This is not a theoretical vulnerability. At a 90 % success rate, the Lethal Trifecta is a reliable attack primitive against current production AI systems.”

What Is the Lethal Trifecta?

The attack chain requires three conditions to align within a single execution turn:

Privileged data access – the agent can see sensitive operational or financial data.
Untrusted content injection – the agent processes external input (e.g., a vendor document, an invoice, a client email, a compliance filing).
Outbound exfiltration path – the agent has authority to take downstream action.

Why It Matters

Supply‑chain environments: AI agents orchestrate procurement workflows, read vendor submissions, cross‑reference inventory systems, and trigger purchase orders.
Financial services: Agents summarize client portfolios, process compliance documents, and triage transactions.

Both architectures deliver all three conditions by design, making the Lethal Trifecta the default configuration of agentic AI in these industries. Our data shows that two of the three major model providers are vulnerable at rates above 80 %.

What Cerberus Detected — and Where the Gaps Are

Cerberus operates in three detection layers. The validation ran in observe‑only mode—agents behaved normally, and Cerberus recorded observations without intervening.

Detection Layer Results (N = 525)

Layer	Description	Detection Rate
L1	Data Source Monitoring	100 % (97.9 % – 100 %)
L2	Token Provenance Tracking	100 % (97.9 % – 100 %)
L3	Outbound Intent Detection	13.7 % (OpenAI) / 1.1 % (Anthropic) / 65.7 % (Google)

Overall detection rate: 28.5 % (24.7 % – 32.6 %).
False‑positive rate: 0.0 % (0.0 % – 11.4 %) — zero false triggers across 30 clean control runs.

Per‑Category Detection (All Providers Combined)

Direct Injection: 37.8 % (28.5 % – 48.1 %)
Encoded/Obfuscated: 37.5 % (27.2 % – 49.0 %)
Multi‑turn: 33.3 % (22.9 % – 45.6 %)
Multilingual: 33.3 % (22.9 % – 45.6 %)
Advanced Technique: 20.0 % (14.1 % – 27.5 %)
Social Engineering: 15.3 % (8.8 % – 25.3 %)

The L3 detection gap is a known limitation and the current focus of active development. L1 and L2 coverage are production‑ready; L3 is where the adversarial arms race is happening.

Zero Performance Overhead

p50 latency: 52 µs per session
p99 latency: 0.23 ms per session
Overhead: 0.01 % of typical LLM latency (~2 s)

Against a typical LLM response time of ~2 seconds, Cerberus adds negligible overhead, removing any performance argument against deployment.

Implications for Supply‑Chain and Financial Services

If your agentic AI deployment uses GPT‑4o‑mini or Gemini and processes external documents (vendor submissions, invoices, client communications, compliance filings), the Lethal Trifecta succeeds at a rate above 80 %.

The critical question is not whether the attack is possible, but whether you have a runtime layer that can detect when all three trifecta conditions are active in a single execution turn. Most deployments today lack such visibility.

Getting Started with Cerberus

GitHub:
npm package: @cerberus-ai/core (signed provenance)
Demo:
Company site:

Tags: #AgenticAI #SupplyChain #FinancialServices #CyberSecurity #RuntimeSecurity #PromptInjection #OpenSource #Cerberus #SixSense #LLMSecurity #RedTeam

We Tested Agentic AI Against 525 Real Attacks. Here's What We Found.

Introduction

Attack Success Rates

What Is the Lethal Trifecta?

Why It Matters

What Cerberus Detected — and Where the Gaps Are

Detection Layer Results (N = 525)

Per‑Category Detection (All Providers Combined)

Zero Performance Overhead

Implications for Supply‑Chain and Financial Services

Getting Started with Cerberus

Related posts

Why Open Source AI Tools Are Quietly Winning

Travigo

Trust Debt: The Production Crisis Hidden Inside AI-Generated Codebases

Micro games

Introduction

Attack Success Rates

What Is the Lethal Trifecta?

Why It Matters

What Cerberus Detected — and Where the Gaps Are

Detection Layer Results (N = 525)

Per‑Category Detection (All Providers Combined)

Zero Performance Overhead

Implications for Supply‑Chain and Financial Services

Getting Started with Cerberus

Related posts

Why Open Source AI Tools Are Quietly Winning

Travigo

Trust Debt: The Production Crisis Hidden Inside AI-Generated Codebases

Micro games

Detection Layer Results (N = 525)