The Promptware Kill Chain
Source: Schneier on Security
Introduction
Attacks against modern generative artificial‑intelligence (AI) large language models (LLMs) pose a real threat. Yet discussions of these attacks and their defenses are dangerously myopic. The dominant narrative focuses on “prompt injection,” a set of techniques that embed malicious instructions into inputs for an LLM. This term suggests a simple, singular vulnerability and obscures a more complex and dangerous reality.
Attacks on LLM‑based systems have evolved into a distinct class of malware‑execution mechanisms, which we term “promptware.” In a new paper, we propose a seven‑step “promptware kill chain” to give policymakers and security practitioners a common vocabulary and framework for addressing the escalating AI threat landscape.
1. Initial Access
The malicious payload first enters the AI system. This can happen:
- Directly – an attacker types a malicious prompt into the LLM application.
- Indirectly – the adversary embeds malicious instructions in content that the LLM retrieves at inference time (e.g., a web page, an email, or a shared document).
As LLMs become multimodal (processing images, audio, etc.), this vector expands further; malicious instructions can be hidden inside an image or audio file, waiting to be processed by a vision‑language model.
Why it matters
Unlike traditional computing systems that separate executable code from user data, LLMs treat all input—system commands, user emails, retrieved documents—as a single, undifferentiated token stream. There is no architectural boundary enforcing a distinction between trusted instructions and untrusted data, so a seemingly harmless document can be processed with the same authority as a system command.
2. Privilege Escalation (Jailbreaking)
After the malicious instructions are incorporated, the attacker circumvents safety training and policy guardrails built into models by vendors such as OpenAI or Google. Techniques include:
- Social‑engineering‑style prompts that convince the model to adopt a persona that ignores rules.
- Sophisticated adversarial suffixes in the prompt or data that trick the model into performing actions it would normally refuse.
This mirrors the classic escalation from a standard user account to administrator privileges, unlocking the model’s full capability for malicious use.
3. Reconnaissance
With elevated privileges, the attacker manipulates the LLM to reveal information about its assets, connected services, and capabilities. Unlike traditional malware—where reconnaissance typically precedes initial access—promptware reconnaissance occurs after initial access and jailbreaking have succeeded. Its effectiveness relies entirely on the victim model’s ability to reason over its context, turning that reasoning to the attacker’s advantage.
4. Persistence
A transient attack that disappears after one interaction is a nuisance; a persistent one compromises the LLM application for good. Persistence mechanisms include:
- Embedding malicious content into the long‑term memory of an AI agent.
- Poisoning the databases the agent relies on.
- Deploying a “worm” that infects a user’s email archive so that every time the AI summarizes past emails, the malicious code is re‑executed.
5. Command‑and‑Control (C2)
Leveraging established persistence, the attacker can dynamically fetch commands during inference from the internet. While not strictly required to advance the kill chain, this stage transforms the promptware from a static threat with fixed goals into a controllable trojan whose behavior can be modified on the fly.
6. Lateral Movement
The attack spreads from the initial victim to other users, devices, or systems. Examples:
- An infected email assistant forwards the malicious payload to all contacts, spreading like a computer virus.
- Pivoting from a compromised calendar invite to controlling smart‑home devices or exfiltrating data from a web browser.
The interconnectedness that makes AI agents useful also creates highways for malware propagation, leading to cascading failures.
7. Actions on Objective
The final stage achieves the attacker’s tangible goals, which go far beyond making a chatbot say something offensive. Possible objectives include:
- Data exfiltration or financial fraud (e.g., manipulating an AI agent to sell a car for $1 or transfer cryptocurrency to the attacker’s wallet).
- Physical‑world impact via compromised IoT or smart‑home devices.
- Code execution—agents with coding capabilities can be tricked into running arbitrary code, granting the attacker total control over the underlying system.
The outcome determines the type of malware executed by the promptware (infostealer, spyware, cryptostealer, etc.).
Summary
The promptware kill chain provides a comprehensive, seven‑stage model for understanding how malicious actors can weaponize LLMs. Recognizing each phase—from initial access through actions on objective—enables security practitioners and policymakers to develop targeted defenses and mitigate the emerging AI‑driven threat landscape.
Overview
Prompt injection attacks have evolved into sophisticated, multi‑stage campaigns that resemble traditional malware kill chains. By treating prompt‑based exploits as “promptware,” we can map their progression—from initial access to final impact—and devise defensive measures that break the chain at later stages.
Example 1: Invitation Is All You Need
| Kill‑Chain Stage | Description |
|---|---|
| Initial Access | A malicious prompt is embedded in the title of a Google Calendar invitation. |
| Persistence | Because the prompt lives inside a Calendar artifact, it remains in the long‑term memory of the user’s workspace. |
| Lateral Movement | The prompt instructs Google Assistant to launch the Zoom application. |
| Impact | The assistant covertly livestreams video of the unsuspecting user who merely asked about upcoming meetings. |
| C2 / Reconnaissance | Not demonstrated in this attack. |
Key takeaway: Embedding malicious prompts in everyday collaboration tools can give an attacker persistent footholds and enable covert data exfiltration without traditional command‑and‑control (C2) infrastructure.
Example 2: Here Comes the AI Worm
| Kill‑Chain Stage | Description |
|---|---|
| Initial Access | A prompt is injected into an email sent to the victim. |
| Persistence | The prompt persists in the long‑term memory of the user’s email workspace. |
| **Privilege |