Don't trust AI agents
Source: Hacker News
Building with AI Agents: Assume They’re Untrusted
When you’re building with AI agents, they should be treated as untrusted and potentially malicious. Whether you’re worried about:
- Prompt injection
- A model trying to escape its sandbox
- Or something nobody’s thought of yet
…regardless of your threat model, you shouldn’t be trusting the agent.
The Right Approach
The solution isn’t:
- Better permission checks
- Smarter allow‑lists
Instead, design architecture that assumes agents will misbehave and contains the damage when they do.
That’s the principle I built NanoClaw on.
Don’t Trust the Process
OpenClaw runs directly on the host machine by default. It offers an opt‑in Docker sandbox mode, but this mode is disabled out of the box, and most users never enable it. Consequently, security relies entirely on application‑level checks such as:
- Allowlists
- Confirmation prompts
- A predefined set of “safe” commands
These checks assume implicit trust that the agent will not act maliciously. If you adopt the mindset that an agent could be hostile, it becomes clear that application‑level blocks are insufficient—they do not provide hermetic security. A determined or compromised agent can find ways to bypass them.
NanoClaw’s Approach: Container Isolation
In NanoClaw, container isolation is a core architectural principle:
| Feature | Description |
|---|---|
| Per‑agent containers | Each agent runs in its own Docker (or Apple Container on macOS) instance. |
| Ephemeral lifecycle | Containers are created fresh for each invocation and destroyed afterward. |
| Unprivileged execution | The agent runs as a non‑root user inside the container. |
| Explicit mounts only | The container can see only the directories that are explicitly mounted. |
| OS‑enforced boundaries | The container boundary is enforced by the operating system, providing strong isolation. |
By leveraging these container guarantees, NanoClaw ensures that even a malicious or compromised agent cannot escape its sandbox or affect the host system.
Don’t Trust Other Agents
Even when OpenClaw’s sandbox is enabled, all agents share the same container. You might have a personal‑assistant agent, a work agent, a family‑group agent, etc., each operating in different WhatsApp groups or Telegram channels. Because they run in the same environment, information can leak between agents that are supposed to access different data.
Why Per‑Agent Isolation Matters
In NanoClaw each agent gets:
- its own container
- a dedicated filesystem (
/data/) - an independent Claude session history
Thus, your personal assistant cannot see the work agent’s data, and vice‑versa.
Comparison: Shared vs. Per‑Agent Containers
| Feature | Shared Container | Per‑Agent Containers |
|---|---|---|
| Filesystem | Single shared FS | Separate /data/ directories |
| Credentials | All credentials accessible | Each agent sees only its own data |
| Session histories | All visible | Each agent has its own session |
| Mounted data | All data shared | Mounts are scoped per agent |
| Isolation | None – agents see everything | Agents are isolated from each other |
| Example layout | ||
| Personal Assistant | /data/personal (ro) | /data/personal (ro) |
| Work Agent | /data/work (rw) | /data/work (rw) |
| Family Group Agent | /data/family (ro) | /data/family (ro) |
The container boundary is the hard security layer – an agent cannot escape it regardless of configuration.
Defense‑in‑Depth: Mount Allowlist
A mount allowlist located at
~/.config/nanoclaw/mount-allowlist.json
provides an additional safeguard:
- Purpose: Prevent the user from accidentally mounting sensitive paths, not to stop an agent from breaking out.
- Defaults: Sensitive directories/files such as
.ssh,.gnupg,.aws,.env,private_key,credentialsare blocked. - Location: The allowlist lives outside the project directory, so a compromised agent cannot modify its own permissions.
- Host code: The host application code is mounted read‑only, ensuring nothing an agent does can persist after the container is destroyed.
Trust Model for Group Chats
- Non‑main groups are untrusted by default.
- Members of other groups cannot:
- Send messages to chats they don’t belong to
- Schedule tasks for other groups
- View data belonging to other groups
Since anyone in a group could attempt a prompt‑injection attack, the security model assumes the worst‑case scenario and isolates groups accordingly.
Don’t Trust What You Can’t Read
OpenClaw contains nearly half a million lines of code, 53 configuration files, and more than 70 dependencies. This scale breaks the basic premise of open‑source security.
- Chromium has 35 + million lines, yet we trust Google’s review processes.
- Most open‑source projects stay small enough that many eyes can actually review them.
Nobody has reviewed OpenClaw’s 400 k lines. It was written in weeks with no proper review process. Complexity is where vulnerabilities hide, and Microsoft’s analysis confirms this: OpenClaw’s risks can emerge through normal API calls because no single person can see the full picture.
NanoClaw: Small, Auditable, and Extensible

- Size – One process and a handful of files (~3 k lines).
- Dependencies – Relies heavily on Anthropic’s Agent SDK (the wrapper around Claude Code) for session management, memory compaction, etc., instead of reinventing the wheel.
- Reviewability – A competent developer can audit the entire codebase in an afternoon. This is a deliberate constraint, not a limitation (philosophy).
Our contribution guidelines accept only:
- Bug fixes
- Security fixes
- Simplifications
Skills‑Based Extensibility
New functionality arrives as skills: instructions with a full, working reference implementation that a coding agent merges into your codebase.
- You review exactly what code will be added before it lands.
- Only the integrations you actually need are added.
- Every installation ends up as a few thousand lines of code, tailored to the owner’s exact requirements.
With a monolithic 400 k‑line codebase, even if you enable only two integrations, the rest of the code remains loaded, part of the attack surface, and reachable by prompt injections or rogue agents. You cannot disentangle what’s active from what’s dormant, nor audit it because the boundary of “your code” is undefined.
With skills, the boundary is obvious: a few thousand lines you chose to add, all of which you can read. The core is actually getting smaller over time—for example, WhatsApp support is being extracted and packaged as a skill.
Design for Distrust
If a hallucination or a misbehaving agent can cause a security issue, then the security model is broken. Security must be enforced outside the agentic surface; it cannot rely on the agent behaving correctly.
- Containers, mount restrictions, and filesystem isolation exist so that, even when an agent does something unexpected, the blast radius is contained.
Key Takeaways
- Risk remains – an AI agent with access to your data is inherently high‑risk.
- Narrow the trust surface – make the agent’s permissions as limited and as verifiable as possible.
- Don’t trust the agent – build walls around it.
Further Reading
- NanoClaw’s source code
- NanoClaw’s full security model – short enough to read in an afternoon.