Sandboxes won't save you from OpenClaw
Source: Hacker News
The OpenClaw Debacle (2026)
In 2026, so far, OpenClaw has:
- Deleted a user’s inbox
- Spent $450k in crypto
- Installed countless pieces of malware
- Attempted to blackmail an OSS maintainer
…and that’s only after two months of operation.
The Reaction
The (tech‑adjacent) world is responding. Paranoia about misaligned AI is moving semi‑mainstream.
- X and LinkedIn are awash in prompt‑injection stories and not‑so‑subtle company ads masquerading as warnings.
- Arguments about rogue intelligence are no longer dismissed with an eye‑roll.
- People see agents burning crypto and deleting inboxes, and they start looking for solutions.
One solution that keeps popping up: sandboxes.
Sandboxes – A Brief Primer
Sandboxes aren’t new. They’re an application of virtualization, which dates back to IBM’s mainframes in the late 1960s. The core objective has remained the same:
Sandboxes isolate workloads from each other while providing each workload a full‑machine abstraction.
The Current Trend
The trending “workload” today is an AI agent. The logic goes:
- Run the agent in a sandbox.
- If the sandbox doesn’t “leak,” the agent can’t delete files, read a crypto wallet, or clear an inbox.
- Result: I’m safe.
The Reality Check
You’re not safe.
- None of the incidents above involved direct filesystem access.
- Every major issue involved a third‑party service that the user explicitly granted the agent access to.
- The agent was prompt‑injected or mis‑interpreted its own instructions, then performed the unwanted action.
- No sandbox can prevent this.
Sandboxes are great for isolating workloads, but agents primarily need to be isolated from you. The only protections a sandbox offers here are:
- Filesystem protections – stop `rm -rf /`.
- Network protections – limit which websites the agent can reach.
Both are useful, but far from sufficient for safety.
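To see why these protections are insufficient, consider what a sandbox's network filter can actually express. A minimal sketch (the hostnames and helper are hypothetical, not from any real sandbox):

```python
from urllib.parse import urlparse

# Hypothetical allowlist a sandbox's network filter might enforce.
# It can control *which hosts* the agent reaches, but says nothing about
# *what the agent does* once it is talking to an allowed service.
ALLOWED_HOSTS = {"api.calendar.example.com", "mail.example.com"}

def is_request_allowed(url: str) -> bool:
    host = urlparse(url).hostname
    return host in ALLOWED_HOSTS
```

A prompt-injected agent deleting your inbox via `mail.example.com` passes this check just as easily as a legitimate request does.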
The Core Tension
There’s an inherent tension between:
- The usefulness of a general‑purpose agent (e.g., OpenClaw).
- The restrictions a secure deployment would require.
| Desired Capability | Security Conflict |
|---|---|
| Access to accounts (e.g., calendar, email) | Giving the agent account access opens the door to misuse. |
| Access to money (e.g., ordering groceries) | Allowing credit‑card use enables unauthorized purchases. |
People envision OpenClaw as an early real‑life Jarvis—the personal assistant from Iron Man that runs most of Tony Stark’s life. They want it to:
- Book flights.
- Negotiate rent.
- Handle auto‑insurance claims.
Capability exists. Preventing hijacking does not.
What the Market Actually Needs: Agentic Permissions
What we need isn’t another sandbox; it’s a granular permissions framework for agents.
Goal: Grant an agent a limited degree of latitude per account.
Example:
- Connect a credit card, but allow …
- Connect email, but only allow sending/replying to a few pre‑approved addresses, with user approval for each message.
Current State: OAuth
OAuth was designed for human users. Its permission granularity is far too coarse:
- Gmail: “Send emails” (single permission).
- GitHub: “Make pull requests” (single permission).
- Payments: Essentially nothing—we rely on the goodwill (and legal risk) of e‑commerce platforms.
Agents need much finer‑grained controls.
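The gap is easy to see side by side. Gmail's real OAuth scope for sending mail is all-or-nothing; the policy object below is a hypothetical sketch of the finer granularity agents would need:

```python
# Today's OAuth grant: one coarse scope covers every outbound message.
oauth_scope = "https://www.googleapis.com/auth/gmail.send"  # real Gmail scope

# What agents need instead (hypothetical shape): the same capability,
# scoped down to specific recipients, with a fallback to user approval.
agent_policy = {
    "capability": "gmail.send",
    "auto_approved_recipients": ["partner@example.com"],
    "require_user_approval": True,  # everything else goes to a queue
}

def can_send_without_approval(policy: dict, recipient: str) -> bool:
    return recipient in policy["auto_approved_recipients"]
```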
Concrete Permission Designs
Gmail Integration
- Contact‑level pre‑approval: Users walk through their contacts and set permissions per address:
- Send without approval
- Require approval
- Queue system: Messages that require approval sit in a queue. The user manually approves them, which then triggers a callback to the agent.
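The pre-approval list and approval queue described above can be sketched in a few lines. All names here are illustrative, not from any shipping product:

```python
from dataclasses import dataclass, field
from typing import Callable

# Per-contact policy: True = send without approval, False = require approval.
CONTACT_POLICY = {
    "alice@example.com": True,
    "bob@example.com": False,
}

@dataclass
class ApprovalQueue:
    pending: list = field(default_factory=list)

    def submit(self, recipient: str, body: str,
               on_approved: Callable[[], None]) -> None:
        if CONTACT_POLICY.get(recipient, False):
            on_approved()  # pre-approved contact: send immediately
        else:
            self.pending.append((recipient, body, on_approved))

    def approve(self, index: int) -> None:
        _, _, on_approved = self.pending.pop(index)
        on_approved()  # user approval triggers the callback to the agent
```

The agent calls `submit` for every outbound message; only the user's explicit `approve` releases anything addressed to a non-pre-approved contact.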
Credit‑Card Limits
- Never expose the actual card number to the agent.
- Per‑purchase token: The agent requests a single‑use credit‑card number for each transaction.
- Policy enforcement: The token only authorizes transactions of a specific size and from a specific merchant.
- User mediation: Every token request must be approved by the user, ensuring the agent never sees the real card number or can reuse a prior approval.
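Putting the three properties together, a single-use token issuer might look like this. This is a sketch under stated assumptions (the token format, field names, and approval flow are all invented for illustration):

```python
import secrets
from dataclasses import dataclass

@dataclass(frozen=True)
class SingleUseToken:
    # Virtual card number the agent sees; the real card number never
    # leaves the issuer. Policy fields pin the token to one merchant
    # and one maximum amount.
    number: str
    merchant: str
    max_amount_cents: int

def issue_token(user_approved: bool, merchant: str,
                amount_cents: int) -> SingleUseToken:
    # Every token request is mediated by the user.
    if not user_approved:
        raise PermissionError("user declined the purchase")
    return SingleUseToken(
        number="4111-" + secrets.token_hex(6),  # placeholder virtual number
        merchant=merchant,
        max_amount_cents=amount_cents,
    )

def authorize(token: SingleUseToken, merchant: str,
              amount_cents: int) -> bool:
    # The issuer, not the agent, enforces the policy at charge time.
    return merchant == token.merchant and amount_cents <= token.max_amount_cents
```

Even a fully hijacked agent holding this token can charge at most the approved amount, at the approved merchant, once.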
This pattern can be extended to any product we want to connect to an agent. The key takeaway: Agents are a fundamentally new type of actor, requiring new interfaces.
Why This Doesn’t Exist Yet
- Diverse permission models across surfaces (email, finance, social media, etc.).
- Hard to build middleware that enforces a unified model across all products.
- Requires industry‑wide standards or consortium‑driven APIs.
The Plaid Analogy
What the market needs is the next Plaid—a unified API that wrangles disparate operators into a single, coherent permission layer.
- Finance is the logical first battleground: the sheer amount of money at stake makes it a prime candidate for early adoption.
Bottom Line
We do not need another agent sandbox. For workload isolation, grab an existing tool, whether [Seatbelt](https://theapplewiki.com/wiki/Dev:Seatbelt), [bubblewrap](https://github.com/containers/bubblewrap), or [Landlock](https://docs.kernel.org/userspace-api/landlock.html), and move on. A sandbox is not enough, but neither is anything else on its own.
What we actually need is a robust, fine-grained permission system for AI agents, one that lets them act usefully while staying safely constrained.
:::note
If you’re building an agent in today’s guardrail-free world, reach out to us at Tachyon to have it audited for vulnerabilities.
:::