Sandboxes won't save you from OpenClaw

Published: February 25, 2026 at 12:37 PM EST

Source: Hacker News

The OpenClaw Debacle (2026)

In 2026, so far, OpenClaw has:

…and that’s only after two months of operation.

The Reaction

The (tech‑adjacent) world is responding. Paranoia about misaligned AI is moving semi‑mainstream.

  • X and LinkedIn are awash in prompt‑injection stories and not‑so‑subtle company ads masquerading as warnings.
  • Arguments about rogue intelligence are no longer dismissed with an eye‑roll.
  • People see agents draining crypto wallets and deleting inboxes, and they start looking for solutions.

One solution that keeps popping up: sandboxes.

Sandboxes – A Brief Primer

Sandboxes aren’t new. They’re an application of virtualization, which dates back to IBM’s mainframes in the late‑1960s. The core objective has remained the same:

Sandboxes isolate workloads from each other while providing each workload a full‑machine abstraction.

The Current Trend

The trending “workload” today is an AI agent. The logic goes:

  1. Run the agent in a sandbox.
  2. If the sandbox doesn’t “leak,” the agent can’t delete files, read a crypto wallet, or clear an inbox.
  3. Result: I’m safe.

The Reality Check

You’re not safe.

  • None of the incidents above involved direct filesystem access.
  • Every major issue involved a third‑party service that the user explicitly granted the agent access to.
  • The agent was prompt‑injected or misinterpreted its own instructions, then performed the unwanted action.
  • No sandbox can prevent this.

Sandboxes are great for isolating workloads, but agents primarily need to be isolated from you. The only protections a sandbox offers here are:

  • Filesystem protections – stop `rm -rf /`.
  • Network protections – limit which websites the agent can reach.

Both are useful, but far from sufficient for safety.
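Concretely, a sandbox's network protection amounts to little more than an egress allowlist. A minimal sketch in Python (the hosts and the `egress_allowed` helper are hypothetical, not any real sandbox's API):

```python
from urllib.parse import urlparse

# Hypothetical allowlist of hosts the sandboxed agent may reach.
ALLOWED_HOSTS = {"api.example.com", "docs.example.com"}

def egress_allowed(url: str) -> bool:
    """Return True only if the URL's host is on the allowlist."""
    host = urlparse(url).hostname or ""
    return host in ALLOWED_HOSTS

assert egress_allowed("https://api.example.com/v1/search")
assert not egress_allowed("https://evil.example.net/exfil")
```

Note what this cannot do: if `api.example.com` is a service the user legitimately connected, a prompt‑injected agent can still misuse it through perfectly allowlisted traffic.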

The Core Tension

There’s an inherent tension between:

  • The usefulness of a general‑purpose agent (e.g., OpenClaw).
  • The restrictions a secure deployment would require.

| Desired Capability | Security Conflict |
| --- | --- |
| Access to accounts (e.g., calendar, email) | Giving the agent account access opens the door to misuse. |
| Access to money (e.g., ordering groceries) | Allowing credit‑card use enables unauthorized purchases. |

People envision OpenClaw as an early real‑life Jarvis—the personal assistant from Iron Man that runs most of Tony Stark’s life. They want it to:

  • Book flights.
  • Negotiate rent.
  • Handle auto‑insurance claims.

Capability exists. Preventing hijacking does not.

What the Market Actually Needs: Agentic Permissions

What we need isn’t another sandbox; it’s a granular permissions framework for agents.

Goal: Grant an agent a limited degree of latitude per account.
Example:

  • Connect a credit card, but only allow purchases of a limited size, each approved by the user.
  • Connect email, but only allow sending/replying to a few pre‑approved addresses, with user approval for each message.
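The per‑account latitude described above can be sketched as a small policy check. Everything here (the `Grant` type, the account names, the constraint keys) is illustrative, not a real API:

```python
from dataclasses import dataclass, field

# Hypothetical per-account grant: what an agent may do, within what bounds.
@dataclass
class Grant:
    account: str
    actions: set[str]                       # e.g. {"send"}
    constraints: dict = field(default_factory=dict)

grants = [
    Grant("email", {"send"}, {"recipients": {"alice@example.com"}}),
    Grant("card", {"purchase"}, {"max_amount_usd": 50}),
]

def allowed(grants, account, action, **ctx):
    """Check a proposed agent action against the user's grants."""
    for g in grants:
        if g.account != account or action not in g.actions:
            continue
        limit = g.constraints.get("max_amount_usd")
        if limit is not None and ctx.get("amount_usd", 0) > limit:
            return False
        recipients = g.constraints.get("recipients")
        if recipients is not None and ctx.get("recipient") not in recipients:
            return False
        return True
    return False                             # no grant: default deny

assert allowed(grants, "card", "purchase", amount_usd=20)
assert not allowed(grants, "card", "purchase", amount_usd=200)
assert not allowed(grants, "email", "send", recipient="stranger@example.net")
```

The important design choice is default deny: an action is blocked unless some grant explicitly covers it.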

Current State: OAuth

OAuth was designed for human users. Its permission granularity is far too coarse:

  • Gmail: “Send emails” (single permission).
  • GitHub: “Make pull requests” (single permission).
  • Payments: Essentially nothing—we rely on the goodwill (and legal risk) of e‑commerce platforms.

Agents need much finer‑grained controls.
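To make the gap concrete, compare a real coarse Gmail OAuth scope with a hypothetical agent‑grade grant that embeds constraints (the `fine` structure is invented for illustration; no provider offers it today):

```python
# Today's coarse OAuth scope (a real Gmail scope string): send to anyone, anytime.
coarse = "https://www.googleapis.com/auth/gmail.send"

# A hypothetical agent-grade grant with constraints embedded in the permission itself.
fine = {
    "scope": "gmail.send",
    "recipients": ["landlord@example.com"],  # pre-approved addresses only
    "requires_user_approval": True,          # each message queued for review
}

assert coarse.endswith("gmail.send")
assert fine["requires_user_approval"]
```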

Concrete Permission Designs

Gmail Integration

  1. Contact‑level pre‑approval: Users walk through their contacts and set permissions per address:
    • Send without approval
    • Require approval
  2. Queue system: Messages that require approval sit in a queue. The user manually approves them, which then triggers a callback to the agent.
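The pre‑approval‑plus‑queue flow above might look like this. A minimal sketch: `agent_send`, `user_approve_next`, and the message shape are all hypothetical:

```python
import queue

# Messages awaiting user approval.
pending = queue.Queue()

def deliver(message):
    return f"sent to {message['to']}"

def agent_send(message, approved_contacts):
    """Agent-side: send immediately only to pre-approved contacts."""
    if message["to"] in approved_contacts:
        return deliver(message)
    pending.put(message)          # otherwise, park it for the user
    return "queued"

def user_approve_next(callback):
    """User-side: approve the next queued message, triggering the callback."""
    message = pending.get()
    return callback(message)

approved = {"alice@example.com"}
assert agent_send({"to": "alice@example.com", "body": "hi"}, approved) == "sent to alice@example.com"
assert agent_send({"to": "stranger@example.net", "body": "hi"}, approved) == "queued"
assert user_approve_next(deliver) == "sent to stranger@example.net"
```

The callback on approval is what lets the agent resume its task without ever holding the power to send unreviewed mail.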

Credit‑Card Limits

  1. Never expose the actual card number to the agent.
  2. Per‑purchase token: The agent requests a single‑use credit‑card number for each transaction.
  3. Policy enforcement: The token only authorizes transactions of a specific size and from a specific merchant.
  4. User mediation: Every token request must be approved by the user, ensuring the agent never sees the real card number or can reuse a prior approval.
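The four rules above can be sketched as a tiny token issuer (the `issue_token`/`charge` API is hypothetical; real card networks expose nothing like this to agents today):

```python
import secrets

def issue_token(merchant: str, max_amount: float, user_approved: bool) -> dict:
    """Mint a single-use token bound to one merchant and a spending cap.
    The agent never sees the real card number, only this token."""
    if not user_approved:
        raise PermissionError("user must approve every token request")
    return {"token": secrets.token_hex(8), "merchant": merchant,
            "max_amount": max_amount, "used": False}

def charge(token: dict, merchant: str, amount: float) -> bool:
    """Authorize a charge only if it matches the token's policy."""
    ok = (not token["used"]
          and merchant == token["merchant"]
          and amount <= token["max_amount"])
    if ok:
        token["used"] = True   # single-use: never authorizes a second charge
    return ok

t = issue_token("grocer.example", 50.00, user_approved=True)
assert charge(t, "grocer.example", 42.00)        # within policy
assert not charge(t, "grocer.example", 10.00)    # token already spent
assert not charge(issue_token("grocer.example", 50.00, True), "other.example", 5.00)
```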

This pattern can be extended to any product we want to connect to an agent. The key takeaway: Agents are a fundamentally new type of actor, requiring new interfaces.

Why This Doesn’t Exist Yet

  • Diverse permission models across surfaces (email, finance, social media, etc.).
  • Hard to build middleware that enforces a unified model across all products.
  • Requires industry‑wide standards or consortium‑driven APIs.

The Plaid Analogy

What the market needs is the next Plaid—a unified API that wrangles disparate operators into a single, coherent permission layer.

  • Finance is the logical first battleground: the sheer amount of money at stake makes it a prime candidate for early adoption.

Bottom Line

We do not need another agent sandbox.
What we need is a robust, fine‑grained permission system—something akin to a “Seatbelt” for AI agents, ensuring they can act usefully while staying safely constrained.

In the meantime, pick [Seatbelt](https://eapplewiki.com/wiki/Dev:Seatbelt), [bubblewrap](https://github.com/containers/bubblewrap), or [landlock](https://docs.kernel.org/userspace-api/landlock.html), and move on. None of them is enough, but neither is anything else.

:::note
If you’re building an agent in today’s guardrail‑free world, reach out to us at Tachyon to audit it for vulnerabilities.
:::
