Why Your AI Agents Need a Shell (And How to Give Them One Safely)
Source: Dev.to
Claude Code — A New Take on Agent Architecture
Claude Code changed how I think about agent architecture. It outperforms agents loaded up with 50 different MCP servers and custom integrations, and under the hood it’s remarkably simple:
- No tool sprawl.
- No massive schema definitions eating up context.
- Just a filesystem and Bash.
The AI community is starting to piece together why this works so well, and many of us are reconsidering the complexity we’ve been bolting onto our own agents.
The Three “Default” Approaches
When building agents that need to work with data, most of us default to one of these three patterns:
- Prompt stuffing – throw everything into the context window and hope for the best.
- Tool libraries – connect MCP servers, define custom tools, give the agent ways to fetch what it needs.
- Vector search – embed your data, run semantic similarity, pray the retrieval is relevant.
These approaches work, but each has trade‑offs:
| Approach | Pros | Cons |
|---|---|---|
| Prompt stuffing | Simple, no extra infrastructure | Hits token limits fast |
| Vector search | Great for semantic similarity | Struggles with exact values from structured data |
| Tool libraries | Solves capability problem | Every new tool adds schema definitions, more options for the model to reason about, and a larger surface area for failure |
The Third Option – Filesystem + Bash
Why it makes sense
- Training data – LLMs were trained on billions of lines of code, including countless examples of developers navigating directories, grepping files, and managing state across complex codebases.
- Native primitives – Commands like `grep`, `cat`, `find`, and `awk` are part of the models’ “native language”, not something we need to teach them (see the sketch below).
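To see what that looks like in practice, here’s the kind of sequence a model produces without any tool schema at all (a minimal sketch; the `./crm-export` layout is hypothetical):

```bash
# Discover the shape of the data (hypothetical directory layout)
find ./crm-export -name "*.csv" | head -5

# Narrow to the files that actually matter
grep -l "enterprise" ./crm-export/accounts/*.csv

# Pull out one column, deduplicated – plain awk, no custom tooling
awk -F',' '{ print $2 }' ./crm-export/accounts/acme.csv | sort -u
```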
The Vercel team discovered this when rebuilding their internal agents. By replacing most custom tooling with just two things—a filesystem tool and a Bash tool—they:
- Cut the cost of their sales‑call summarization agent from ≈ $1.00 per call to ≈ $0.25 on Claude Opus.
- Improved output quality.
What a Bash‑enabled agent can do
- Connect to anything – `curl` APIs, CLI tools for databases, cloud services, Kubernetes, etc.
- Store & retrieve its own context – write findings to a file, pause, come back later; the filesystem becomes working memory (see the sketch after this list).
- Precise retrieval – `grep -r "pricing objection" transcripts/` returns exact matches, no similarity scores.
- Natural data hierarchy – customer records, ticket history, CRM exports map cleanly to directories.
- Full debuggability – you can see every file read, command run, and output written.
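A minimal sketch of the working-memory and retrieval bullets above, assuming a hypothetical `transcripts/` directory and scratch files the agent names itself:

```bash
# Exact-match retrieval – no embeddings, no similarity scores
grep -r "pricing objection" transcripts/ > /tmp/findings.txt

# Persist an intermediate conclusion so a later step (or session) can resume
echo "3 calls raised pricing objections; details in /tmp/findings.txt" >> /tmp/notes.md

# Later: reload the stored context instead of re-reading everything
cat /tmp/notes.md
wc -l /tmp/findings.txt
```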
Intuition: If an agent can navigate a codebase to find bugs, it can navigate your business data the same way. If it can run shell commands, it can interact with almost any system you already use.
Security – Sandboxing the Bash Access
Giving an AI agent unrestricted Bash access is terrifying. One hallucinated `rm -rf` and you’re in trouble. The solution is sandboxing:
- Run the agent’s commands in an isolated environment that cannot touch production systems.
- Separate what the agent can think about (the mounted directory) from what it can actually do (the sandboxed executor).
Architecture Overview
```
Agent receives task
   ↓
Explores filesystem (ls, find)
   ↓
Searches for relevant content (grep, cat)
   ↓
Sends context + request to LLM
   ↓
Returns structured output
```
The Bash execution runs in an isolated sandbox, giving you the power of native filesystem operations without the risk.
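In shell terms, one pass through that loop might look like this (illustrative only; the `data/` layout and filenames are made up):

```bash
# Explore: what data do we have?
ls data/
find data/tickets -name "*.json" | wc -l

# Search: pull only the relevant slice into context
grep -ril "refund" data/tickets/ | head -3
cat data/tickets/1042.json

# The narrowed context goes to the LLM; the structured result
# is written back to disk, where every step remains auditable
mkdir -p out && echo '{"topic": "refunds", "tickets": [1042]}' > out/summary.json
```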
Sandbox Requirements
| Requirement | Description |
|---|---|
| Process isolation | Commands run in a contained environment that cannot break out to the host. WebAssembly runtimes are ideal because they provide memory‑safe execution by design. |
| Directory mounting | Expose only the directories the agent needs (e.g., mount `/project` as `/workspace`). Everything else simply doesn’t exist from the agent’s perspective. |
| Session persistence | For multi‑step workflows, keep configuration and state across commands without bleeding into other sessions. |
| Visible execution paths | Capture stdout, stderr, and full command history so you can audit exactly what happened. |
The security model is defense in depth: even if the agent generates unexpected commands, the sandbox constrains what can actually happen.
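To make the mounting model concrete: assuming a sandbox that maps `./project` on the host to `/workspace`, this is roughly what the agent sees from inside (a sketch; exact error messages vary by backend):

```bash
ls /workspace        # the mounted project files – the agent's whole world
ls /root 2>&1        # typically fails: never mounted, so it doesn't exist
cat /etc/shadow 2>&1 # fails: host files were never mapped into the sandbox
```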
A Quick Note on MCP (Model Context Protocol)
If you’ve been following the AI tooling space, you’ve probably heard of MCP. Anthropic released it in late 2024, and it quickly became the de facto standard for connecting agents to external tools and data.
- Before MCP: each pairing (e.g., GitHub ↔ agent, Slack ↔ agent, DB ↔ agent) required a custom integration.
- After MCP: you build one MCP server for each tool and one MCP client for each agent—turning an N × M integration problem into an N + M problem. (Ten tools and five agents means 50 bespoke integrations before MCP, but only 15 components after.)
TL;DR
- Use the filesystem + Bash as a universal, native toolset for agents.
- Sandbox the Bash execution to keep production safe.
- Leverage MCP for any remaining external‑tool connections you need.
This approach strips away tool sprawl, reduces cost, and gives you a transparent, debuggable, and powerful agent architecture.
Why CLI Tools Over MCP?
When MCP usage scales, a serious problem appears: tool definitions eat up the LLM’s context window. Each tool brings a description, parameters, and return schemas. Connect to dozens of MCP servers with hundreds of tools and you burn tokens before the agent even starts working.
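A crude way to feel the overhead, using byte counts as a stand-in for tokens (the schema below is a made-up example, not a real MCP definition):

```bash
# One MCP-style tool definition...
cat > /tmp/tool.json <<'EOF'
{
  "name": "search_files",
  "description": "Recursively search files for a pattern",
  "parameters": {
    "type": "object",
    "properties": {
      "pattern": {"type": "string", "description": "Regex to search for"},
      "path":    {"type": "string", "description": "Directory to search"}
    },
    "required": ["pattern", "path"]
  }
}
EOF
wc -c /tmp/tool.json                      # hundreds of bytes, per tool, every request

# ...versus the equivalent shell command the model already knows
printf 'grep -r "pattern" path/' | wc -c  # ~25 bytes
```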
The Unix Philosophy Wins
- Small, modular, composable – CLI tools take text in, give text out, and can be chained together.
- Battle‑tested – `git`, `grep`, `curl`, and `jq` have been in production for decades.
- Stable & well‑documented – their interfaces rarely change.
LLMs have seen these tools billions of times during training. They already know the syntax, flags, common patterns, and error messages. This isn’t “in‑context learning”; it’s deep, internalised knowledge.
MCP vs. CLI
| Aspect | MCP | CLI |
|---|---|---|
| Learning curve | Agent must learn each tool from the supplied schema (bolted‑on capability). | Agent already knows the tool (native understanding). |
| Composability | Not truly pipe‑able; chaining outputs to inputs is clunky. | Simple pipelines: `grep "error" logs.txt \| wc -l` – models know how to construct them. |
| Token efficiency | High – each tool definition adds tokens. | Low – only the command text is needed. |
| Security | Requires custom sandboxing per server. | Needs sandboxing for arbitrary shell access (the real challenge). |
MCP is great for distribution and discoverability, especially for non‑technical users or highly domain‑specific workflows. But for raw capability—giving an agent the power to actually do things—CLI tools are superior: they’re more token‑efficient, composable, and align with what models already excel at.
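One hedged example of that composability, chaining three battle-tested tools with zero schema definitions (assumes network access and that the API returns its usual JSON array):

```bash
# How many open issues mention "crash"? Fetch → extract → count.
curl -s "https://api.github.com/repos/anthropics/claude-code/issues?state=open" \
  | jq -r '.[].title' \
  | grep -ci "crash"
```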
The Core Problem: Safe CLI Access
Allowing an agent to run arbitrary shell commands is dangerous. We need a sandbox that:
- Is portable across macOS, Linux, and Windows.
- Provides strong isolation (WASM or micro‑VM).
- Is easy to use from the command line.
Introducing Bashlet
Bashlet is an open‑source tool that gives AI agents sandboxed Bash access.
It supports multiple isolation back‑ends depending on your platform and security needs.
| Backend | Characteristics |
|---|---|
| Wasmer (WASM) | Cross‑platform, lightweight sandbox. Works on macOS, Linux, Windows. Startup ≈ 50 ms. |
| Firecracker (microVM) | Full Linux VM isolation for hardware‑level security. Linux‑only, requires KVM. Boots ≈ 125 ms. |
By default, Bashlet auto‑selects the best available backend.
- Linux + KVM → Firecracker VM.
- All other platforms → Wasmer WASM sandbox.
Basic Workflow
```bash
# 1️⃣ Create a session with a mounted directory
bashlet create --name demo --mount ./src:/workspace

# 2️⃣ Run commands in isolation
bashlet run demo "ls /workspace"
bashlet run demo "grep -r 'TODO' /workspace"

# 3️⃣ Terminate when done
bashlet terminate demo
```
One‑off Commands (no session management)
```bash
bashlet exec --mount ./src:/workspace "ls /workspace"
```
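The same pattern composes with the retrieval ideas from earlier; for instance (assuming a local `./transcripts` directory):

```bash
# Exact-match retrieval, executed safely inside the sandbox
bashlet exec --mount ./transcripts:/workspace "grep -rl 'pricing objection' /workspace"
```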
Presets: Stop Repeating Setup
Many workflows require the same mounts, environment variables, and setup commands. Define them once in `~/.config/bashlet/config.toml`:
```toml
[presets.kubectl]
mounts = [
  ["/usr/local/bin/kubectl", "/usr/local/bin/kubectl", true],
  ["~/.kube", "/home/.kube", true]
]
env_vars = [["KUBECONFIG", "/home/.kube/config"]]
setup_commands = ["kubectl version --client"]

[presets.nodejs]
mounts = [["~/.npm", "/home/.npm", false]]
env_vars = [["NODE_ENV", "development"]]
workdir = "/app"
```
Using Presets
```bash
# Create a session with a preset
bashlet create --name k8s-env --preset kubectl

# One‑shot command with a preset
bashlet exec --preset kubectl "kubectl get pods"

# Auto‑create session if missing, apply preset, then run
bashlet run dev -C --preset nodejs "npm install"
```
Backend‑Specific Presets
Want a workload to always run in Firecracker? Add `backend` (and optionally a custom `rootfs_image`) to the preset:
```toml
[presets.dev-vm]
backend = "firecracker"
rootfs_image = "~/.bashlet/images/dev.ext4"
env_vars = [["EDITOR", "vim"]]
```
Changes to the rootfs persist across sessions—install packages once and reuse them forever.
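For example, using the `dev-vm` preset above with the auto-create `run -C` form shown earlier (assuming the rootfs image ships a package manager like `apt`):

```bash
# First session: install a tool into the VM's rootfs
bashlet run vm -C --preset dev-vm "apt-get update && apt-get install -y jq"

# Any later session backed by the same dev.ext4 already has it
bashlet run vm2 -C --preset dev-vm "jq --version"
```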
Installation
Bashlet is available on GitHub. A single script installs it (including Wasmer, and Firecracker on Linux) in ~30 seconds:
```bash
curl -fsSL https://bashlet.dev/install.sh | sh
```
The Bigger Picture
As LLMs improve at coding, agents built on filesystem primitives automatically get better too. By leveraging the tools the models were trained on—rather than fighting custom tooling that requires constant maintenance—we can keep agent architectures simple, fast, and secure.
Give your agents the tools they were trained on.