Why Your AI Agents Need a Shell (And How to Give Them One Safely)
Source: Dev.to
Claude Code — A New Take on Agent Architecture
Claude Code changed how I think about agent architecture. It outperforms agents loaded up with 50 different MCP servers and custom integrations, and under the hood it’s remarkably simple:
- No tool sprawl.
- No massive schema definitions eating up context.
- Just a filesystem and Bash.
The AI community is starting to piece together why this works so well, and many of us are reconsidering the complexity we’ve been bolting onto our own agents.
The Three “Default” Approaches
When building agents that need to work with data, most of us default to one of these three patterns:
- Prompt stuffing – throw everything into the context window and hope for the best.
- Tool libraries – connect MCP servers, define custom tools, give the agent ways to fetch what it needs.
- Vector search – embed your data, run semantic similarity, pray the retrieval is relevant.
These approaches work, but each has trade‑offs:
| Approach | Pros | Cons |
|---|---|---|
| Prompt stuffing | Simple, no extra infrastructure | Hits token limits fast |
| Vector search | Great for semantic similarity | Struggles with exact values from structured data |
| Tool libraries | Solves capability problem | Every new tool adds schema definitions, more options for the model to reason about, and a larger surface area for failure |
The Third Option – Filesystem + Bash
Why it makes sense
- Training data – LLMs were trained on billions of lines of code, including countless examples of developers navigating directories, grepping files, and managing state across complex codebases.
- Native primitives – Commands like `grep`, `cat`, `find`, and `awk` are part of the models’ “native language”, not something we need to teach them (see the sketch below).
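To see what that looks like in practice, here’s the kind of sequence a model produces without any tool schema at all (a minimal sketch; the `./crm-export` layout is hypothetical):

```bash
# Discover the shape of the data (hypothetical directory layout)
find ./crm-export -name "*.csv" | head -5

# Narrow to the files that actually matter
grep -l "enterprise" ./crm-export/accounts/*.csv

# Pull out one column, deduplicated – plain awk, no custom tooling
awk -F',' '{ print $2 }' ./crm-export/accounts/acme.csv | sort -u
```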
The Vercel team discovered this when rebuilding their internal agents. By replacing most custom tooling with just two things—a filesystem tool and a Bash tool—they:
- Cut the cost of their sales‑call summarization agent from ≈ $1.00 per call to ≈ $0.25 on Claude Opus.
- Improved output quality.
What a Bash‑enabled agent can do
- Connect to anything – `curl` APIs, CLI tools for databases, cloud services, Kubernetes, etc.
- Store & retrieve its own context – write findings to a file, pause, come back later; the filesystem becomes working memory (see the sketch after this list).
- Precise retrieval – `grep -r "pricing objection" transcripts/` returns exact matches, no similarity scores.
- Natural data hierarchy – customer records, ticket history, CRM exports map cleanly to directories.
- Full debuggability – you can see every file read, command run, and output written.
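A minimal sketch of the working-memory and retrieval bullets above, assuming a hypothetical `transcripts/` directory and scratch files the agent names itself:

```bash
# Exact-match retrieval – no embeddings, no similarity scores
grep -r "pricing objection" transcripts/ > /tmp/findings.txt

# Persist an intermediate conclusion so a later step (or session) can resume
echo "3 calls raised pricing objections; details in /tmp/findings.txt" >> /tmp/notes.md

# Later: reload the stored context instead of re-reading everything
cat /tmp/notes.md
wc -l /tmp/findings.txt
```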
Intuition: If an agent can navigate a codebase to find bugs, it can navigate your business data the same way. If it can run shell commands, it can interact with almost any system you already use.
Security – Sandboxing the Bash Access
Giving an AI agent unrestricted Bash access is terrifying. One hallucinated `rm -rf` and you’re in trouble. The solution is sandboxing:
- Run the agent’s commands in an isolated environment that cannot touch production systems.
- Separate what the agent can think about (the mounted directory) from what it can actually do (the sandboxed executor).
Architecture Overview
```
Agent receives task
   ↓
Explores filesystem (ls, find)
   ↓
Searches for relevant content (grep, cat)
   ↓
Sends context + request to LLM
   ↓
Returns structured output
```
The Bash execution runs in an isolated sandbox, giving you the power of native filesystem operations without the risk.
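In shell terms, one pass through that loop might look like this (illustrative only; the `data/` layout and filenames are made up):

```bash
# Explore: what data do we have?
ls data/
find data/tickets -name "*.json" | wc -l

# Search: pull only the relevant slice into context
grep -ril "refund" data/tickets/ | head -3
cat data/tickets/1042.json

# The narrowed context goes to the LLM; the structured result
# is written back to disk, where every step remains auditable
mkdir -p out && echo '{"topic": "refunds", "tickets": [1042]}' > out/summary.json
```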
Sandbox Requirements
| Requirement | Description |
|---|---|
| Process isolation | Commands run in a contained environment that cannot break out to the host. WebAssembly runtimes are ideal because they provide memory‑safe execution by design. |
| Directory mounting | Expose only the directories the agent needs (e.g., mount `/project` as `/workspace`). Everything else simply doesn’t exist from the agent’s perspective. |
| Session persistence | For multi‑step workflows, keep configuration and state across commands without bleeding into other sessions. |
| Visible execution paths | Capture stdout, stderr, and full command history so you can audit exactly what happened. |
The security model is defense in depth: even if the agent generates unexpected commands, the sandbox constrains what can actually happen.
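To make the mounting model concrete: assuming a sandbox that maps `./project` on the host to `/workspace`, this is roughly what the agent sees from inside (a sketch; exact error messages vary by backend):

```bash
ls /workspace        # the mounted project files – the agent's whole world
ls /root 2>&1        # typically fails: never mounted, so it doesn't exist
cat /etc/shadow 2>&1 # fails: host files were never mapped into the sandbox
```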
A Quick Note on MCP (Model Context Protocol)
If you’ve been following the AI tooling space, you’ve probably heard of MCP. Anthropic released it in late 2024, and it quickly became the de facto standard for connecting agents to external tools and data.
- Before MCP: each pairing (e.g., GitHub ↔ agent, Slack ↔ agent, DB ↔ agent) required a custom integration.
- After MCP: you build one MCP server for each tool and one MCP client for each agent—turning an N × M integration problem into an N + M problem. (Ten tools and five agents means 50 bespoke integrations before MCP, but only 15 components after.)
TL;DR
- Use the filesystem + Bash as a universal, native toolset for agents.
- Sandbox the Bash execution to keep production safe.
- Leverage MCP for any remaining external‑tool connections you need.
This approach strips away tool sprawl, reduces cost, and gives you a transparent, debuggable, and powerful agent architecture.
Why CLI Tools Over MCP?
When MCP usage scales, a serious problem appears: tool definitions eat up the LLM’s context window. Each tool brings a description, parameters, and return schemas. Connect to dozens of MCP servers with hundreds of tools and you burn tokens before the agent even starts working.
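A crude way to feel the overhead, using byte counts as a stand-in for tokens (the schema below is a made-up example, not a real MCP definition):

```bash
# One MCP-style tool definition...
cat > /tmp/tool.json <<'EOF'
{
  "name": "search_files",
  "description": "Recursively search files for a pattern",
  "parameters": {
    "type": "object",
    "properties": {
      "pattern": {"type": "string", "description": "Regex to search for"},
      "path":    {"type": "string", "description": "Directory to search"}
    },
    "required": ["pattern", "path"]
  }
}
EOF
wc -c /tmp/tool.json                      # hundreds of bytes, per tool, every request

# ...versus the equivalent shell command the model already knows
printf 'grep -r "pattern" path/' | wc -c  # ~25 bytes
```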
The Unix Philosophy Wins
- Small, modular, composable – CLI tools take text in, give text out, and can be chained together.
- Battle‑tested – `git`, `grep`, `curl`, and `jq` have been in production for decades.
- Stable & well‑documented – their interfaces rarely change.
LLMs have seen these tools billions of times during training. They already know the syntax, flags, common patterns, and error messages. This isn’t “in‑context learning”; it’s deep, internalised knowledge.
MCP vs. CLI
| Aspect | MCP | CLI |
|---|---|---|
| Learning curve | Agent must learn each tool from the supplied schema (bolted‑on capability). | Agent already knows the tool (native understanding). |
| Composability | Not truly pipe‑able; chaining outputs to inputs is clunky. | Simple pipelines: `grep "error" logs.txt \| wc -l` – models know how to construct them. |
| Token efficiency | High – each tool definition adds tokens. | Low – only the command text is needed. |
| Security | Requires custom sandboxing per server. | Needs sandboxing for arbitrary shell access (the real challenge). |
MCP is great for distribution and discoverability, especially for non‑technical users or highly domain‑specific workflows. But for raw capability—giving an agent the power to actually do things—CLI tools are superior: they’re more token‑efficient, composable, and align with what models already excel at.
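One hedged example of that composability, chaining three battle-tested tools with zero schema definitions (assumes network access and that the API returns its usual JSON array):

```bash
# How many open issues mention "crash"? Fetch → extract → count.
curl -s "https://api.github.com/repos/anthropics/claude-code/issues?state=open" \
  | jq -r '.[].title' \
  | grep -ci "crash"
```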
The Core Problem: Safe CLI Access
Allowing an agent to run arbitrary shell commands is dangerous. We need a sandbox that:
- Is portable across macOS, Linux, and Windows.
- Provides strong isolation (WASM or micro‑VM).
- Is easy to use from the command line.
Introducing Bashlet
Bashlet is an open‑source tool that gives AI agents sandboxed Bash access.
It supports multiple isolation back‑ends depending on your platform and security needs.
| Backend | Characteristics |
|---|---|
| Wasmer (WASM) | Cross‑platform, lightweight sandbox. Works on macOS, Linux, Windows. Startup ≈ 50 ms. |
| Firecracker (microVM) | Full Linux VM isolation for hardware‑level security. Linux‑only, requires KVM. Boots ≈ 125 ms. |
By default, Bashlet auto‑selects the best available backend.
- Linux + KVM → Firecracker VM.
- All other platforms → Wasmer WASM sandbox.
Basic Workflow
```bash
# 1️⃣ Create a session with a mounted directory
bashlet create --name demo --mount ./src:/workspace

# 2️⃣ Run commands in isolation
bashlet run demo "ls /workspace"
bashlet run demo "grep -r 'TODO' /workspace"

# 3️⃣ Terminate when done
bashlet terminate demo
```
One‑off Commands (no session management)
```bash
bashlet exec --mount ./src:/workspace "ls /workspace"
```
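The same pattern composes with the retrieval ideas from earlier; for instance (assuming a local `./transcripts` directory):

```bash
# Exact-match retrieval, executed safely inside the sandbox
bashlet exec --mount ./transcripts:/workspace "grep -rl 'pricing objection' /workspace"
```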
Presets: Stop Repeating Setup
Many workflows require the same mounts, environment variables, and setup commands. Define them once in `~/.config/bashlet/config.toml`:
```toml
[presets.kubectl]
mounts = [
  ["/usr/local/bin/kubectl", "/usr/local/bin/kubectl", true],
  ["~/.kube", "/home/.kube", true]
]
env_vars = [["KUBECONFIG", "/home/.kube/config"]]
setup_commands = ["kubectl version --client"]

[presets.nodejs]
mounts = [["~/.npm", "/home/.npm", false]]
env_vars = [["NODE_ENV", "development"]]
workdir = "/app"
```
Using Presets
```bash
# Create a session with a preset
bashlet create --name k8s-env --preset kubectl

# One‑shot command with a preset
bashlet exec --preset kubectl "kubectl get pods"

# Auto‑create session if missing, apply preset, then run
bashlet run dev -C --preset nodejs "npm install"
```
Backend‑Specific Presets
Want a workload to always run in Firecracker? Add `backend` (and optionally a custom `rootfs_image`) to the preset:
```toml
[presets.dev-vm]
backend = "firecracker"
rootfs_image = "~/.bashlet/images/dev.ext4"
env_vars = [["EDITOR", "vim"]]
```
Changes to the rootfs persist across sessions—install packages once and reuse them forever.
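For example, using the `dev-vm` preset above with the auto-create `run -C` form shown earlier (assuming the rootfs image ships a package manager like `apt`):

```bash
# First session: install a tool into the VM's rootfs
bashlet run vm -C --preset dev-vm "apt-get update && apt-get install -y jq"

# Any later session backed by the same dev.ext4 already has it
bashlet run vm2 -C --preset dev-vm "jq --version"
```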
Installation
Bashlet is available on GitHub. A single script installs it (including Wasmer, and Firecracker on Linux) in ~30 seconds:
```bash
curl -fsSL https://bashlet.dev/install.sh | sh
```
The Bigger Picture
As LLMs improve at coding, agents built on filesystem primitives automatically get better too. By leveraging the tools the models were trained on—rather than fighting custom tooling that requires constant maintenance—we can keep agent architectures simple, fast, and secure.
Give your agents the tools they were trained on.