I Built a Local-First Agent Runtime in Rust (and Why Wrapping Existing CLIs Didn’t Work)
Source: Dev.to
Why I built this
I kept seeing the same failure pattern with local 20–30B models:
- brittle tool behavior
- occasional non‑answers
- inconsistent step execution
- hard‑to‑debug failures without replayable state
The answer wasn’t just “pick a better model.”
The answer was to harden the runtime process:
- explicit safety gates
- deterministic artifacts
- policy + approvals
- eval + baseline comparisons
- replay + verification
What LocalAgent is
LocalAgent is a local‑first agent runtime CLI focused on control and reliability. It supports:
- local providers: LM Studio, Ollama, llama.cpp server
- tool calling with hard gates
- trust workflows (policy, approvals, audit)
- replayable run artifacts
- MCP stdio tool sources (including Playwright MCP)
- deterministic eval harnesses
- TUI chat mode
GitHub:
Safety defaults (important)
Defaults are intentionally restrictive:
- trust is off
- shell is disabled
- write tools are not exposed
- file‑write execution is disabled
You have to explicitly enable risky capabilities.
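These defaults can be pictured as an opt‑in capability struct. This is a hypothetical sketch for illustration; the type and field names are my assumptions, not LocalAgent's actual configuration:

```rust
// Hypothetical capability flags (illustrative only; not LocalAgent's real config).
#[derive(Debug, Clone)]
struct Capabilities {
    trust_enabled: bool,
    shell_enabled: bool,
    write_tools_exposed: bool,
    file_write_enabled: bool,
}

impl Default for Capabilities {
    // Everything risky starts off; the user must opt in explicitly.
    fn default() -> Self {
        Capabilities {
            trust_enabled: false,
            shell_enabled: false,
            write_tools_exposed: false,
            file_write_enabled: false,
        }
    }
}

fn main() {
    let caps = Capabilities::default();
    assert!(!caps.shell_enabled && !caps.file_write_enabled);
    println!("defaults: {:?}", caps);
}
```

The point of the shape, regardless of naming: risky capabilities are data you flip on deliberately, not behavior you forget to turn off.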
Architecture (high level)
At a high level, each run does:
- Build runtime context (provider/model/workdir/state/settings)
- Prepare prompt messages (session/task memory/instructions, if enabled)
- Apply compaction (if configured)
- Call the model (streaming or non‑streaming)
- If tool calls are returned:
  - run the TrustGate decision first
  - execute only if allowed
  - normalize the tool result envelope
  - feed the tool result back to the model
  - repeat until final output or an exit condition
- Write artifacts/events (best‑effort) for replay/debugging
This design keeps side effects behind explicit gates and makes failures inspectable.
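The gated part of the loop can be sketched roughly like this. Type and method names (`TrustGate::decide`, `ToolCall`, `Envelope`) are assumptions for illustration, not the runtime's real internals:

```rust
// Rough sketch of a policy-gated tool execution step (names are illustrative).
#[derive(Debug)]
struct ToolCall { name: String, args: String }

#[derive(Debug)]
enum GateDecision { Allow, Deny(String) }

struct TrustGate { shell_enabled: bool }

impl TrustGate {
    // The policy check runs before any side effect.
    fn decide(&self, call: &ToolCall) -> GateDecision {
        if call.name == "shell" && !self.shell_enabled {
            GateDecision::Deny("shell is disabled by default".into())
        } else {
            GateDecision::Allow
        }
    }
}

// Normalized result envelope fed back to the model either way.
#[derive(Debug)]
struct Envelope { tool: String, ok: bool, output: String }

fn execute(call: &ToolCall, gate: &TrustGate) -> Envelope {
    match gate.decide(call) {
        GateDecision::Allow => Envelope {
            tool: call.name.clone(),
            ok: true,
            output: format!("ran {} with {}", call.name, call.args),
        },
        GateDecision::Deny(reason) => Envelope {
            tool: call.name.clone(),
            ok: false,
            output: reason,
        },
    }
}

fn main() {
    let gate = TrustGate { shell_enabled: false };
    let call = ToolCall { name: "shell".into(), args: "rm -rf /".into() };
    let env = execute(&call, &gate);
    assert!(!env.ok); // denied: no side effect ever happened
    println!("{:?}", env);
}
```

Because a denial still produces a normalized envelope, the model gets structured feedback about why the call was refused, and the refusal itself lands in the run artifacts.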
Why this is better than wrapper‑only trust
External wrappers are useful, but they’re limited when tool execution happens inside another runtime you don’t control.
With LocalAgent:
- tool identity/args are first‑class internal data
- policy and approvals are evaluated before side effects
- event/audit/run artifacts are generated in one execution graph
- replay and verification use the same runtime semantics
In short: security and reliability controls are part of the execution model, not bolted on.
Quickstart
```shell
cargo install --path . --force
localagent init
localagent doctor --provider lmstudio
localagent --provider lmstudio --model chat --tui
```
One‑shot run
```shell
localagent --provider ollama --model qwen3:8b --prompt "Summarize README.md" run
```
Slow hardware notes
On slower CPUs or first‑token‑heavy setups, automatic retries can create a bad UX: the prompt gets re‑sent before the first completion has finished. During debugging, use larger timeouts and disable retries:
```shell
localagent --provider llamacpp \
  --base-url http://localhost:5001/v1 \
  --model default \
  --http-timeout-ms 300000 \
  --http-stream-idle-timeout-ms 120000 \
  --http-max-retries 0 \
  --prompt "..." run
```
What I’ve learned so far
The biggest reliability gains came from process constraints, not model hype:
- bounded tasks
- strict output expectations
- pre‑exec arg validation
- deterministic evals + baselines
- replayable artifacts for root‑cause debugging
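Pre‑exec arg validation, for example, can be as simple as rejecting path arguments that would escape the workdir before anything executes. A minimal sketch, where `validate_path_arg` is a hypothetical helper (a real implementation would canonicalize against the workdir):

```rust
// Hypothetical pre-exec check: refuse path args that escape the workdir.
use std::path::Path;

fn validate_path_arg(workdir: &Path, arg: &str) -> Result<(), String> {
    let p = Path::new(arg);
    // Reject absolute paths and parent-directory traversal outright.
    if p.is_absolute() || arg.split('/').any(|c| c == "..") {
        return Err(format!("arg '{}' escapes workdir {}", arg, workdir.display()));
    }
    Ok(())
}

fn main() {
    let wd = Path::new("/home/user/project");
    assert!(validate_path_arg(wd, "src/main.rs").is_ok());
    assert!(validate_path_arg(wd, "../secrets").is_err());
    assert!(validate_path_arg(wd, "/etc/passwd").is_err());
}
```

Checks like this run before the tool call, so a bad argument is a structured error in the transcript rather than a side effect on disk.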
For high‑ambiguity reasoning, I still route to stronger hosted models. For a lot of productivity helper work, local models are viable when the runtime is disciplined.
Current docs
- README: project overview + workflows
- CLI reference: complete command/flag map
- Provider setup guide: LM Studio / Ollama / llama.cpp
- Templates, policy docs, and eval docs
Repo:
Feedback I’d love
- What local model + runtime combos are most stable for tool‑calling?
- Which prompt/output constraints improved reliability most for you?
- What would make local‑first coding workflows feel “production‑ready”?
If this is useful, I can write a follow‑up with concrete eval/baseline workflows and model routing strategy.