I Built a Local-First Agent Runtime in Rust (and Why Wrapping Existing CLIs Didn’t Work)

Published: February 21, 2026, 05:59 PM EST
3 min read
Source: Dev.to

Why I built this

I kept seeing the same failure pattern with local 20–30B models:

  • brittle tool behavior
  • occasional non‑answers
  • inconsistent step execution
  • hard‑to‑debug failures without replayable state

The answer wasn’t just “pick a better model.”
The answer was to harden the runtime process:

  • explicit safety gates
  • deterministic artifacts
  • policy + approvals
  • eval + baseline comparisons
  • replay + verification

What LocalAgent is

LocalAgent is a local‑first agent runtime CLI focused on control and reliability. It supports:

  • local providers: LM Studio, Ollama, llama.cpp server
  • tool calling with hard gates
  • trust workflows (policy, approvals, audit)
  • replayable run artifacts
  • MCP stdio tool sources (including Playwright MCP)
  • deterministic eval harnesses
  • TUI chat mode

GitHub:

Safety defaults (important)

Defaults are intentionally restrictive:

  • trust is off
  • shell is disabled
  • write tools are not exposed
  • file‑write execution is disabled

You have to explicitly enable risky capabilities.
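The defaults above can be captured as a `Default` impl where every risky capability starts off. This is an illustrative sketch; the struct and field names are my assumptions, not LocalAgent's actual config types.

```rust
// Hypothetical capability config with restrictive defaults.
// Field names are illustrative, not LocalAgent's real API.
#[derive(Debug, PartialEq)]
struct Capabilities {
    trust_enabled: bool,
    shell_enabled: bool,
    write_tools_exposed: bool,
    file_write_enabled: bool,
}

impl Default for Capabilities {
    fn default() -> Self {
        // Everything with side effects starts disabled;
        // the user must opt in explicitly per capability.
        Capabilities {
            trust_enabled: false,
            shell_enabled: false,
            write_tools_exposed: false,
            file_write_enabled: false,
        }
    }
}

fn main() {
    let caps = Capabilities::default();
    assert!(!caps.trust_enabled && !caps.shell_enabled);
    assert!(!caps.write_tools_exposed && !caps.file_write_enabled);
    println!("defaults: {:?}", caps);
}
```

Modeling capabilities as data (rather than scattered flags) also makes the enabled set easy to log into run artifacts.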

Architecture (high level)

At a high level, each run does:

  1. Build runtime context (provider/model/workdir/state/settings)
  2. Prepare prompt messages (session/task memory/instructions if enabled)
  3. Apply compaction (if configured)
  4. Call model (streaming or non‑streaming)

If tool calls are returned:

  • run TrustGate decision first
  • execute only if allowed
  • normalize tool result envelope
  • feed tool result back to model
  • repeat until final output or exit condition
  • write artifacts/events on a best‑effort basis for replay and debugging

This design keeps side effects behind explicit gates and makes failures inspectable.
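The loop above can be sketched in a few dozen lines of Rust. Everything here (`TrustGate`, `ToolCall`, the mock model) is illustrative scaffolding to show the control flow, not LocalAgent's real types: the gate decision runs before any execution, and tool results are fed back until the model produces a final answer.

```rust
// Illustrative gated tool loop; all names are assumptions.
#[derive(Debug)]
struct ToolCall { name: String, args: String }

enum ModelTurn {
    Final(String),
    Tool(ToolCall),
}

struct TrustGate { allowed: Vec<String> }

impl TrustGate {
    // Policy is evaluated before any side effect happens.
    fn allow(&self, call: &ToolCall) -> bool {
        self.allowed.iter().any(|t| t == &call.name)
    }
}

// Stand-in for the model: asks for one tool call, then finishes.
fn call_model(history: &[String]) -> ModelTurn {
    if history.iter().any(|m| m.starts_with("tool:")) {
        ModelTurn::Final("done".to_string())
    } else {
        ModelTurn::Tool(ToolCall {
            name: "read_file".to_string(),
            args: "README.md".to_string(),
        })
    }
}

fn run(gate: &TrustGate) -> Result<String, String> {
    let mut history = vec!["user: summarize".to_string()];
    loop {
        match call_model(&history) {
            ModelTurn::Final(text) => return Ok(text),
            ModelTurn::Tool(call) => {
                // Gate first; execute only if allowed.
                if !gate.allow(&call) {
                    return Err(format!("blocked: {} {}", call.name, call.args));
                }
                // Normalize the tool result and feed it back to the model.
                history.push(format!("tool:{} -> ok", call.name));
            }
        }
    }
}

fn main() {
    let open = TrustGate { allowed: vec!["read_file".to_string()] };
    assert_eq!(run(&open), Ok("done".to_string()));

    let closed = TrustGate { allowed: vec![] };
    assert!(run(&closed).is_err());
}
```

The key property is that `run` can only reach tool execution through the gate, so a denied call short-circuits before any side effect.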

Why this is better than wrapper‑only trust

External wrappers are useful, but they’re limited when tool execution happens inside another runtime you don’t control.

With LocalAgent:

  • tool identity/args are first‑class internal data
  • policy and approvals are evaluated before side effects
  • event/audit/run artifacts are generated in one execution graph
  • replay and verification use the same runtime semantics

In short: security and reliability controls are part of the execution model, not bolted on.
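One way to make "artifacts in one execution graph" concrete is an append-only JSON Lines event log written as the run progresses, which a replay can later walk step by step. This is a sketch under my own assumptions about the event shape; LocalAgent's actual artifact format may differ.

```rust
// Sketch: append-only JSONL event log for replay/debugging.
// Event fields are illustrative, not LocalAgent's real schema.
use std::io::Write;

fn record(log: &mut impl Write, step: u32, kind: &str, detail: &str) -> std::io::Result<()> {
    // One event per line keeps the artifact append-only and trivially
    // replayable: read lines in order, re-drive the same semantics.
    writeln!(log, r#"{{"step":{},"kind":"{}","detail":"{}"}}"#, step, kind, detail)
}

fn main() -> std::io::Result<()> {
    // In a real run this would be a file in the run's artifact dir.
    let mut buf: Vec<u8> = Vec::new();
    record(&mut buf, 1, "model_call", "request sent")?;
    record(&mut buf, 2, "tool_gate", "read_file allowed")?;
    record(&mut buf, 3, "tool_result", "envelope normalized")?;

    let artifact = String::from_utf8(buf).unwrap();
    assert_eq!(artifact.lines().count(), 3);
    print!("{}", artifact);
    Ok(())
}
```

Because the gate decision and the tool result pass through the same writer, audit and replay see exactly what the runtime saw, in order.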

Quickstart

cargo install --path . --force
localagent init
localagent doctor --provider lmstudio
localagent --provider lmstudio --model <model-id> chat --tui

One‑shot run

localagent --provider ollama --model qwen3:8b --prompt "Summarize README.md" run

Slow hardware notes

On slower CPUs or first‑token‑heavy setups, automatic retries can re‑send the prompt before the first attempt finishes, which makes for a bad UX. While debugging, raise the timeouts and disable retries:

localagent --provider llamacpp \
  --base-url http://localhost:5001/v1 \
  --model default \
  --http-timeout-ms 300000 \
  --http-stream-idle-timeout-ms 120000 \
  --http-max-retries 0 \
  --prompt "..." run

What I’ve learned so far

The biggest reliability gains came from process constraints, not model hype:

  • bounded tasks
  • strict output expectations
  • pre‑exec arg validation
  • deterministic evals + baselines
  • replayable artifacts for root‑cause debugging
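Pre‑exec arg validation is the cheapest of these constraints: reject bad tool arguments before anything runs. Here is a minimal sketch for a file‑path argument; the specific rules (relative paths only, no `..` traversal) are my illustrative assumptions, not LocalAgent's actual policy.

```rust
// Illustrative pre-exec validation for a path argument: runs before
// the tool executes, so a bad arg never causes a side effect.
fn validate_path_arg(path: &str) -> Result<(), String> {
    if path.is_empty() {
        return Err("empty path".to_string());
    }
    // Keep tool access inside the working directory: no absolute
    // paths, no parent-directory traversal.
    if path.starts_with('/') || path.contains("..") {
        return Err(format!("path escapes workdir: {}", path));
    }
    Ok(())
}

fn main() {
    assert!(validate_path_arg("README.md").is_ok());
    assert!(validate_path_arg("src/main.rs").is_ok());
    assert!(validate_path_arg("../etc/passwd").is_err());
    assert!(validate_path_arg("/etc/passwd").is_err());
    assert!(validate_path_arg("").is_err());
}
```

Small checks like this catch a large share of "brittle tool behavior" failures before they ever reach the filesystem, and the rejection reason lands in the run artifacts for root‑cause debugging.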

For high‑ambiguity reasoning, I still route to stronger hosted models. For a lot of productivity helper work, local models are viable when the runtime is disciplined.

Current docs

  • README: project overview + workflows
  • CLI reference: complete command/flag map
  • Provider setup guide: LM Studio / Ollama / llama.cpp
  • Templates, policy docs, and eval docs

Repo:

Feedback I’d love

  • What local model + runtime combos are most stable for tool‑calling?
  • Which prompt/output constraints improved reliability most for you?
  • What would make local‑first coding workflows feel “production‑ready”?

If this is useful, I can write a follow‑up with concrete eval/baseline workflows and model routing strategy.
