I Built a Local-First Agent Runtime in Rust (and Why Wrapping Existing CLIs Didn’t Work)
Source: Dev.to
Why I built this
I kept seeing the same failure pattern with local 20–30B models:
- brittle tool behavior
- occasional non‑answers
- inconsistent step execution
- hard‑to‑debug failures without replayable state
The answer wasn’t just “pick a better model.”
The answer was to harden the runtime process:
- explicit safety gates
- deterministic artifacts
- policy + approvals
- eval + baseline comparisons
- replay + verification
What LocalAgent is
LocalAgent is a local‑first agent runtime CLI focused on control and reliability. It supports:
- local providers: LM Studio, Ollama, llama.cpp server
- tool calling with hard gates
- trust workflows (policy, approvals, audit)
- replayable run artifacts
- MCP stdio tool sources (including Playwright MCP)
- deterministic eval harnesses
- TUI chat mode
GitHub:
Safety defaults (important)
Defaults are intentionally restrictive:
- trust is off
- shell is disabled
- write tools are not exposed
- file‑write execution is disabled
You have to explicitly enable risky capabilities.
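These defaults can be pictured as an opt‑in capability struct. This is a hypothetical sketch for illustration; the type and field names are my assumptions, not LocalAgent's actual configuration:

```rust
// Hypothetical capability flags (illustrative only; not LocalAgent's real config).
#[derive(Debug, Clone)]
struct Capabilities {
    trust_enabled: bool,
    shell_enabled: bool,
    write_tools_exposed: bool,
    file_write_enabled: bool,
}

impl Default for Capabilities {
    // Everything risky starts off; the user must opt in explicitly.
    fn default() -> Self {
        Capabilities {
            trust_enabled: false,
            shell_enabled: false,
            write_tools_exposed: false,
            file_write_enabled: false,
        }
    }
}

fn main() {
    let caps = Capabilities::default();
    assert!(!caps.shell_enabled && !caps.file_write_enabled);
    println!("defaults: {:?}", caps);
}
```

The point of the shape, regardless of naming: risky capabilities are data you flip on deliberately, not behavior you forget to turn off.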
Architecture (high level)
At a high level, each run does:
- Build runtime context (provider/model/workdir/state/settings)
- Prepare prompt messages (session/task memory/instructions, if enabled)
- Apply compaction (if configured)
- Call the model (streaming or non‑streaming)
- If tool calls are returned:
  - run the TrustGate decision first
  - execute only if allowed
  - normalize the tool result envelope
  - feed the tool result back to the model
  - repeat until final output or an exit condition
- Write artifacts/events (best‑effort) for replay/debugging
This design keeps side effects behind explicit gates and makes failures inspectable.
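The gated part of the loop can be sketched roughly like this. Type and method names (`TrustGate::decide`, `ToolCall`, `Envelope`) are assumptions for illustration, not the runtime's real internals:

```rust
// Rough sketch of a policy-gated tool execution step (names are illustrative).
#[derive(Debug)]
struct ToolCall { name: String, args: String }

#[derive(Debug)]
enum GateDecision { Allow, Deny(String) }

struct TrustGate { shell_enabled: bool }

impl TrustGate {
    // The policy check runs before any side effect.
    fn decide(&self, call: &ToolCall) -> GateDecision {
        if call.name == "shell" && !self.shell_enabled {
            GateDecision::Deny("shell is disabled by default".into())
        } else {
            GateDecision::Allow
        }
    }
}

// Normalized result envelope fed back to the model either way.
#[derive(Debug)]
struct Envelope { tool: String, ok: bool, output: String }

fn execute(call: &ToolCall, gate: &TrustGate) -> Envelope {
    match gate.decide(call) {
        GateDecision::Allow => Envelope {
            tool: call.name.clone(),
            ok: true,
            output: format!("ran {} with {}", call.name, call.args),
        },
        GateDecision::Deny(reason) => Envelope {
            tool: call.name.clone(),
            ok: false,
            output: reason,
        },
    }
}

fn main() {
    let gate = TrustGate { shell_enabled: false };
    let call = ToolCall { name: "shell".into(), args: "rm -rf /".into() };
    let env = execute(&call, &gate);
    assert!(!env.ok); // denied: no side effect ever happened
    println!("{:?}", env);
}
```

Because a denial still produces a normalized envelope, the model gets structured feedback about why the call was refused, and the refusal itself lands in the run artifacts.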
Why this is better than wrapper‑only trust
External wrappers are useful, but they’re limited when tool execution happens inside another runtime you don’t control.
With LocalAgent:
- tool identity/args are first‑class internal data
- policy and approvals are evaluated before side effects
- event/audit/run artifacts are generated in one execution graph
- replay and verification use the same runtime semantics
In short: security and reliability controls are part of the execution model, not bolted on.
Quickstart
```shell
cargo install --path . --force
localagent init
localagent doctor --provider lmstudio
localagent --provider lmstudio --model chat --tui
```
One‑shot run
```shell
localagent --provider ollama --model qwen3:8b --prompt "Summarize README.md" run
```
Slow hardware notes
On slower CPUs or first‑token‑heavy setups, automatic retries can create a bad UX: the prompt gets re‑sent before the first completion has finished. During debugging, use larger timeouts and disable retries:
```shell
localagent --provider llamacpp \
  --base-url http://localhost:5001/v1 \
  --model default \
  --http-timeout-ms 300000 \
  --http-stream-idle-timeout-ms 120000 \
  --http-max-retries 0 \
  --prompt "..." run
```
What I’ve learned so far
The biggest reliability gains came from process constraints, not model hype:
- bounded tasks
- strict output expectations
- pre‑exec arg validation
- deterministic evals + baselines
- replayable artifacts for root‑cause debugging
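Pre‑exec arg validation, for example, can be as simple as rejecting path arguments that would escape the workdir before anything executes. A minimal sketch, where `validate_path_arg` is a hypothetical helper (a real implementation would canonicalize against the workdir):

```rust
// Hypothetical pre-exec check: refuse path args that escape the workdir.
use std::path::Path;

fn validate_path_arg(workdir: &Path, arg: &str) -> Result<(), String> {
    let p = Path::new(arg);
    // Reject absolute paths and parent-directory traversal outright.
    if p.is_absolute() || arg.split('/').any(|c| c == "..") {
        return Err(format!("arg '{}' escapes workdir {}", arg, workdir.display()));
    }
    Ok(())
}

fn main() {
    let wd = Path::new("/home/user/project");
    assert!(validate_path_arg(wd, "src/main.rs").is_ok());
    assert!(validate_path_arg(wd, "../secrets").is_err());
    assert!(validate_path_arg(wd, "/etc/passwd").is_err());
}
```

Checks like this run before the tool call, so a bad argument is a structured error in the transcript rather than a side effect on disk.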
For high‑ambiguity reasoning, I still route to stronger hosted models. For a lot of productivity helper work, local models are viable when the runtime is disciplined.
Current docs
- README: project overview + workflows
- CLI reference: complete command/flag map
- Provider setup guide: LM Studio / Ollama / llama.cpp
- Templates, policy docs, and eval docs
Repo:
Feedback I’d love
- What local model + runtime combos are most stable for tool‑calling?
- Which prompt/output constraints improved reliability most for you?
- What would make local‑first coding workflows feel “production‑ready”?
If this is useful, I can write a follow‑up with concrete eval/baseline workflows and model routing strategy.