How I Built an Agentic Coding CLI from Scratch
Source: Dev.to
The Core Insight: It’s Just a Loop
Every agentic coding tool—no matter how polished—follows the same fundamental pattern:
```python
while needs_follow_up:
    # 1️⃣ Send conversation + tools → LLM
    # 2️⃣ If LLM returns tool calls → execute them, append results, loop
    # 3️⃣ If LLM returns plain text → finish
```
That’s the “magic”: a while‑loop with function calling.
The remaining 95% of the work consists of:
- context management
- tool execution
- error handling
- permission checks
Simplified Agent Loop
```python
def run_agent_loop(user_input, conversation, config):
    conversation.add_user(user_input)
    for iteration in range(config.max_iterations):
        stream = completion(
            model=routed_model,  # chosen by the cost-aware router (see below)
            messages=conversation.messages,
            tools=TOOL_DEFINITIONS,
            stream=True,
        )
        text, tool_calls, usage = process_stream(stream)
        if not tool_calls:
            # No tools called — model is done
            conversation.add_assistant(content=text)
            break
        # Execute each tool, feed results back, loop
        for tc in tool_calls:
            result = execute_tool(tc.name, tc.args)
            conversation.add_tool_result(tc.id, result)
```
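The loop delegates actual work to `execute_tool`. A minimal dispatcher might look like the sketch below; the registry layout and handler bodies here are illustrative, not AgentCode's actual `tools.py`:

```python
import subprocess

def read_file(path):
    # Return file contents so the model can see the source
    with open(path, "r", encoding="utf-8") as f:
        return f.read()

def run_command(command):
    # Capture stdout and stderr so the model can inspect the outcome
    proc = subprocess.run(command, shell=True, capture_output=True, text=True)
    return proc.stdout + proc.stderr

# Name → handler registry; names must match the tool schemas sent to the LLM
TOOL_REGISTRY = {
    "read_file": lambda args: read_file(args["path"]),
    "run_command": lambda args: run_command(args["command"]),
}

def execute_tool(name, args):
    if name not in TOOL_REGISTRY:
        return f"Error: unknown tool {name!r}"
    try:
        return TOOL_REGISTRY[name](args)
    except Exception as exc:
        # Feed errors back to the model instead of crashing the loop;
        # the LLM often recovers by trying a different approach
        return f"Error: {exc}"
```

Returning error strings rather than raising is the important design choice: the loop keeps running, and the model gets a chance to self-correct.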
When a user says “fix the bug in app.py”, the LLM doesn’t edit the file directly. It:
1. Calls `read_file("app.py")` → receives the source.
2. Calls `edit_file(...)` with the fix.
3. Calls `run_command("pytest")` to verify.
Each step is a tool call that the loop executes and feeds back into the next iteration.
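In OpenAI's chat format, which LiteLLM accepts for every provider, the first of those steps leaves the conversation looking roughly like this; the id and contents are illustrative:

```python
messages = [
    {"role": "user", "content": "fix the bug in app.py"},
    # The model answers with a tool call instead of text
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "read_file", "arguments": '{"path": "app.py"}'},
        }],
    },
    # The loop runs the tool and appends its result, keyed by tool_call_id
    {"role": "tool", "tool_call_id": "call_1", "content": "<contents of app.py>"},
]
```

The `tool_call_id` link is what lets the model match each result to the request it made, even when several tools run in one turn.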
Architecture
```
┌─────────────────────────────────────────────────┐
│                   cli.py (UI)                   │
│  REPL loop · slash commands · Rich terminal UI  │
└──────────────────────┬──────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────┐
│                agent.py (Brain)                 │
│ Agentic loop · context management · permissions │
│                                                 │
│    LiteLLM ──→ Claude / GPT / Gemini / Ollama   │
└──────────────────────┬──────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────┐
│                tools.py (Hands)                 │
│       read_file · write_file · edit_file        │
│     run_command · git_commit · search_text      │
└─────────────────────────────────────────────────┘
```
| File | Responsibility |
|---|---|
| cli.py | Terminal UI (REPL, slash commands, session management) |
| agent.py | Brain (agentic loop, streaming, permissions, context compaction) |
| tools.py | Hands (file I/O, bash execution, git, search) |
Feature I’m Most Proud Of: Cost‑Aware Routing
Most AI coding tools lock you into a single model, making you pay the same price for a simple explanation as for a full‑scale refactor. AgentCode classifies each request by complexity and automatically selects the cheapest model that can handle it.
Routing Table
| Tier | Example Prompt | Model | Why |
|---|---|---|---|
| Light | “what does this function do” | Haiku | Fast, cheap — just reading and explaining |
| Medium | “write unit tests for app.py” | Sonnet | Needs to understand code and generate new code |
| Heavy | “refactor the entire auth system” | Opus | Multi‑file, multi‑step, architectural thinking |
The classification uses simple pattern matching:
```python
def classify_complexity(user_input):
    text = user_input.lower()
    heavy_score = sum(1 for p in HEAVY_PATTERNS if re.search(p, text))
    medium_score = sum(1 for p in MEDIUM_PATTERNS if re.search(p, text))
    if heavy_score >= 2:
        return "heavy"
    elif medium_score >= 1:
        return "medium"
    else:
        return "light"
```
Transparent, easy to tweak, and it saves real money.
You can always override the automatic choice with the /model command.
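The pattern lists themselves aren't shown above. Filled in with illustrative patterns (AgentCode's real lists will differ), the classifier routes the routing table's example prompts as expected; the classifier is repeated here so the snippet runs standalone:

```python
import re

# Illustrative pattern lists, not the project's tuned ones
HEAVY_PATTERNS = [r"\brefactor\b", r"\bentire\b", r"\bmigrate\b",
                  r"\bredesign\b", r"\barchitecture\b"]
MEDIUM_PATTERNS = [r"\bwrite\b", r"\btests?\b", r"\bimplement\b",
                   r"\bfix\b", r"\badd\b"]

def classify_complexity(user_input):
    text = user_input.lower()
    heavy_score = sum(1 for p in HEAVY_PATTERNS if re.search(p, text))
    medium_score = sum(1 for p in MEDIUM_PATTERNS if re.search(p, text))
    if heavy_score >= 2:
        return "heavy"
    if medium_score >= 1:
        return "medium"
    return "light"

print(classify_complexity("what does this function do"))       # light
print(classify_complexity("write unit tests for app.py"))      # medium
print(classify_complexity("refactor the entire auth system"))  # heavy
```

Requiring two heavy matches before escalating is a cheap guard against a single strong word ("refactor this line") triggering the expensive model.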
Streaming: The UX Difference
The first version waited for the full LLM response before showing anything, leaving the terminal blank for 5–10 seconds. Adding streaming turned the experience into a real‑time conversation.
The Challenge
In an agentic loop the LLM can return both plain text and tool calls in the same response. Text tokens arrive one‑by‑one, while tool‑call arguments arrive as fragments that must be assembled before execution.
Streaming Processor
```python
def process_stream(stream):
    full_text = ""
    tool_calls_acc = {}
    usage = None
    for chunk in stream:
        delta = chunk.choices[0].delta
        # Some providers attach token usage to the final chunk
        usage = getattr(chunk, "usage", None) or usage
        # Text tokens — print immediately
        if delta.content:
            print(delta.content, end="", flush=True)
            full_text += delta.content
        # Tool call fragments — accumulate silently
        if delta.tool_calls:
            for tc_delta in delta.tool_calls:
                idx = tc_delta.index
                if idx not in tool_calls_acc:
                    tool_calls_acc[idx] = {"id": "", "name": "", "arguments": ""}
                if tc_delta.id:
                    # The id arrives once, in the first fragment
                    tool_calls_acc[idx]["id"] = tc_delta.id
                if tc_delta.function.name:
                    tool_calls_acc[idx]["name"] += tc_delta.function.name
                if tc_delta.function.arguments:
                    tool_calls_acc[idx]["arguments"] += tc_delta.function.arguments
    return full_text, tool_calls_acc, usage
```
- Text streams to the screen in real time.
- Tool calls are silently assembled in the background.
The user sees words appear instantly while the agent decides what to do next.
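When the stream ends, the accumulated argument strings are still raw JSON text and must be parsed before execution. A small finishing step (a sketch; `finalize_tool_calls` is not from the article, though `json.loads` is the standard way to do this) bridges the gap to `execute_tool`:

```python
import json

def finalize_tool_calls(tool_calls_acc):
    """Parse accumulated JSON argument fragments into executable calls."""
    calls = []
    for idx in sorted(tool_calls_acc):
        entry = tool_calls_acc[idx]
        try:
            args = json.loads(entry["arguments"]) if entry["arguments"] else {}
        except json.JSONDecodeError:
            # The model emitted malformed JSON; pass empty args and
            # let the tool's own error handling report the problem
            args = {}
        calls.append({"id": entry["id"], "name": entry["name"], "args": args})
    return calls
```

Sorting by index preserves the order the model requested the tools in, which matters when one call depends on another's side effects.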
Multi-Model Support
AgentCode uses LiteLLM as an abstraction layer. This means I write one set of tool definitions in OpenAI’s format, and LiteLLM translates them to whatever the provider expects.
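A single tool definition in that format looks like this; the schema shape is OpenAI's standard function-calling format, though the description text here is illustrative rather than AgentCode's exact wording:

```python
READ_FILE_TOOL = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a text file and return its contents.",
        "parameters": {
            # Parameters are described as a JSON Schema object
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "Path to the file, relative to the project root.",
                },
            },
            "required": ["path"],
        },
    },
}
```

The `description` fields are what the model actually reads when deciding which tool to call, which is why (as noted below) they matter more than the system prompt.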
Switch models mid‑conversation
```
❯ /model gpt-4o
✓ Switched to gpt-4o
❯ /model claude-opus-4-6
✓ Switched to claude-opus-4-6
❯ /model ollama/qwen2.5-coder
✓ Switched to ollama/qwen2.5-coder
```
Same tools, same loop, different brain. The local Ollama option means you can run the entire thing with zero API cost.
The Permission System
Any tool that writes files or executes commands asks before acting:
```
🔒 Permission Required
Tool: write_file
Args: {"path": "src/handler.py", "content": "..."}

Allow this action? [y/n] (y):
```
Read‑only tools (read_file, list_directory, search) auto‑approve. This keeps the flow fast while preventing the agent from doing anything destructive without your consent.
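A minimal version of that gate might look like the following; the function name and read-only set are illustrative, not AgentCode's actual code:

```python
# Tools that cannot change anything skip the prompt entirely
READ_ONLY_TOOLS = {"read_file", "list_directory", "search_text"}

def check_permission(tool_name, args, ask=input):
    """Return True if the tool may run; prompt only for mutating tools."""
    if tool_name in READ_ONLY_TOOLS:
        return True  # auto-approve read-only tools
    print(f"🔒 Permission Required\nTool: {tool_name}\nArgs: {args}")
    answer = ask("Allow this action? [y/n] (y): ").strip().lower()
    # Pressing Enter accepts the default of "y"
    return answer in ("", "y", "yes")
```

Injecting `ask` as a parameter keeps the gate testable without a terminal; in the real loop, a `False` return would be fed back to the model as a refusal rather than silently dropped.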
What I Learned
1. **Context management is the hard problem.** The agentic loop itself is trivial. Managing what’s in the context window — compacting old messages, summarizing, keeping the right information available — is where the real engineering effort lies.
2. **Tool definitions matter more than the prompt.** A well‑described tool with clear parameter descriptions outperforms a clever system prompt. The LLM reads the tool schema like documentation.
3. **Streaming changes everything.** The difference between “wait 8 seconds for a response” and “see words appearing instantly” is the difference between a frustrating tool and one you enjoy using.
4. **Multi‑model flexibility is underrated.** Different models excel at different tasks. Being able to hot‑swap between them — or let the router decide — means you always have the right tool for the job.
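To make the first lesson concrete: a naive compaction pass (a sketch, not AgentCode's actual strategy) truncates bulky old tool results while leaving recent turns intact:

```python
def compact_conversation(messages, max_messages=40, keep_recent=10):
    """Naive compaction: shrink old tool output, keep recent turns whole."""
    if len(messages) <= max_messages:
        return messages  # still fits comfortably; nothing to do
    head, tail = messages[:-keep_recent], messages[-keep_recent:]
    compacted = []
    for msg in head:
        if msg.get("role") == "tool" and len(msg.get("content") or "") > 200:
            # Replace bulky tool output with a stub; the gist stays available
            msg = {**msg, "content": msg["content"][:200] + "\n[...truncated...]"}
        compacted.append(msg)
    return compacted + tail
```

Real strategies go further, for example summarizing whole spans of old turns with a cheap model, but even this trivial version keeps long sessions from blowing past the context window.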
Try It
```shell
pip install agentcode-cli
export ANTHROPIC_API_KEY="your-key"
agentcode
```
The codebase is readable Python — no frameworks, no abstractions. If you’re curious how agentic coding tools work, clone it and read through agent.py. The entire loop is about 50 lines.
- GitHub:
- PyPI:
MIT licensed. Feedback and contributions welcome.
Tags: python, ai, opensource, tutorial