How I Built an Agentic Coding CLI from Scratch

Published: May 3, 2026 at 09:22 PM EDT
5 min read
Source: Dev.to

The Core Insight: It’s Just a Loop

Every agentic coding tool—no matter how polished—follows the same fundamental pattern:

while needs_follow_up:
    # 1️⃣ Send conversation + tools → LLM
    # 2️⃣ If LLM returns tool calls → execute them, append results, loop
    # 3️⃣ If LLM returns plain text → finish

That’s the “magic”: a while‑loop with function calling.
The remaining 95% consists of:

  • context management
  • tool execution
  • error handling
  • permission checks

Simplified Agent Loop

from litellm import completion

def run_agent_loop(user_input, conversation, config):
    conversation.add_user(user_input)

    for iteration in range(config.max_iterations):
        # routed_model comes from the cost-aware router (covered below)
        routed_model = route_model(user_input)

        stream = completion(
            model=routed_model,
            messages=conversation.messages,
            tools=TOOL_DEFINITIONS,
            stream=True,
        )

        text, tool_calls, usage = process_stream(stream)

        if not tool_calls:
            # No tools called — model is done
            conversation.add_assistant(content=text)
            break

        # Record the assistant turn (with its tool calls) so the provider
        # can match each tool result back to the call that produced it
        conversation.add_assistant(content=text, tool_calls=tool_calls)

        # Execute each tool, feed results back, loop
        for tc in tool_calls:
            result = execute_tool(tc.name, tc.args)
            conversation.add_tool_result(tc.id, result)

When a user says “fix the bug in app.py”, the LLM doesn’t edit the file directly. It:

  1. Calls read_file("app.py") → receives the source.
  2. Calls edit_file(...) with the fix.
  3. Calls run_command("pytest") to verify.

Each step is a tool call that the loop executes and feeds back into the next iteration.
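
To make that concrete, here’s roughly what one entry in TOOL_DEFINITIONS looks like, assuming the OpenAI function‑calling schema (the format LiteLLM accepts for every provider — the exact wording in AgentCode may differ):

READ_FILE_TOOL = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a text file and return its contents.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "Path to the file, relative to the project root.",
                },
            },
            "required": ["path"],
        },
    },
}

The LLM never sees your Python code — only this schema. That’s why the descriptions matter so much.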

Architecture

┌─────────────────────────────────────────────────┐
│                  cli.py (UI)                    │
│  REPL loop · slash commands · Rich terminal UI  │
└──────────────────────┬──────────────────────────┘

┌──────────────────────▼──────────────────────────┐
│               agent.py (Brain)                  │
│  Agentic loop · context management · permissions│
│                                                 │
│   LiteLLM ──→ Claude / GPT / Gemini / Ollama    │
└──────────────────────┬──────────────────────────┘

┌──────────────────────▼──────────────────────────┐
│               tools.py (Hands)                  │
│  read_file · write_file · edit_file             │
│  run_command · git_commit · search_text         │
└─────────────────────────────────────────────────┘
File       Responsibility
cli.py     Terminal UI (REPL, slash commands, session management)
agent.py   Brain (agentic loop, streaming, permissions, context compaction)
tools.py   Hands (file I/O, bash execution, git, search)
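
To give a feel for the “Hands” layer, here’s a hypothetical sketch of what an execute_tool dispatcher could look like (not AgentCode’s actual implementation — the tool names mirror the table above):

import subprocess

def execute_tool(name, args):
    # Map tool names from the LLM to plain Python functions
    if name == "read_file":
        with open(args["path"], "r", encoding="utf-8") as f:
            return f.read()
    if name == "write_file":
        with open(args["path"], "w", encoding="utf-8") as f:
            f.write(args["content"])
        return f"Wrote {args['path']}"
    if name == "run_command":
        proc = subprocess.run(
            args["command"], shell=True, capture_output=True, text=True, timeout=60
        )
        return proc.stdout + proc.stderr
    return f"Unknown tool: {name}"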

Feature I’m Most Proud Of: Cost‑Aware Routing

Most AI coding tools lock you into a single model, making you pay the same price for a simple explanation as for a full‑scale refactor. AgentCode classifies each request by complexity and automatically selects the cheapest model that can handle it.

Routing Table

Tier     Example Prompt                      Model    Why
Light    “what does this function do”        Haiku    Fast, cheap — just reading and explaining
Medium   “write unit tests for app.py”       Sonnet   Needs to understand code and generate new code
Heavy    “refactor the entire auth system”   Opus     Multi‑file, multi‑step, architectural thinking

The classification uses simple pattern matching:

import re

# Illustrative patterns — the real lists are longer
HEAVY_PATTERNS = [r"\brefactor\b", r"\bmigrate\b", r"\bredesign\b", r"\barchitecture\b"]
MEDIUM_PATTERNS = [r"\bwrite\b", r"\bimplement\b", r"\bfix\b", r"\btests?\b"]

def classify_complexity(user_input):
    text = user_input.lower()

    heavy_score = sum(1 for p in HEAVY_PATTERNS if re.search(p, text))
    medium_score = sum(1 for p in MEDIUM_PATTERNS if re.search(p, text))

    if heavy_score >= 2:
        return "heavy"
    elif medium_score >= 1:
        return "medium"
    else:
        return "light"

Transparent, easy to tweak, and it saves real money.
You can always override the automatic choice with the /model command.
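
Tying classification to model choice could look something like this (a sketch — the model IDs here are placeholders for whatever your LiteLLM setup exposes):

MODEL_ROUTES = {
    "light": "claude-haiku",       # placeholder model IDs
    "medium": "claude-sonnet",
    "heavy": "claude-opus-4-6",
}

def route_model(user_input, override=None):
    # An explicit /model choice always wins over the classifier
    if override:
        return override
    return MODEL_ROUTES[classify_complexity(user_input)]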

Streaming: The UX Difference

The first version waited for the full LLM response before showing anything, leaving the terminal blank for 5–10 seconds. Adding streaming turned the experience into a real‑time conversation.

The Challenge

In an agentic loop the LLM can return both plain text and tool calls in the same response. Text tokens arrive one‑by‑one, while tool‑call arguments arrive as fragments that must be assembled before execution.

Streaming Processor

def process_stream(stream):
    full_text = ""
    tool_calls_acc = {}
    usage = None

    for chunk in stream:
        delta = chunk.choices[0].delta

        # Text tokens — print immediately
        if delta.content:
            print(delta.content, end="", flush=True)
            full_text += delta.content

        # Tool call fragments — accumulate silently
        if delta.tool_calls:
            for tc_delta in delta.tool_calls:
                idx = tc_delta.index
                if idx not in tool_calls_acc:
                    tool_calls_acc[idx] = {"id": "", "name": "", "arguments": ""}
                if tc_delta.id:
                    tool_calls_acc[idx]["id"] = tc_delta.id
                if tc_delta.function.name:
                    tool_calls_acc[idx]["name"] = tc_delta.function.name
                if tc_delta.function.arguments:
                    tool_calls_acc[idx]["arguments"] += tc_delta.function.arguments

        # Some providers attach token usage to the final chunk
        if getattr(chunk, "usage", None):
            usage = chunk.usage

    return full_text, tool_calls_acc, usage

  • Text streams to the screen in real time.
  • Tool calls are silently assembled in the background.

The user sees words appear instantly while the agent decides what to do next.
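
One detail the snippet glosses over: the accumulated arguments are raw JSON strings. A small finalization helper — hypothetical, but it produces the tc.id / tc.name / tc.args shape the agent loop consumes — parses them once the stream ends:

import json
from collections import namedtuple

ToolCall = namedtuple("ToolCall", ["id", "name", "args"])

def finalize_tool_calls(tool_calls_acc):
    # Parse each accumulated JSON argument string into a ToolCall
    return [
        ToolCall(acc["id"], acc["name"], json.loads(acc["arguments"]))
        for _, acc in sorted(tool_calls_acc.items())
    ]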

Multi-Model Support

AgentCode uses LiteLLM as an abstraction layer. This means I write one set of tool definitions in OpenAI’s format, and LiteLLM translates them to whatever the provider expects.

Switch models mid‑conversation

 /model gpt-4o
 Switched to gpt-4o

 /model claude-opus-4-6
 Switched to claude-opus-4-6

 /model ollama/qwen2.5-coder
 Switched to ollama/qwen2.5-coder

Same tools, same loop, different brain. The local Ollama option means you can run the entire thing with zero API cost.
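
If you haven’t used LiteLLM before, the provider‑agnostic call looks like this — same function, different model string:

from litellm import completion

for model in ["gpt-4o", "claude-opus-4-6", "ollama/qwen2.5-coder"]:
    resp = completion(
        model=model,
        messages=[{"role": "user", "content": "Say hi in one word."}],
    )
    print(model, "→", resp.choices[0].message.content)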

The Permission System

Any tool that writes files or executes commands asks before acting:

🔒 Permission Required
Tool: write_file
Args: {"path": "src/handler.py", "content": "..."}
Allow this action? [y/n] (y):

Read‑only tools (read_file, list_directory, search) auto‑approve. This keeps the flow fast while preventing the agent from doing anything destructive without your consent.
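
The whole gate fits in a few lines — a minimal sketch, assuming a READ_ONLY_TOOLS set whose names mirror the tools above:

READ_ONLY_TOOLS = {"read_file", "list_directory", "search_text"}

def check_permission(name, args):
    if name in READ_ONLY_TOOLS:
        return True  # safe, read-only tools auto-approve
    print(f"🔒 Permission Required\nTool: {name}\nArgs: {args}")
    answer = input("Allow this action? [y/n] (y): ").strip().lower()
    return answer in ("", "y", "yes")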

What I Learned

  1. Context management is the hard problem.
    The agentic loop itself is trivial. Managing what’s in the context window — compacting old messages, summarizing, keeping the right information available — is where the real engineering effort lies. (A rough sketch of one compaction approach follows this list.)

  2. Tool definitions matter more than the prompt.
    A well‑described tool with clear parameter descriptions outperforms a clever system prompt. The LLM reads the tool schema like documentation.

  3. Streaming changes everything.
    The difference between “wait 8 seconds for a response” and “see words appearing instantly” is the difference between a frustrating tool and one you enjoy using.

  4. Multi‑model flexibility is underrated.
    Different models excel at different tasks. Being able to hot‑swap between them — or let the router decide — means you always have the right tool for the job.
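
For the curious, here’s what the compaction from lesson 1 could look like — a hypothetical illustration of the idea, not AgentCode’s actual strategy:

def summarize(old_messages):
    # Stub: in a real agent this would be another LLM call; here we
    # just join the first line of each message as a crude placeholder.
    return " | ".join(
        m["content"].splitlines()[0] for m in old_messages if m.get("content")
    )

def compact(messages, keep_recent=10):
    # Keep the system prompt and the most recent turns; fold everything
    # older into a single summary message.
    if len(messages) <= keep_recent + 1:
        return messages
    head, old, recent = messages[:1], messages[1:-keep_recent], messages[-keep_recent:]
    summary = {"role": "system",
               "content": "Summary of earlier conversation: " + summarize(old)}
    return head + [summary] + recent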

Try It

pip install agentcode-cli
export ANTHROPIC_API_KEY="your-key"
agentcode

The codebase is readable Python — no frameworks, no abstractions. If you’re curious how agentic coding tools work, clone it and read through agent.py. The entire loop is about 50 lines.

  • GitHub:
  • PyPI:

MIT licensed. Feedback and contributions welcome.

Tags: python, ai, opensource, tutorial
