Coding Agents for Software Engineers
Source: Dev.to
1️⃣ What Is a Coding Agent?
A coding agent is not just an LLM. It is a system:
IDE / CLI
↓
Agent Runtime
↓
Context Builder
↓
LLM Inference
↓
Tool Execution (fs, git, tests, shell)
↓
Loop
- The model is only the reasoning engine.
- The runtime handles orchestration.
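The pipeline above can be sketched as a minimal runtime loop. Everything here (`run_agent`, the action dict shape, `fake_llm`) is a hypothetical stand-in to show the control flow, not any real framework's API:

```python
# Minimal sketch of an agent runtime loop: build context, call the
# model, execute the requested tool, and feed the result back in.

def run_agent(task, call_llm, tools, max_steps=5):
    """Drive the plan -> infer -> execute -> validate loop."""
    history = [f"TASK: {task}"]
    for _ in range(max_steps):
        prompt = "\n".join(history)           # context builder (trivial here)
        action = call_llm(prompt)             # LLM inference
        if action["type"] == "done":
            return action["result"]
        tool = tools[action["type"]]          # tool execution (fs, git, tests)
        output = tool(*action.get("args", ()))
        history.append(f"TOOL {action['type']} -> {output}")
    return None

# Tiny fake model: asks for the tests once, then finishes.
def fake_llm(prompt):
    if "TOOL run_tests" in prompt:
        return {"type": "done", "result": "patch verified"}
    return {"type": "run_tests", "args": ()}

result = run_agent("fix bug", fake_llm, {"run_tests": lambda: "3 passed"})
print(result)  # patch verified
```

The model never acts directly; it only emits actions, and the runtime decides how (and whether) to execute them.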
2️⃣ General Architecture of a Coding Agent
A production‑grade coding agent includes:
- **Indexing Layer**
  - Repo scanning
  - Symbol extraction
  - Dependency graph
  - Optional embeddings
- **Context Builder**
  - Select relevant files
  - Inject instructions
  - Add plan / scratchpad
  - Add recent edits
- **LLM Inference Layer**
  - Tokenized prompt
  - Context‑window constraints
  - Streaming output
- **Tool Layer**
  - File read/write
  - Test execution
  - Git diff/patch
  - Lint / build commands
- **Loop Controller**
  - Plan
  - Execute
  - Validate
  - Iterate
The model does not “see the repo.” The agent chooses what to send.
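A naive version of that choosing step can be sketched as keyword-relevance ranking (real agents use symbol indexes or embeddings; the file names and scoring here are illustrative only):

```python
# Sketch: rank repo files by keyword overlap with the task and keep
# only the top matches. Embeddings or a symbol index would replace
# this scoring in a production agent.

def select_files(task, files, limit=2):
    words = set(task.lower().split())
    def score(item):
        path, text = item
        return sum(text.lower().count(w) for w in words)
    ranked = sorted(files.items(), key=score, reverse=True)
    return [path for path, text in ranked[:limit] if score((path, text)) > 0]

repo = {
    "src/auth/middleware.py": "def auth middleware token check",
    "src/db/models.py": "class User table schema",
    "README.md": "project overview",
}
print(select_files("fix auth middleware token bug", repo))
# ['src/auth/middleware.py']
```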
3️⃣ What Is the Context Window?
The context window is the maximum number of tokens the model can attend to in a single inference call. It includes:
System instructions
+ AGENTS.md / policies
+ Scratchpad / plan files
+ Relevant source files
+ Recent conversation
+ Tool outputs
+ Your current request
+ Model output
Everything must fit inside the window. A larger window does not mean you should send everything.
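Fitting those pieces into the window is a packing problem. A minimal sketch, using a rough chars/4 token heuristic (not a real tokenizer) and a priority order of this example's own invention:

```python
# Sketch: pack prompt sections into a fixed token budget,
# highest-priority sections first.

def estimate_tokens(text):
    return max(1, len(text) // 4)  # crude heuristic, not a real tokenizer

def pack_context(sections, budget):
    """sections: list of (priority, name, text); lower priority = keep first."""
    kept, used = [], 0
    for _, name, text in sorted(sections):
        cost = estimate_tokens(text)
        if used + cost <= budget:
            kept.append(name)
            used += cost
    return kept

sections = [
    (0, "system", "You are a coding agent." * 2),
    (1, "request", "Fix the token refresh bug."),
    (2, "file", "x" * 4000),   # a large source file that will not fit
]
print(pack_context(sections, budget=100))  # ['system', 'request']
```

The large file is dropped rather than blowing the budget, which is exactly the trade-off the context builder makes on every call.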
4️⃣ Where Does Tokenization Happen?
Typically:
- The agent runtime tokenizes locally (client‑side).
- It estimates token usage before calling the model.
- The server still processes tokens during inference.
Why client‑side tokenization matters
- Avoid exceeding context limits
- Control cost
- Control chunking
- Optimize file selection
5️⃣ What Actually Consumes Tokens?
In coding workflows, token cost usually comes from:
- Large source files
- Test files
- Logs
- Replayed conversation history
- Repeated system instructions
- Scratchpad growth
Your instruction verbosity is rarely the main cost—file selection is.
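One of the cheapest wins on that list is pruning replayed history. A sketch, again using the chars/4 heuristic, that keeps only the newest turns under a cap:

```python
# Sketch: prune replayed conversation history to a token cap,
# keeping the newest turns and dropping old ones first.

def prune_history(turns, max_tokens):
    kept, used = [], 0
    for turn in reversed(turns):              # walk newest-first
        cost = max(1, len(turn) // 4)         # rough token estimate
        if used + cost > max_tokens:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))               # restore chronological order

turns = ["old design debate " * 50, "ran tests: 3 passed", "user: fix auth bug"]
print(prune_history(turns, max_tokens=20))
# ['ran tests: 3 passed', 'user: fix auth bug']
```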
6️⃣ What Makes “Good Quality” Context?
Good context is:
- ✅ Relevant – only include files that matter.
- ✅ Structured – clear task → constraints → deliverable.
- ✅ Deterministic – explicit scope boundaries.
- ✅ Minimal but sufficient – no narrative fluff, no repeated architecture explanation.
Bad context includes:
- Entire repo dump
- Long emotional explanations
- Old irrelevant chat history
- Ambiguous instructions
7️⃣ What Actually Improves Coding Responses?
1️⃣ Clear Scope
Bad: “Improve authentication system.”
Good:
Scope:
- src/auth/*
- src/middleware/auth.ts
Do not touch:
- public API
- schema definitions
2️⃣ Explicit Constraints
Examples:
- Do not change public interfaces.
- Preserve test behavior.
- No new dependencies.
- Keep diff minimal.
Constraints reduce hallucinated refactors.
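Scope and constraints are easiest to keep consistent when built from structured fields instead of free prose. A sketch (the field names are this example's convention, not a standard):

```python
# Sketch: assemble a constrained task prompt from structured fields,
# mirroring the Scope / Do not touch / Constraints pattern above.

def build_prompt(task, scope, forbidden, constraints, output):
    lines = [f"Task: {task}", "Scope:"]
    lines += [f"- {p}" for p in scope]
    lines += ["Do not touch:"] + [f"- {p}" for p in forbidden]
    lines += ["Constraints:"] + [f"- {c}" for c in constraints]
    lines += [f"Output: {output}"]
    return "\n".join(lines)

prompt = build_prompt(
    task="Improve authentication",
    scope=["src/auth/*", "src/middleware/auth.ts"],
    forbidden=["public API", "schema definitions"],
    constraints=["No new dependencies", "Keep diff minimal"],
    output="Unified diff only",
)
print(prompt)
```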
3️⃣ Defined Output Format
Deliverable:
- Unified diff only
- Brief explanation
The model does **not** remember the plan or scratchpad between calls; the agent injects them into context each time.
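A defined output format is also mechanically checkable. A rough sketch that rejects output which is not a unified diff (the heuristic checks only the `---` / `+++` / `@@` markers):

```python
# Sketch: validate that model output looks like a unified diff before
# applying it, enforcing a "unified diff only" deliverable.

def looks_like_unified_diff(text):
    lines = text.splitlines()
    return (any(l.startswith("--- ") for l in lines)
            and any(l.startswith("+++ ") for l in lines)
            and any(l.startswith("@@") for l in lines))

patch = """--- a/src/auth/middleware.ts
+++ b/src/auth/middleware.ts
@@ -1,2 +1,2 @@
-const ttl = 60
+const ttl = 300
"""
print(looks_like_unified_diff(patch))        # True
print(looks_like_unified_diff("just prose")) # False
```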
9️⃣ Efficient Project Structure for Coding Agents
Recommended layout:
/AGENTS.md # Global behavior rules (minimal)
/PLAN.md # Task plan (editable)
/src/...
/tests/...
AGENTS.md should contain:
- Coding standards
- Test commands
- “Plan first” rule
- Guardrails
Keep it short; it is injected often.
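A minimal AGENTS.md in that spirit might look like this (the contents are illustrative, not prescriptive):

```markdown
# AGENTS.md

## Standards
- TypeScript strict mode; match existing style.

## Commands
- Tests: `npm test`
- Lint: `npm run lint`

## Process
- Write a plan to PLAN.md before editing.
- Keep diffs minimal; no new dependencies without approval.
```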
🔟 Efficient Coding Agent Usage Patterns
Pattern A — Constrained Patch
Task:
Optimize middleware performance.
Scope:
src/auth/middleware.ts
Constraints:
- Preserve API
- No new deps
Output:
Unified diff only.
Pattern B — Incremental Execution
Implement only Step 1 from PLAN.md.
Run tests.
Update PLAN.md.
Stop.
Pattern C — Scope Locking
Explicitly limit directories:
Touch only:
src/auth/*
Do not modify:
src/db/*
This prevents token waste and unintended edits.
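Scope locks can also be enforced by the runtime rather than trusted to the model. A sketch using `fnmatch` glob patterns, with path lists mirroring the example above:

```python
# Sketch: enforce scope locks on proposed edits. Deny patterns win
# over touch patterns; anything unmatched is rejected.
from fnmatch import fnmatch

def allowed(path, touch, deny):
    if any(fnmatch(path, pat) for pat in deny):
        return False
    return any(fnmatch(path, pat) for pat in touch)

touch = ["src/auth/*"]
deny = ["src/db/*"]
print(allowed("src/auth/middleware.ts", touch, deny))  # True
print(allowed("src/db/schema.ts", touch, deny))        # False
```

An agent runtime can drop (or flag) any edit whose path fails this check before it ever reaches the filesystem.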
1️⃣1️⃣ What NOT to Do
- ❌ Send the whole repo
- ❌ Re‑explain system architecture every turn
- ❌ Let scratchpads grow unbounded
- ❌ Leave scope ambiguous
- ❌ Ask for “improve everything”
1️⃣2️⃣ Big Context Myth
A 1 M‑token context window does not mean you should send 1 M tokens, nor that it will be faster or more accurate.
Longer context:
- Increases latency
- Increases cost
- Increases noise risk
Smart context selection beats raw size.
1️⃣3️⃣ Mental Model for Engineers
Treat coding agents like this:
LLM = Stateless reasoning engine
Context = Input data packet
Agent = Orchestrator
Scratchpad = External memory
Your job: optimize the data packet.
1️⃣4️⃣ Core Optimization Principles
- Structure > verbosity
- Relevance > completeness
- Constraints > freedom
- Iteration > giant prompts
- Plan → execute → verify
Final Takeaway
Coding agents perform best when:
- The task is clearly scoped
- Constraints are explicit
- Context is curated
- Plans are externalized
- History is pruned
- Output format is constrained