Building an AI-Powered Code Editor (Part 2): The LLM as Interpreter

Published: (December 29, 2025 at 05:42 PM EST)
6 min read
Source: Dev.to

The Insight

While building LLM CodeForge, an agentic editor that lets LLMs read, modify, and test code autonomously, I realized something after writing 5,000 tokens of instructions:

I wasn’t writing a prompt. I was building a domain‑specific language (DSL) embedded in natural language.

This article analyzes how and why this distinction is fundamental—and what you can learn for your own agentic systems.

How the Model Behaves in CodeForge

| Aspect | What the model does | What the model doesn't |
|---|---|---|
| Decision making | Selects which branch of the protocol to follow | Does not decide what to do |
| Problem solving | Executes a procedure described in natural language | Does not solve problems creatively |
| Nature | Functions like a bytecode interpreter, a text‑driven finite‑state machine, or a planner with closed actions | Relies on deterministic control flow, not on open‑ended reasoning |
| Reliability | Works because I accept the LLM is fundamentally unreliable | N/A |

DSL Control Flow

Every request follows four steps:

[UNDERSTAND] → [GATHER] → [EXECUTE] → [RESPOND]

Step 1 – UNDERSTAND

Classify request type

| Type | Keywords | Next Step |
|---|---|---|
| Explanation | "what is", "explain" | [RESPOND] (text) |
| Modification | "add", "change" | [GATHER] → [EXECUTE] |
| Analysis | "analyze", "show" | [GATHER] → [RESPOND] |

This is not chain‑of‑thought in the classic sense. It is deterministic task routing—a decision table mapping input → workflow. The model doesn’t “think”; it executes a conditional jump.
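The decision table above can be sketched as ordinary code. This is a minimal illustration of keyword-based task routing, not CodeForge's actual implementation; the names `ROUTES` and `routeRequest` are my own:

```javascript
// Decision table: keywords map deterministically to a workflow.
// Classification is a lookup, not open-ended reasoning.
const ROUTES = [
  { keywords: ["what is", "explain"], workflow: ["RESPOND"] },
  { keywords: ["add", "change"],      workflow: ["GATHER", "EXECUTE"] },
  { keywords: ["analyze", "show"],    workflow: ["GATHER", "RESPOND"] },
];

function routeRequest(request) {
  const text = request.toLowerCase();
  for (const route of ROUTES) {
    if (route.keywords.some((kw) => text.includes(kw))) {
      return route.workflow;
    }
  }
  return ["RESPOND"]; // safe default: answer in text, modify nothing
}
```

The same table can live in the prompt (for the model) and in code (for the host), so both sides agree on which branch a request takes.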

Invariant Rules

🚨 CRITICAL RULE: You CANNOT use update_file on a file you haven’t read in this conversation.

Self‑check before ANY update_file

  • Did I receive the content from the system?
  • Do I know the exact current state?
  • Am I modifying based on actual code?

If any answer is NO → OUTPUT a read_file ACTION, then STOP.

This is an attempt to define pre‑conditions in natural language. It’s akin to:

def update_file(path, content):
    assert path in conversation_state.read_files
    # ... actual update

Without a type system or automatic runtime enforcement, the rule reduces (but does not eliminate) the probability of the LLM modifying a file without having read it first. In tests I observed ~85‑90 % stability, but server‑side validation remains essential for critical cases.
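That server-side validation can be sketched as a guard the host runs before applying any action. The names (`createConversationState`, `validateAction`) are illustrative, not from the CodeForge repository:

```javascript
// Host-side enforcement of the read-before-write invariant:
// track which files the LLM has actually received, and reject
// update_file actions on files that were never read.
function createConversationState() {
  return { readFiles: new Set() };
}

function recordRead(state, path) {
  state.readFiles.add(path);
}

function validateAction(state, action) {
  if (action.type === "update_file" && !state.readFiles.has(action.path)) {
    // Mirror the prompt rule: force a read_file before the update.
    return { ok: false, requiredAction: { type: "read_file", path: action.path } };
  }
  return { ok: true };
}
```

The prompt rule lowers the failure rate; this guard makes the remaining failures harmless, because a violating action is bounced back as a forced `read_file`.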

Multi‑File Tasks: Prompt Regeneration

The most effective technique I implemented is dynamically regenerating the prompt to force the LLM to follow a multi‑step plan.

Concrete Scenario

User request: “Add authentication to the project.”

  1. Planning – LLM generates a plan:
{
  "plan": "I will modify these files in order:",
  "files_to_modify": ["Auth.js", "Login.jsx", "App.jsx"]
}
  2. First file – LLM works on Auth.js and completes it.

  3. Trick – Instead of asking the LLM to “remember the plan”, I regenerate the prompt with an explicit “next‑action” block:

### ⚠️ MULTI‑FILE TASK IN PROGRESS

You completed: Auth.js  
Remaining files: Login.jsx, App.jsx

### 🚨 REQUIRED ACTION
Your next output MUST be:
{"action":"continue_multi_file","next_file":{"path":"Login.jsx"}}

Do NOT do anything else. Do NOT deviate from the plan.

The LLM no longer needs to “remember” anything. The only possible action is baked into the prompt.

Why It’s Powerful

  • The state (which files are done, which remain) lives in external JavaScript, not in the LLM’s memory.
  • At every step I regenerate the prompt with the updated state.
  • The LLM always sees a single, unambiguous instruction.

Implementation Sketch

// In the code
function buildPrompt(multiFileState) {
  let prompt = BASE_PROMPT;

  if (multiFileState) {
    prompt += `
### TASK IN PROGRESS
Completed: ${multiFileState.completed.join(', ')}
Next: ${multiFileState.next}
Your ONLY valid action: continue_multi_file with ${multiFileState.next}
`;
  }

  return prompt;
}

This is state injection: external state completely controls what the LLM can do next.
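The missing piece is how that external state advances between steps. A minimal sketch, assuming a helper I am calling `advanceMultiFileState` (not from the article's repo): after each completed file, the host, not the LLM, computes what comes next.

```javascript
// After each completed file, the host updates the external state.
// The LLM never carries this state; it is re-injected into the prompt.
function advanceMultiFileState(state, completedFile) {
  const remaining = state.remaining.filter((f) => f !== completedFile);
  return {
    completed: [...state.completed, completedFile],
    remaining,
    next: remaining[0] ?? null, // null means the multi-file task is done
  };
}
```

Each turn, the updated state is fed back through `buildPrompt`, so the model's "memory" is just the latest prompt.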

Custom Delimiters for Mixed Content

To handle different “types” (JSON, code, metadata) the DSL uses custom delimiters:

#[json-data]
{"action":"create_file","file":{"path":"App.jsx"}}
#[end-json-data]

#[file-message]
This file implements the main app component.
#[end-file-message]

#[content-file]
export default function App() {
  return <h1>Hello World</h1>;
}
#[end-content-file]

Why Not Plain JSON or XML?

  • The content may contain {}, <>, etc., requiring complex escaping.
  • #[tag]…#[end-tag] is syntactically unique, easy to parse with regex, and independent of the embedded language.
  • It behaves like a context‑free grammar separating semantic layers.
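The regex-based parsing the delimiters enable can be sketched in a few lines; `parseBlocks` is an illustrative name, not the repository's actual parser:

```javascript
// Extract #[tag]...#[end-tag] blocks. The backreference \1 ties each
// opening tag to its own closing tag, and the embedded JSON/JSX needs
// no escaping because the delimiter syntax never occurs in it.
function parseBlocks(output) {
  const blocks = {};
  const pattern = /#\[([a-z-]+)\]\n?([\s\S]*?)#\[end-\1\]/g;
  let match;
  while ((match = pattern.exec(output)) !== null) {
    blocks[match[1]] = match[2].trim();
  }
  return blocks;
}
```

Because each block type is captured independently, the JSON layer can be `JSON.parse`d while the code layer is passed through verbatim.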

Error‑Example Guidance

Embedding “error examples” directly in the DSL teaches the model the common failure modes—an inline unit‑test for the language.

| ❌ Incorrect | ✅ Correct |
|---|---|
| `{"action":"start_multi_file","plan":{},"first_file":{...}}` | `{"action":"start_multi_file","plan":{},"first_file":{...}}` |
| `#[json-data]{...}#[file-message]...` | `#[json-data]{...}#[end-json-data]#[file-message]...` |

Trade‑offs of a Natural‑Language DSL

| Limitation | Consequence |
|---|---|
| ❌ No verifiable types | No static type checking |
| ❌ No automatic syntactic validation | Errors must be caught at runtime |
| ❌ No AST for transformations | No compile‑time optimizations |

Compensations

  • Huge validation checklist (8+ points)
  • Semantic redundancy – same rule expressed in multiple ways
  • Extensive anti‑pattern documentation

When building a DSL in natural language, these trade‑offs are inevitable, but the resulting system can still be robust, transparent, and controllable.

Overview

The parser is a probabilistic LLM rather than a deterministic compiler.
If I were to evolve CodeForge in the future, a true mini‑DSL (JSON Schema + codegen) would reduce the prompt size by 30‑40 %. In the browser sandbox, however, the current choice is justified.

Pre‑Send Checklist

Before EVERY response, verify:

| # | Check | Fix If Failed |
|---|---|---|
| 1 | JSON valid | Correct structure |
| 2 | Tags complete | Add missing #[end-*] tags |

Alone, this yields 40‑60 % reliability. In my system it reaches 80‑90 % because it acts as a stability multiplier when:

  • The model is already channeled (decision protocol)
  • The format is rigid (custom delimiters)
  • The next action is deterministic (state injection)

Meta‑validation is not the main feature – it is the final safety net in an already constrained system.
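A host-side mirror of that checklist can be sketched as follows; `validateOutput` is an assumed helper, not the repository's actual validator:

```javascript
// Check the two checklist items before accepting the model's output:
// every opening tag has its #[end-*] partner, and the JSON block parses.
function validateOutput(output) {
  const errors = [];
  const openTags = [...output.matchAll(/#\[(?!end-)([a-z-]+)\]/g)].map((m) => m[1]);
  for (const tag of openTags) {
    if (!output.includes(`#[end-${tag}]`)) {
      errors.push(`Missing #[end-${tag}]`);
    }
  }
  const json = output.match(/#\[json-data\]([\s\S]*?)#\[end-json-data\]/);
  if (json) {
    try { JSON.parse(json[1]); } catch { errors.push("Invalid JSON in #[json-data]"); }
  }
  return errors;
}
```

When `validateOutput` returns errors, the host can regenerate the prompt with the error list and ask for a corrected output, closing the loop.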

Model Compatibility

✅ Works well with Claude 3.5, GPT‑4  
❌ Smaller models will fail  
❌ Less‑aligned models will ignore sections

I am implicitly saying: this system requires “serious” models.
It is an architectural constraint I accepted — like saying “this library requires Python 3.10+.”

Contextual Re‑Anchoring

Take the “read‑before‑write” rule:

  • Appears in the Decision Protocol (when planning)
  • Appears in Available Actions (when executing)
  • Appears in Pre‑Send Validation (when verifying)
  • Appears in Golden Rules (as a general principle)

This is strategic repetition, not random redundancy. It mirrors safety‑critical systems:

  • Same invariant
  • Verified at multiple levels
  • With specific phrasing for each context

Pattern Examples

Bad vs. Good Patterns

// BAD: Relying on the model's "memory"
"Remember that you have already read these files..."

// GOOD: Injecting explicit state
prompt += `Files already read: ${readFiles.join(', ')}`

// BAD: Giving open choices
"Decide which operation to perform"

// GOOD: Forcing the only legal move
"Your NEXT action MUST be: continue_multi_file"

Input → Action → Next State

| Input Pattern | Action | Next State |
|---|---|---|
| "add X" | GATHER | EXECUTE |
| "explain Y" | RESPOND | END |

Instead of “think what to do”, use “if X then Y”.

Embedding Arbitrary Content

  • Don’t use JSON – escaping nightmare
  • Don’t use XML – conflicts with HTML/JSX
  • Use unique tags: #[content]…#[end-content]

Pattern #5 – Redundancy = Coverage, Not Noise

  • Repeat critical rules
    • In different formulations (semantic reinforcement)
    • In different contexts (contextual re‑anchoring)
    • With different rationales (why, not just what)

After 5 000 tokens and months of iterations, the most important lesson is:

This prompt is not “beautiful.” It is effective.

Optimization Shift

| ❌ Stopped looking for | ✅ Started optimizing for |
|---|---|
| The shortest possible prompt | Robustness in edge cases |
| The most elegant formulation | Failure‑mode coverage |
| The most general abstraction | Debugging clarity when it fails |

Result:

  • Redundant? Yes.
  • Verbose? Absolutely.
  • Works? Consistently.

Future Directions: Where I Could Go

If I were to evolve CodeForge 2.0, I would explore a two‑agent architecture instead of a single 5 000‑token agent:

| Agent | Token Budget | Role |
|---|---|---|
| Planner Agent | 2,000 | Decides strategy |
| Executor Agent | 2,000 | Implements actions |

Benefits

  • Separation of concerns
  • Less context per agent
  • Parallel execution possible
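As a rough sketch of that split, assuming a `callLLM` wrapper around the model API (not part of the article's repository), the orchestration could look like:

```javascript
// Speculative two-agent orchestration: the planner only plans,
// the executor only edits, and the host loop carries all state.
function runTwoAgents(userRequest, callLLM) {
  // Planner agent: small prompt, produces only a plan, never edits files.
  const plan = JSON.parse(
    callLLM({ role: "planner", prompt: `Plan files for: ${userRequest}` })
  );
  // Executor agent: one file per call; state is injected by the host,
  // exactly as in the single-agent state-injection pattern.
  return plan.files_to_modify.map((file) =>
    callLLM({ role: "executor", prompt: `Modify ${file}` })
  );
}
```

Each agent sees only its own small prompt, which is where the per-agent token savings come from.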

Key Techniques that Work

| Rating | Technique |
|---|---|
| ⭐⭐⭐⭐⭐ | State injection + forced next action |
| ⭐⭐⭐⭐⭐ | Decision tables for task routing |
| ⭐⭐⭐⭐⭐ | Custom delimiters for structured output |
| ⭐⭐⭐⭐⭐ | Contextual re‑anchoring of invariants |
| ⭐⭐⭐⭐ | Meta‑validation as a safety net |
| ⭐⭐⭐ | Visual hierarchy (useful but not critical) |

Fundamental principle:
Don’t ask the LLM to “understand” — force it to “execute”.

Treat the protocol as a DSL, not a conversation. External state must constrain possible actions, and validation must happen server‑ or client‑side, always, without exceptions. Redundancy can be a feature, not a bug.

Call to Action

Try CodeForge:

The project is open source – the full prompt and validation system implementation are available in the repository.

Questions for the Community

  1. Have you ever built embedded DSLs in natural language?
  2. What is the cost of “cognitive overhead” in your prompts?
  3. Two‑agent architecture vs. single‑agent: experiences?

Share in the comments — this is still largely unexplored territory.
