Teaching an LLM to Write Assembly: GBNF-Constrained Generation for a Custom 8-Bit CPU

Published: December 4, 2025 at 03:03 AM EST
4 min read
Source: Dev.to

Introduction

Over the past few weeks I’ve been building a fully‑playable 8‑bit virtual console from scratch — CPU, instruction set, assembler, sprite system, IDE, the lot. One of the more interesting side quests has been teaching an LLM to generate valid assembly targeting my CPU. You can follow the full build in my YouTube series Building a Virtual 8‑Bit Console with an AI Assistant (up to part 14). The source code for the project is on GitHub – see the snapshot at the tag where I added this work.

When you ask a model to generate domain‑specific code it often “looks right” but the parser rejects it. I needed the model to emit only tokens that my assembler would accept, which led me to GBNF — a compact grammar notation supported by llama.cpp and other inference runtimes that forces the model to emit syntactically valid output.

What GBNF Actually Is

GBNF (GGML BNF) is llama.cpp's extension of Backus–Naur Form: a small grammar format that describes which token sequences are valid in your language. Supporting runtimes can use it during decoding to mask out any tokens that would violate the grammar — llama.cpp supports GBNF natively, and other engines such as vLLM offer similar grammar‑guided decoding. The workflow is:

  1. Describe your language using GBNF rules, terminals, and non‑terminals.
  2. The runtime loads this grammar.
  3. During generation it masks any token that would break the grammar, forcing the LLM to walk only along valid paths.

This does not make the model understand the DSL; it merely prevents syntactic errors. For example, if the grammar lists opcodes as LOAD | STORE | ADD | SUB, the model cannot invent LOADX or MOVE.
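To make step 3 concrete, here is a toy sketch of the masking idea (a hypothetical illustration, not llama.cpp's actual decoder): a candidate token survives only if appending it keeps the generated text a prefix of something the grammar accepts.

```typescript
// Toy illustration of grammar-constrained decoding (hypothetical, not
// llama.cpp internals): a candidate token is kept only if appending it
// leaves the text a valid prefix of some accepted string.

const OPCODES = ["NOP", "RET", "LOAD", "STORE", "ADD", "SUB"];

/** True if `text` is a prefix of at least one valid opcode. */
function isValidPrefix(text: string): boolean {
  return OPCODES.some((op) => op.startsWith(text.toUpperCase()));
}

/** Mask out candidate tokens that would leave the grammar's valid paths. */
function maskTokens(generated: string, candidates: string[]): string[] {
  return candidates.filter((tok) => isValidPrefix(generated + tok));
}

// With "LOA" already emitted, only "D" keeps us on a valid path —
// the model physically cannot produce "LOAX" or "LOAP".
console.log(maskTokens("LOA", ["D", "X", "P"]));
```

The real runtime does this per decoding step against the full grammar's parse state, but the filtering principle is the same.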

Designing a Grammar for an Assembly‑like DSL

I started with the comprehensive specifications for my assembler and CPU, plus an AI cheatsheet I’d been feeding to Claude. Using those materials, I asked Claude to draft a GBNF grammar, then reviewed it line‑by‑line against my example programs. The resulting grammar mirrors exactly what the assembler expects.

Simplified Grammar Excerpt

root ::= line*
line ::= ws* statement? comment? eol

statement ::= instruction | directive
instruction ::= opcode-noarg
              | opcode-single ws+ operand
              | opcode-double ws+ operand ws* "," ws* operand

opcode-noarg ::= "NOP"i | "RET"i | "RTI"i | "SEI"i | "CLI"i
opcode-single ::= "PUSH"i | "POP"i | "INC"i | "DEC"i | "JMP"i | "CALL"i
opcode-double ::= "LD"i | "ST"i | "MOV"i | "ADD"i | "SUB"i | "AND"i | "OR"i | "XOR"i | "CMP"i

operand ::= immediate | register | memory-ref | identifier
immediate ::= "#" (number | identifier)
register ::= "R"i [0-5] | "SP"i | "PC"i
memory-ref ::= "[" ws* (number | identifier) ws* "]"

number ::= "$" [0-9a-fA-F]+ | [0-9]+
identifier ::= [a-zA-Z_] [a-zA-Z0-9_]*
comment ::= ws* ";" [^\r\n]*
ws ::= [ \t]+
eol ::= "\r"? "\n" | "\r"

The i suffix makes matching case‑insensitive, so LD, ld, and Ld are all valid. The grammar’s strictness eliminates the hallucinated opcodes, missing commas, stray punctuation, and non‑existent registers that plagued earlier attempts.
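As a rough sanity check outside the runtime, the double‑operand rule can be approximated with a regex (a simplified, hypothetical stand‑in for the grammar — it covers only the two‑operand form and is looser about operand shapes):

```typescript
// Simplified, hypothetical line checker mirroring the grammar's
// double-operand rule. Case-insensitive, like the "i"-suffixed terminals.
const DOUBLE = /^(LD|ST|MOV|ADD|SUB|AND|OR|XOR|CMP)\s+\S+\s*,\s*\S+\s*(;.*)?$/i;

console.log(DOUBLE.test("ld R0, #$FF")); // valid: known opcode, comma present
console.log(DOUBLE.test("MOVE R0, R1")); // invalid: MOVE is not in the opcode list
console.log(DOUBLE.test("ADD R0 R1"));   // invalid: missing comma
```

The last two lines are exactly the kinds of outputs an unconstrained model produces and the grammar now makes unrepresentable.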

The full grammar file can be found here.

Plugging the Grammar into llama.cpp

llama.cpp exposes a /completion endpoint that accepts a grammar parameter containing your GBNF as a string. Below is the TypeScript helper I use in the IDE to request grammar‑constrained generation.

async function generateWithGrammar(
  baseUrl: string,
  prompt: string,
  grammar: string,
): Promise<string> {
  const response = await fetch(`${baseUrl}/completion`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      prompt,
      n_predict: 4096,
      grammar,
      temperature: 0.7,
      stop: ['\n\n\n'],
    }),
  });

  if (!response.ok) {
    const errorText = await response.text();
    throw new Error(`llama.cpp grammar generation failed: ${response.status} ${errorText}`);
  }

  const result = (await response.json()) as { content: string };
  return result.content;
}

Key parameters

| Parameter | Purpose |
| --- | --- |
| `prompt` | Full prompt including context and instructions |
| `grammar` | GBNF grammar string (loaded from the file) |
| `n_predict` | Maximum tokens to generate (4096 gives ample room) |
| `temperature` | 0.7 balances creativity with coherence |
| `stop` | Triple newline (`\n\n\n`) signals the end of generation |
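One detail worth emphasizing: grammar carries the GBNF text itself, not a file path. A small sketch of assembling the request body (buildCompletionBody is a hypothetical helper; grammarText is assumed to be the grammar file's contents, read once at startup):

```typescript
// Build the llama.cpp /completion request body with the parameters above.
// grammarText is assumed to be the GBNF file contents, not a path.
function buildCompletionBody(prompt: string, grammarText: string): string {
  return JSON.stringify({
    prompt,
    n_predict: 4096,      // ample room for a full program
    grammar: grammarText, // the grammar string is sent inline
    temperature: 0.7,
    stop: ["\n\n\n"],     // triple newline ends generation
  });
}

const body = buildCompletionBody("; draw a sprite at (10, 20)", "root ::= line*");
console.log(JSON.parse(body).n_predict); // 4096
```

Reading the grammar once and reusing the string keeps the per-request cost to a single JSON serialization.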

Results and Observations

The first time the model produced perfectly valid assembly, the impact was immediate: no post‑processing, no syntax‑error handling, and the assembler accepted the output straight away. Compared to earlier experiments:

| Model / Setup | Typical Issues |
| --- | --- |
| Qwen (no grammar) | Hallucinated opcodes, malformed instructions, missing commas |
| Claude Sonnet 4.5 (no grammar) | Mostly correct but occasional stray punctuation |
| Claude + GBNF | Zero syntactic errors; every line conforms to the assembler's expectations |

The approach also scales: you can drop to smaller, cheaper local models while retaining reliable, structured output. The only remaining challenge is semantic correctness (does the generated program do what you intend?), which I’ll explore in a future post.

Conclusion

Grammar‑constrained generation with GBNF turns the problem of “cleaning up a mess after the fact” into “making the mess impossible”. By describing the exact token sequences your DSL accepts, you eliminate an entire class of syntax errors, allowing even modest local LLMs to produce usable code for highly brittle languages such as assembly. The technique is generic and can be applied to any domain that requires guaranteed‑valid output: configuration files, structured data, test scripts, game‑engine scripts, and more.
