Brewing Cappuccino: Writing a compiler without LLVM's IR

Published: December 21, 2025 at 12:25 PM EST
3 min read
Source: Dev.to

Introduction

Compilers have always felt like magic to me. They seem both complex and simple: at heart, they are just programs that convert code from one form to another. I’ve spent countless hours probing the assembly generated by clang, watching how C gets turned into machine code. Naturally, I got curious and wanted to know how to write one myself.

So how did I do it?

The first thing I did was Google “how to make my own programming language,” which led me to this article. It gave a rough idea of the compilation pipeline, but the author built a transpiler that converted his code to C++. I wasn’t satisfied with that approach.

After digging through more websites, articles, and GitHub repos (this was before ChatGPT became popular), I distilled the essential front‑end components of any compiler:

  • Tokenizer – reads the source file and produces tokens, the smallest meaningful units of the language (keywords, identifiers, literals, operators).
  • Abstract Syntax Tree (AST) – a tree‑like representation of the program’s structure.
  • Parser – converts the token stream into the AST.
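
To make the first stage concrete, here is a minimal tokenizer sketch for a tiny arithmetic language. It is not Cappuccino's actual tokenizer: the names TokenKind, Token, and tokenize, and the choice of token kinds, are assumptions made for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds for a tiny arithmetic language.
enum class TokenKind { Number, Identifier, Operator, LParen, RParen, End };

struct Token {
    TokenKind kind;
    std::string text;  // the lexeme: the exact characters that were matched
};

// Returns true for characters that may appear inside an identifier.
bool isIdentChar(unsigned char c) { return std::isalnum(c) || c == '_'; }

// Splits source text into tokens, skipping whitespace.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        const std::size_t start = i;
        if (std::isdigit(c)) {
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            tokens.push_back({TokenKind::Number, src.substr(start, i - start)});
        } else if (std::isalpha(c) || c == '_') {
            while (i < src.size() && isIdentChar(static_cast<unsigned char>(src[i]))) ++i;
            tokens.push_back({TokenKind::Identifier, src.substr(start, i - start)});
        } else if (c == '(') {
            tokens.push_back({TokenKind::LParen, "("});
            ++i;
        } else if (c == ')') {
            tokens.push_back({TokenKind::RParen, ")"});
            ++i;
        } else {
            // In this sketch, any other character is a one-character operator.
            tokens.push_back({TokenKind::Operator, std::string(1, src[i])});
            ++i;
        }
        assert(i > start);  // every iteration must consume at least one character
    }
    tokens.push_back({TokenKind::End, ""});  // sentinel so a parser never reads past the end
    return tokens;
}
```

A real tokenizer would also track line and column numbers for error messages and recognize multi-character operators such as ==; both are left out here to keep the sketch short.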

The usual shortcut is to emit LLVM’s IR and let LLVM generate the binary. I didn’t like that because it abstracts away the most fun part: generating assembly yourself. LLVM’s IR is powerful, but it ties the project to the LLVM toolchain and feels like cheating for a “no external dependencies” project.

Since I was already comfortable with assembly, I decided to write the backend on my own.

The part where it got messy

I quickly discovered that the tokenizer, parser, and AST all need a solid, consistent structure; otherwise the whole system collapses.

My early AST design was loosely based on this article. It worked for tiny examples, but as soon as I tried generating assembly, the code became painful to maintain. I had chosen to write the compiler in C. I love C and have written projects in it (e.g., anishell), but handling dynamic arrays and manual memory management in C proved cumbersome. After repeatedly rewriting the AST and creating a struct for every node type, I finally switched to C++, which was a breath of fresh air.

With C++ in place, I researched proper parsing techniques and settled on a recursive‑descent parser, which gave me the flexibility I needed beyond simple calculator examples.
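
For readers who haven't seen the technique, the following is a minimal recursive‑descent sketch for arithmetic expressions, with one function per grammar rule. It is an illustration under assumptions, not Cappuccino's parser: the Node and Parser types are hypothetical, and for brevity it reads characters directly instead of consuming a token stream.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical AST node: a number literal (op == 0) or a binary operation.
struct Node {
    char op = 0;     // '+', '-', '*', '/', or 0 for a literal
    long value = 0;  // meaningful only when op == 0
    std::unique_ptr<Node> lhs, rhs;
};

class Parser {
public:
    explicit Parser(std::string src) : src_(std::move(src)) {}

    // Parses the whole input and rejects trailing characters.
    std::unique_ptr<Node> parse() {
        auto tree = expr();
        peek();  // skip trailing whitespace
        if (pos_ != src_.size()) throw std::runtime_error("unexpected trailing input");
        return tree;
    }

private:
    // expr := term (('+' | '-') term)*
    std::unique_ptr<Node> expr() {
        auto left = term();
        while (peek() == '+' || peek() == '-') {
            char op = src_[pos_++];
            auto right = term();
            left = binary(op, std::move(left), std::move(right));  // left-associative
        }
        return left;
    }

    // term := factor (('*' | '/') factor)*
    std::unique_ptr<Node> term() {
        auto left = factor();
        while (peek() == '*' || peek() == '/') {
            char op = src_[pos_++];
            auto right = factor();
            left = binary(op, std::move(left), std::move(right));
        }
        return left;
    }

    // factor := NUMBER | '(' expr ')'
    std::unique_ptr<Node> factor() {
        char c = peek();
        if (c == '(') {
            ++pos_;
            auto inner = expr();
            if (peek() != ')') throw std::runtime_error("expected ')'");
            ++pos_;
            return inner;
        }
        if (!std::isdigit(static_cast<unsigned char>(c)))
            throw std::runtime_error("expected a number");
        auto node = std::make_unique<Node>();
        while (pos_ < src_.size() && std::isdigit(static_cast<unsigned char>(src_[pos_])))
            node->value = node->value * 10 + (src_[pos_++] - '0');
        return node;
    }

    static std::unique_ptr<Node> binary(char op, std::unique_ptr<Node> l,
                                        std::unique_ptr<Node> r) {
        assert(l && r);  // both operands must already be parsed
        auto n = std::make_unique<Node>();
        n->op = op;
        n->lhs = std::move(l);
        n->rhs = std::move(r);
        return n;
    }

    // Skips whitespace and returns the next character, or '\0' at end of input.
    char peek() {
        while (pos_ < src_.size() && std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
        return pos_ < src_.size() ? src_[pos_] : '\0';
    }

    std::string src_;
    std::size_t pos_ = 0;
};
```

Calling Parser("1 + 2 * 3").parse() yields a tree whose root is '+' with the multiplication as its right child, because term sits below expr in the grammar and therefore binds more tightly. Extending the language then mostly means adding new rule functions.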

Actually Generating the Assembly

Conceptually, generating assembly is straightforward: use a stack‑machine model, in which each operation pops its operands from the stack, computes the result, and pushes it back. The real challenge lies in the tiny bugs that can creep in. A single mismatch in variable ordering can cause the whole program to fail spectacularly.
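
The stack‑machine idea can be sketched as a post‑order walk over an expression tree. The post does not say which architecture Cappuccino targets, so the x86‑64 instructions in AT&T syntax below are an assumption, as are the names Expr, num, bin, and emit.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical expression node: a literal (op == 0) or a binary operation.
struct Expr {
    char op = 0;
    long value = 0;
    std::unique_ptr<Expr> lhs, rhs;
};

// Builds a literal node.
std::unique_ptr<Expr> num(long v) {
    auto e = std::make_unique<Expr>();
    e->value = v;
    return e;
}

// Builds a binary-operation node.
std::unique_ptr<Expr> bin(char op, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    auto e = std::make_unique<Expr>();
    e->op = op;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Appends instructions that leave the value of `e` on top of the stack.
// Assumes x86-64, AT&T syntax, and literals that fit a signed 32-bit immediate.
void emit(const Expr& e, std::vector<std::string>& out) {
    if (e.op == 0) {
        out.push_back("    pushq $" + std::to_string(e.value));
        return;
    }
    assert(e.lhs && e.rhs);
    emit(*e.lhs, out);  // left operand is pushed first
    emit(*e.rhs, out);  // right operand ends up on top
    // Pop in reverse order: the right operand comes off the stack first.
    out.push_back("    popq %rcx");  // rhs
    out.push_back("    popq %rax");  // lhs
    switch (e.op) {
        case '+': out.push_back("    addq %rcx, %rax"); break;
        case '-': out.push_back("    subq %rcx, %rax"); break;   // rax = lhs - rhs
        case '*': out.push_back("    imulq %rcx, %rax"); break;
        default: throw std::runtime_error("unsupported operator in this sketch");
    }
    out.push_back("    pushq %rax");  // push the result for the enclosing expression
}
```

The two pop lines are where ordering bugs tend to hide: swapping them still produces correct results for addition and multiplication, but silently computes rhs - lhs for subtraction.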

I hit a wall when my C implementation kept failing, so I paused the project. Later, after switching to C++ and leveraging AI assistance (Gemini) for debugging, I managed to iron out the issues. I added a small standard library and finally gave the project its name: Cappuccino.

What did I learn?

  • I still have a lot to learn, and I should research solutions before spending years wrestling with a bad design.
  • The name Cappuccino was inspired by the coffee‑themed naming of Java and my personal love of coffee.

The complete source code is available on GitHub: https://github.com/AnirudhMathur12/cappuccino.

Thank you for reading my first blog post. If you found this interesting, please consider starring the repository.
