Thoughts on Generating C

Published: (February 9, 2026 at 08:54 AM EST)
Source: Hacker News

Static Inline Functions Enable Data Abstraction

When I learned C, in the early days of GStreamer (oh bless its heart—it still has the same web page!), we used lots of pre‑processor macros.
Over time we got the message that many macro uses should have been inline functions; macros are for token‑pasting and generating names, not for data access or other implementation.

What I did not appreciate until much later was that always‑inline functions remove any possible performance penalty for data abstractions.

Example

In Wastrel, I describe a bounded range of WebAssembly memory with a struct memory, and an access to that memory with a separate struct access:

struct memory {
    uintptr_t base;
    uint64_t  size;
};

struct access {
    uint32_t addr;
    uint32_t len;
};

If I want a writable pointer to that memory I can write:

#define static_inline \
    static inline __attribute__((always_inline))

static_inline void *write_ptr(struct memory m, struct access a) {
    BOUNDS_CHECK(m, a);
    char *base = __builtin_assume_aligned((char *)m.base, 4096);
    return (void *)(base + a.addr);
}
  • Usually BOUNDS_CHECK expands to no code at all; instead, the memory is mapped into a PROT_NONE region of an appropriate size, so that an out‑of‑bounds access faults.
  • BOUNDS_CHECK is a macro rather than a function so that, if the check fails and kills the process, the error can embed __FILE__ and __LINE__.
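For illustration, here is one way such a BOUNDS_CHECK macro could look when explicit checks are wanted. This is a minimal sketch, not Wastrel's actual definition: the helper names access_in_bounds and bounds_check_failed are hypothetical, and the error message format is an assumption.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct memory { uintptr_t base; uint64_t size; };
struct access { uint32_t addr; uint32_t len; };

/* Hypothetical failure handler: report where the check was written, then abort. */
static void bounds_check_failed(const char *file, int line) {
    fprintf(stderr, "%s:%d: out-of-bounds memory access\n", file, line);
    abort();
}

/* Hypothetical predicate. The sum is computed in 64 bits, so addr + len
   cannot wrap around the way a 32-bit addition could. */
static inline int access_in_bounds(struct memory m, struct access a) {
    return (uint64_t)a.addr + (uint64_t)a.len <= m.size;
}

/* A macro rather than a function, so that __FILE__ and __LINE__ name the
   place where the check appears in the source. */
#define BOUNDS_CHECK(m, a)                              \
    do {                                                \
        if (!access_in_bounds((m), (a)))                \
            bounds_check_failed(__FILE__, __LINE__);    \
    } while (0)
```

A build that relies on the PROT_NONE mapping instead could define BOUNDS_CHECK(m, a) as ((void)0).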

Regardless of whether explicit bounds checks are enabled, the always_inline attribute behind the static_inline macro ensures that the abstraction cost is entirely eliminated. When bounds checks are elided, the size of the memory and the len of the access are never allocated at all, because nothing reads them.

If write_ptr weren’t static_inline, I would worry that some of these struct values might be passed through memory. This is mostly a concern with functions that return structs by value; e.g., on AArch64 returning a struct memory would use the same registers that a call to void (*)(struct memory) would use for the argument, whereas the SysV x86‑64 ABI only allocates two general‑purpose registers for return values. I prefer not to think about this bottleneck, and static inline functions take care of it for me.

Avoid Implicit Integer Conversions

C has an odd set of default integer conversions (e.g., promoting uint8_t to int) and weird boundary conditions for signed integers. When generating C, it’s safer to sidestep these rules and be explicit:

  • Define static‑inline cast helpers such as u8_to_u32, s16_to_s32, etc.
  • Turn on -Wconversion to catch accidental implicit conversions.

Using static‑inline cast functions also lets the generated code assert that operands are of a particular type. Ideally, all casts live in your helper functions, and no cast appears in the generated code itself.
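A minimal sketch of what such helpers might look like follows. The names u8_to_u32 and s16_to_s32 come from the text above; u64_to_u32_wrap and add_u8_as_u32 are hypothetical additions for illustration.

```c
#include <stdint.h>

#define static_inline static inline __attribute__((always_inline))

/* Widening conversions: value-preserving, but spelled out so that the
   generated code never depends on C's implicit promotion rules. */
static_inline uint32_t u8_to_u32(uint8_t x)   { return (uint32_t)x; }
static_inline int32_t  s16_to_s32(int16_t x)  { return (int32_t)x; }

/* Narrowing conversion (hypothetical name): keeps the low 32 bits. */
static_inline uint32_t u64_to_u32_wrap(uint64_t x) {
    return (uint32_t)(x & 0xffffffffu);
}

/* What generated code might look like: both operands are widened
   explicitly, so the addition happens in uint32_t, not in int. */
static_inline uint32_t add_u8_as_u32(uint8_t a, uint8_t b) {
    return u8_to_u32(a) + u8_to_u32(b);
}
```

With -Wconversion enabled, passing a uint32_t variable to u8_to_u32 draws a warning, which is how the helpers double as type assertions on their operands.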

Wrap Raw Pointers and Integers with Intent

Whippet is a garbage collector written in C. A GC cuts across all data abstractions: objects may be viewed as absolute addresses, ranges in a paged space, offsets from the beginning of an aligned region, and so on. Representing all of these concepts with plain size_t or uintptr_t quickly becomes a nightmare.

Whippet therefore introduces single‑member structs to give each concept its own type:

/* api/gc-ref.h */
struct gc_ref {
    uintptr_t addr;
};

/* api/gc-edge.h */
struct gc_edge {
    uintptr_t edge_addr;
};

These structs prevent accidental misuse—gc_edge_address will never be called with a struct gc_ref, and vice‑versa.
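To make the benefit concrete, here is a small sketch of accessors over these two structs. Only gc_edge_address is named in the text; its body here, and the helpers make_ref, ref_address, make_edge, edge_load and edge_store, are assumptions for illustration and are not claimed to match Whippet's real API.

```c
#include <stdint.h>

struct gc_ref  { uintptr_t addr; };
struct gc_edge { uintptr_t edge_addr; };

/* Hypothetical constructors and accessors. */
static inline struct gc_ref make_ref(uintptr_t addr) {
    struct gc_ref ref = { addr };
    return ref;
}
static inline uintptr_t ref_address(struct gc_ref ref) {
    return ref.addr;
}

/* An edge is the address of a slot that holds a reference. */
static inline struct gc_edge make_edge(struct gc_ref *slot) {
    struct gc_edge edge = { (uintptr_t)slot };
    return edge;
}
static inline uintptr_t gc_edge_address(struct gc_edge edge) {
    return edge.edge_addr;
}

/* Read or update the reference stored in the slot the edge points at. */
static inline struct gc_ref edge_load(struct gc_edge edge) {
    return *(struct gc_ref *)edge.edge_addr;
}
static inline void edge_store(struct gc_edge edge, struct gc_ref ref) {
    *(struct gc_ref *)edge.edge_addr = ref;
}
```

Calling gc_edge_address(some_ref) is now a compile error, whereas with two bare uintptr_t values the same mix-up would compile silently.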

Why This Helps Compilers

When the compiler that generates the C knows the exact type of a term, it can encode that type in the code it emits, and the C compiler will then catch many mistakes in the residual C code.

Consider compiling WebAssembly’s struct.set operation. The textual semantics say:

“Assert: Due to validation, val is some ref.struct structaddr.”

We can translate that assertion into C by building a forest of pointer subtypes:

typedef struct anyref    { uintptr_t value; } anyref;
typedef struct eqref     { anyref p; } eqref;
typedef struct i31ref    { eqref p; } i31ref;
typedef struct arrayref  { eqref p; } arrayref;
typedef struct structref { eqref p; } structref;

For a concrete type, e.g. (type $type_0 (struct (mut f64))), we generate:

typedef struct type_0ref { structref p; } type_0ref;

A field‑setter for $type_0 then takes a type_0ref:

static inline void
type_0_set_field_0(type_0ref obj, double val) {
    /* ... */
}

Thus the source‑level type information propagates all the way to the target language.

A similar type forest exists for the actual object representations:

typedef struct wasm_any    { uintptr_t type_tag; } wasm_any;
typedef struct wasm_struct { wasm_any p; } wasm_struct;
typedef struct type_0      { wasm_struct p; double field_0; } type_0;

We generate tiny cast routines to go back and forth between type_0ref and type_0 * as needed. Because all routines are static inline, there is no runtime overhead, and we obtain pointer subtyping for free: if a struct.set $type_0 0 instruction is passed a subtype of $type_0, the compiler can generate an up‑cast that type‑checks.

These patterns have helped me get reliable, high‑performance C out of my compilers. They’re not “best practices” in any official sense, but they work for me, and you’re welcome to adopt them too.
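The cast routines between type_0ref and type_0 * might look like the following sketch. The type definitions repeat the ones above (only the subset needed here); the routine names type_0ref_to_ptr, type_0_ptr_to_ref and type_0ref_to_structref are hypothetical, and the setter body is one plausible filling, not necessarily what the generator emits.

```c
#include <stdint.h>

typedef struct anyref    { uintptr_t value; } anyref;
typedef struct eqref     { anyref p; } eqref;
typedef struct structref { eqref p; } structref;
typedef struct type_0ref { structref p; } type_0ref;

typedef struct wasm_any    { uintptr_t type_tag; } wasm_any;
typedef struct wasm_struct { wasm_any p; } wasm_struct;
typedef struct type_0      { wasm_struct p; double field_0; } type_0;

/* Reference -> object pointer: unwrap the nested single-member structs. */
static inline type_0 *type_0ref_to_ptr(type_0ref ref) {
    return (type_0 *)ref.p.p.p.value;
}

/* Object pointer -> reference: one brace level per wrapper struct. */
static inline type_0ref type_0_ptr_to_ref(type_0 *obj) {
    type_0ref ref = { { { { (uintptr_t)obj } } } };
    return ref;
}

/* Up-cast to the supertype: just select the embedded member. */
static inline structref type_0ref_to_structref(type_0ref ref) {
    return ref.p;
}

static inline void type_0_set_field_0(type_0ref obj, double val) {
    type_0ref_to_ptr(obj)->field_0 = val;
}
```

The up-cast is a plain member access, so the C compiler checks it, and passing an arrayref where a type_0ref is expected fails to compile.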

Fear Not memcpy

In WebAssembly, accesses to linear memory are not necessarily aligned, so we can’t just cast an address to (say) int32_t* and dereference.
Instead we do:

memcpy(&i32, addr, sizeof(int32_t));

and trust the compiler to emit an unaligned load if it can (and it can). No need for more words here!
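Wrapped up as helpers, the memcpy idiom could look like this sketch. The names load_i32 and store_i32 are hypothetical, and the code assumes a little-endian host; WebAssembly memory is little-endian, so a big-endian host would also need a byte swap.

```c
#include <stdint.h>
#include <string.h>

/* Load a 32-bit value from a possibly unaligned address. */
static inline int32_t load_i32(const void *addr) {
    int32_t value;
    memcpy(&value, addr, sizeof value);
    return value;
}

/* Store a 32-bit value to a possibly unaligned address. */
static inline void store_i32(void *addr, int32_t value) {
    memcpy(addr, &value, sizeof value);
}
```

Unlike a cast to int32_t * followed by a dereference, this has defined behavior at any alignment.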

For ABI and Tail Calls, Perform Manual Register Allocation

GCC finally has __attribute__((musttail)): praise be. However, when compiling WebAssembly you might end up with a function that has 30 arguments or 30 return values. I don’t trust a C compiler to reliably shuffle between different stack‑argument needs at tail calls to or from such a function. It could even refuse to compile a file if it can’t meet its musttail obligations, which is hardly a desirable characteristic for a target language.

What you really want is for all function parameters to be allocated to registers. You can ensure this by, for example, passing only the first n values in registers and passing the rest in global variables. You don’t need to place them on a stack, because the callee can load them back into locals as part of the prologue.
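As a rough sketch of the argument-passing scheme: suppose the generator decides that only the first four values travel as C parameters. The limit of four, the global arg_spill_i64, and the functions sum6 and call_sum6 are all made up for illustration. The sketch uses ordinary calls and leaves out musttail, whose availability depends on the compiler version.

```c
#include <stdint.h>

/* Hypothetical spill area for arguments beyond the first four. A
   multi-threaded runtime would presumably make this thread-local. */
static int64_t arg_spill_i64[2];

/* A six-argument source function, compiled to a four-parameter C function. */
static int64_t sum6(int64_t a0, int64_t a1, int64_t a2, int64_t a3) {
    /* Prologue: reload the spilled arguments into locals right away,
       before any other call can overwrite the globals. */
    int64_t a4 = arg_spill_i64[0];
    int64_t a5 = arg_spill_i64[1];
    return a0 + a1 + a2 + a3 + a4 + a5;
}

/* Call site: store the excess arguments, then make the call. */
static int64_t call_sum6(void) {
    arg_spill_i64[0] = 5;
    arg_spill_i64[1] = 6;
    return sum6(1, 2, 3, 4);
}
```

Because every C function now takes at most four parameters, none of them needs stack-passed arguments on common 64-bit ABIs, which removes the stack shuffling that tail calls would otherwise require.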

What’s fun about this is that it also neatly enables multiple return values when compiling to C:

  1. Enumerate the set of function types used in your program.
  2. Allocate enough global variables of the appropriate types to store all return values.
  3. In the function epilogue, store any “excess” return values (those beyond the first) in those globals.
  4. Callers reload those values immediately after the call.
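The steps above can be sketched for a function of type (i32, i32) -> (i32, i32). The global ret_spill_i32 and the functions divmod and use_divmod are hypothetical names chosen for this example.

```c
#include <stdint.h>

/* Step 2: one global slot per excess i32 return value. The widest
   function type in this tiny program returns two i32s, so one slot
   is enough. */
static int32_t ret_spill_i32[1];

/* Returns (quotient, remainder). The first result uses the C return
   value; the second goes through the global (step 3). */
static int32_t divmod(int32_t n, int32_t d) {
    int32_t quotient = n / d;
    int32_t remainder = n % d;
    ret_spill_i32[0] = remainder;
    return quotient;
}

/* Step 4: the caller reloads the excess value immediately after the
   call, before anything else can clobber the global. */
static int32_t use_divmod(void) {
    int32_t quotient = divmod(17, 5);
    int32_t remainder = ret_spill_i32[0];
    return quotient * 10 + remainder;
}
```

Here use_divmod packs the two results into one number only so that both are easy to inspect; generated code would keep them in separate locals.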

What’s Not to Like

Generating C is a local optimum:

  • You get the industrial‑strength instruction selection and register allocation of GCC or Clang.
  • You don’t have to implement many peephole‑style optimizations.
  • You can link to possibly‑inlinable C runtime routines.

It’s hard to improve over this design point in a marginal way.

There are drawbacks, of course. As a Schemer, my biggest annoyance is that I don’t control the stack:

  • I don’t know how much stack a given function will need.
  • I can’t extend the stack of my program in any reasonable way.
  • I can’t iterate the stack to precisely enumerate embedded pointers (perhaps that’s fine).
  • I certainly can’t slice a stack to capture a delimited continuation.

The other major irritation concerns side tables: one would like to implement so‑called zero‑cost exceptions, but without support from the compiler and toolchain, that’s impossible.

Finally, source‑level debugging is gnarly. You’d like to embed DWARF information corresponding to the code you residualize, but I don’t know how to do that when generating C.

Why not Rust, you ask?
For what it’s worth, I’ve found that lifetimes are a front‑end issue; if I had a source language with explicit lifetimes, I would consider producing Rust, as I could machine‑check that the output has the same guarantees as the input. The same goes if I were using a Rust standard library. But if you are compiling from a language without fancy lifetimes, I don’t know what you would gain from Rust: fewer implicit conversions, yes, but less mature tail‑call support, longer compile times… it’s a wash, I think.

Oh well. Nothing is perfect, and it’s best to go into things with your eyes wide open. If you’ve made it this far, I hope these notes help you in your generations. For me, once my generated C type‑checked, it worked: very little debugging was necessary. Hacking isn’t always like this, but I’ll take it when it comes.

Until next time, happy hacking!
