Are compilers deterministic?
Source: Hacker News
22 Feb 2026
Betteridge says “no,” and for normal developer experience that answer is mostly right. (Also, you’re absolutely right! — I used ChatGPT to help me write this.)
My take
There are two complementary answers:
- Computer‑science answer – a compiler is deterministic as a function of its full input state.
- Engineering answer – most real builds do not control the full input state, so the outputs drift over time.
Formal model
```
artifact = F(
    source,
    flags,
    compiler binary,
    linker + assembler,
    libc + runtime,
    env vars,
    filesystem view,
    locale + timezone,
    clock,
    kernel behavior,
    hardware / concurrency schedule
)
```
In practice, most teams keep only source (and maybe flags) constant and label everything else “noise.”
That “noise” is exactly where non‑reproducibility lives.
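To make the point concrete, here is a toy model in Python (the `build` function is hypothetical, not a real compiler): the artifact is a pure function of the *whole* input tuple, so varying an input you had filed under "noise" still changes the output.

```python
import hashlib

def build(source: str, flags: str, env: dict) -> str:
    """Toy 'compiler': the artifact is a pure function of the FULL input tuple."""
    state = source + "|" + flags + "|" + repr(sorted(env.items()))
    return hashlib.sha256(state.encode()).hexdigest()

# Same source and flags, but an "irrelevant" environment difference...
a = build("int main(){}", "-O2", {"TZ": "UTC"})
b = build("int main(){}", "-O2", {"TZ": "America/New_York"})
assert a != b  # ...still changes the artifact

# Pin the full tuple and the output is identical every time.
assert a == build("int main(){}", "-O2", {"TZ": "UTC"})
```

Determinism-as-a-function holds; what varies in practice is which arguments you actually pinned.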
Lessons from Ksplice
I worked at Ksplice in the 2000s, where we patched running Linux kernels in RAM so security updates could be applied without a reboot.
Reading objdump output of crash‑prone kernels wasn’t a daily routine, but it happened often enough that the gap between compiler output and source intent stopped being theoretical.
- We generated reboot‑less kernel updates by diffing the old vs. new compiled objects and stitching hot patches into live memory.
- Most diffs mapped cleanly to changed C code.
- Occasionally they “exploded” for reasons unrelated to source semantics: different register allocation, altered pass behavior, or section/layout changes.
- Same intent, different machine code.
Concrete historical artifact
GCC bug 18574 discusses pointer‑hash instability that affected traversal order and SSA coalescing. The thread illustrates how seemingly innocuous changes in the toolchain can break reproducibility.
Key distinctions
| Concept | Definition |
|---|---|
| Deterministic compiler | Given the exact same complete input tuple, it always produces the same output. |
| Reproducible build | Two independent builders can recreate a bit‑identical artifact. |
| Reliable toolchain | Differences in output are rare and, when they occur, rarely affect functional correctness. |
These concepts are related but not equivalent; understanding the difference helps set realistic expectations for build stability.
Compiler Contract: Semantics, Not Byte Identity
The commenter is right on this point: compilers are expected to preserve semantics. For programs with defined behavior, the output should be observationally equivalent to the source language’s abstract machine.
That means that instruction order, register choice, inlining strategy, and block layout are all fair game—as long as the externally visible behavior stays the same. In practice, “visible behavior” includes things like:
- I/O effects
- Volatile accesses
- Atomic‑synchronization guarantees
- Defined return values
…but not byte‑for‑byte instruction identity.
Important caveats
- Undefined behavior weakens or voids the semantic guarantee.
- Timing, micro‑architectural side channels, and exact memory layout are usually outside the core language contract.
- Reproducible builds are a stricter goal than semantic preservation (they require the same bits, not just the same behavior).
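The gap between semantic equivalence and byte identity can be shown in miniature with Python bytecode (an analogy, not C codegen — the two function bodies below are hypothetical examples):

```python
def double_mul(x):
    return x * 2

def double_add(x):
    return x + x

# Different instruction sequences in the compiled code objects...
assert double_mul.__code__.co_code != double_add.__code__.co_code

# ...but the same observable behavior on the inputs we care about.
assert all(double_mul(n) == double_add(n) for n in range(100))
```

Both satisfy the "contract" for integer inputs, yet a byte-for-byte diff of the compiled code flags them as different — exactly the distinction between a reliable toolchain and a reproducible build.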
Where Entropy Comes From
- `__DATE__`, `__TIME__`, `__TIMESTAMP__` macros
- Embedded absolute paths in DWARF/debug info
- Build‑path leakage (e.g., `/home/fragmede/projects/foo`)
- Locale‑sensitive sort behavior (`LC_ALL`)
- Filesystem iteration order
- Parallel build and link race ordering
- Archive member order and metadata (`ar`, `ranlib`)
- Build IDs, UUIDs, random seeds
- Network fetches during the build
- Toolchain version skew
- Host kernel / C library differences
- Historical compiler internals that depend on unstable pointer or hash‑traversal order
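As a sketch of how the timestamp sources of entropy are usually tamed, here is a hypothetical build step that honors the `SOURCE_DATE_EPOCH` convention (the environment variable the reproducible-builds project standardized):

```python
import hashlib
import os
import time

def stamp_artifact(payload: bytes) -> bytes:
    """Embed a build timestamp, but let SOURCE_DATE_EPOCH override the clock."""
    ts = int(os.environ.get("SOURCE_DATE_EPOCH", time.time()))
    return payload + b"\nbuilt-at: " + str(ts).encode()

# With the epoch pinned, two builds of the same payload are bit-identical.
os.environ["SOURCE_DATE_EPOCH"] = "1"
first = hashlib.sha256(stamp_artifact(b"object code")).hexdigest()
second = hashlib.sha256(stamp_artifact(b"object code")).hexdigest()
assert first == second
```

Without the override, the wall clock leaks into the artifact and two builds a second apart differ.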
ASLR note: ASLR does not directly randomize the emitted binary; it randomizes the process memory layout. However, if a compiler pass’s behavior depends on pointer identity or order, ASLR can indirectly perturb outcomes.
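The pointer-order trap translates directly into a Python sketch (the `Node` class and pass functions are hypothetical stand-ins for compiler IR):

```python
class Node:
    """Toy IR node, standing in for a compiler's internal data structure."""
    def __init__(self, name):
        self.name = name

def bad_pass(nodes):
    # Visits nodes in address (id) order: depends on where the allocator
    # placed each object, which can vary run to run with ASLR and heap state.
    return [n.name for n in sorted(nodes, key=id)]

def good_pass(nodes):
    # Deterministic: orders by a stable property of the node itself.
    return [n.name for n in sorted(nodes, key=lambda n: n.name)]

nodes = [Node(c) for c in "dcba"]
assert good_pass(nodes) == ["a", "b", "c", "d"]       # stable every run
assert sorted(bad_pass(nodes)) == ["a", "b", "c", "d"]  # same contents, unstable order
```

`bad_pass` is the shape of the bug class GCC 18574 falls into: correct contents, address-dependent order.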
So “compilers are deterministic” is often true in a theorem‑sense but false in an operational sense.
Even with reproducible artifacts, Ken Thompson’s Reflections on Trusting Trust still applies.
Remember, compilers are not new technology: Grace Hopper’s A‑0 system dates back to 1952 on the UNIVAC I. (ChatGPT has only been around 4 years, while compilers have been around 74 years.)
Reproducible Builds: Deliberate Engineering
Debian and the broader reproducible‑builds effort (around 2013 onward) pushed this mainstream idea: the same source + the same build instructions should produce bit‑for‑bit identical artifacts.
Practical Playbook
- Freeze toolchains and dependencies
- Use a stable environment – e.g. `TZ=UTC`, `LC_ALL=C`
- Set `SOURCE_DATE_EPOCH` to a fixed timestamp
- Normalize / strip volatile metadata (timestamps, IDs, etc.)
- Canonicalize path prefixes with `-ffile-prefix-map=` and `-fdebug-prefix-map=`
- Create deterministic archives – e.g. `ar -D`
- Remove network access from the build graph
- Build in hermetic containers or sandboxes
- Continuously diff artifacts across builders in CI
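A minimal sketch of the "deterministic archives" and "strip volatile metadata" steps, using Python's `zipfile` with a pinned member order and timestamp (an analogy to `ar -D`, not a replacement for it):

```python
import io
import zipfile

def deterministic_zip(files: dict) -> bytes:
    """Write an archive with sorted member order and a fixed timestamp."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):  # normalize member order
            # Pin the mtime so host clock state never reaches the artifact.
            info = zipfile.ZipInfo(name, date_time=(1980, 1, 1, 0, 0, 0))
            zf.writestr(info, files[name])
    return buf.getvalue()

payload = {"b.o": b"bytes of b", "a.o": b"bytes of a"}
# Two independent builds are bit-identical.
assert deterministic_zip(payload) == deterministic_zip(payload)
```

Sorting the members and fixing the timestamp removes the two archive-level entropy sources listed above: iteration order and metadata.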
Outcomes
- Repeatable – you can run the same command and get the same result.
- Reproducible – different machines produce identical binaries.
- Verifiable – anyone can check that the output matches the source.
- Hermetic – builds are isolated from external state.
- Deterministic – no hidden randomness influences the result.
Current State
Do we have this now?
In many ecosystems the answer is mostly yes, but it required years of intentional work across compilers, linkers, packaging tools, and build systems. We arrived here by grinding through obscure edge cases—not by simply waving our hands and declaring purity.
Why This Matters for LLMs
The question often surfaces as: “Is vibecoding sane if LLMs are nondeterministic?”
Before answering, decide whether you want the computer‑science perspective or the engineering perspective.
The Halting Problem Analogy
- We have not solved the halting problem in the formal, theoretical sense.
- Practically, however, an LLM can spot a broken `for` loop, explain why the condition is wrong, and even suggest a fix.
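For concreteness, the kind of off-by-one this refers to (a hypothetical example, with the fix alongside):

```python
def broken_sum(xs):
    total, i = 0, 0
    while i <= len(xs):   # bug: <= walks one index past the end
        total += xs[i]    # IndexError on the final iteration
        i += 1
    return total

def fixed_sum(xs):
    total, i = 0, 0
    while i < len(xs):    # fix: strict comparison stops at the last element
        total += xs[i]
        i += 1
    return total

try:
    broken_sum([1, 2, 3])
except IndexError:
    pass                  # the wrong condition manifests as a crash
assert fixed_sum([1, 2, 3]) == 6
```

No halting-problem oracle is needed to catch this; a bounded check (or a decent reviewer, human or otherwise) suffices.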
Engineering Reality
Engineering never relies on perfectly deterministic intelligence.
Instead, it depends on:
- Controlled interfaces
- Test oracles
- Reproducible pipelines
- Observability
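Those ingredients can be sketched in a few lines: a nondeterministic generator wrapped in a deterministic test oracle (both functions below are hypothetical stand-ins, not a real LLM pipeline):

```python
import random

def flaky_generator(seed):
    """Stand-in for an LLM: proposes a candidate implementation, sometimes wrong."""
    rng = random.Random(seed)
    if rng.random() < 0.5:
        return lambda x: x * 2       # correct candidate
    return lambda x: x * 2 + 1       # buggy candidate

def gate(candidate):
    """Deterministic test oracle: fixed inputs, fixed expected outputs."""
    return all(candidate(n) == 2 * n for n in [0, 1, 7, -3])

# Keep sampling the nondeterministic source until the deterministic gate passes.
accepted = None
for seed in range(100):
    candidate = flaky_generator(seed)
    if gate(candidate):
        accepted = candidate
        break

assert accepted is not None
assert accepted(21) == 42
```

The generator's nondeterminism is contained, not eliminated: only candidates that clear the gate ever ship, which is the same posture reproducible-builds takes toward compiler entropy.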
I’m “AI‑pilled” enough to drive a comma.ai car daily, yet I still demand deterministic verification gates around any generated code.
My girlfriend prefers the car’s smoother, less erratic behavior—reminding us that a probabilistic system can still deliver operationally better results.