Huge Binaries

Published: 1 month ago (December 29, 2025 at 12:35 AM EST)

Source: Hacker News

A problem I experienced while pursuing my PhD and submitting academic articles was that I had built solutions to problems that required dramatic scale to be effective and worthwhile.
Responses to my publication submissions often claimed such problems did not exist. I had observed them during my time in industry (e.g., at Google), but I couldn’t cite them!

One problem that is only present in these mega‑codebases is massive binaries.
What’s the largest binary (ELF file) you’ve ever seen? I have observed binaries beyond 25 GiB, including debug symbols. How is this possible?

These companies prefer to statically build their services to speed up startup and simplify deployment. Statically including all code in some of the world’s largest codebases is a recipe for massive binaries.

Similar to the sound barrier, there is a point at which code size becomes problematic and we must rethink how we link and build code. For x86_64, that point is the 2 GiB “Relocation Barrier.”

Why 2 GiB? 🤔

Let’s take a look at how position‑independent code is put together.

A simple example

/* simple-relocation.c */
extern void far_function();

int main(void) {
    far_function();
    return 0;
}

Compile the file:

gcc -c simple-relocation.c -o simple-relocation.o

Inspect the object file with objdump:

objdump -dr simple-relocation.o

0000000000000000 <main>:
   0: 55                    push   %rbp
   1: 48 89 e5              mov    %rsp,%rbp
   4: b8 00 00 00 00        mov    $0x0,%eax
   9: e8 00 00 00 00        call   e <main+0xe>
        a: R_X86_64_PLT32    far_function-0x4
   e: b8 00 00 00 00        mov    $0x0,%eax
  13: 5d                    pop    %rbp
  14: c3                    ret

The e8 byte is the CALL opcode (it takes a 32‑bit signed relative offset).
Right now the offset is 0 (four bytes of 0). objdump also tells us that a relocation is required to fix up this code when we finalize it.

Note
The -0x4 is needed because the offset is relative to the instruction pointer after it has advanced past the 4‑byte operand.

We can view the relocation entry with readelf:

readelf -r simple-relocation.o

Relocation section '.rela.text' at offset 0x170 contains 1 entry:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
00000000000a  000400000004 R_X86_64_PLT32    0000000000000000 far_function - 4

The entry tells the linker that the 4‑byte operand at offset 0x0a (the start of the CALL’s immediate) must be patched with the displacement to far_function: the symbol’s address plus the addend (−4), minus the address of the operand itself.

Adding the callee

/* far-function.c */
void far_function(void) {
}

Compile it:

gcc -c far-function.c -o far-function.o

Link the two object files:

gcc simple-relocation.o far-function.o -o simple-relocation

Inspect the final executable:

objdump -dr simple-relocation

0000000000401106 <main>:
 401106: 55                    push   %rbp
 401107: 48 89 e5              mov    %rsp,%rbp
 40110a: b8 00 00 00 00        mov    $0x0,%eax
 40110f: e8 07 00 00 00        call   40111b <far_function>
 401114: b8 00 00 00 00        mov    $0x0,%eax
 401119: 5d                    pop    %rbp
 40111a: c3                    ret

000000000040111b <far_function>:
 40111b: 55                    push   %rbp
 40111c: 48 89 e5              mov    %rsp,%rbp
 40111f: 90                    nop
 401120: 5d                    pop    %rbp
 401121: c3                    ret

The linker has calculated the relative offset (0x07) and patched the CALL instruction correctly.

The 2 GiB Barrier

The CALL opcode (e8) uses a 32‑bit signed displacement, which limits the jump range to ±2 GiB (2³¹ bytes).
Thus a callsite can only reach code that lies within a 2 GiB window forward or backward. This limit is known as the 2 GiB Relocation Barrier.

What happens when the target is farther than 2 GiB?

We can force the linker to place far_function far away using a linker script.

/* overflow.lds */
SECTIONS
{
    /* 1. Standard low‑address sections */
    . = 0x400000;

    .text : {
        simple-relocation.o(.text.*)
    }
    .rodata : { *(.rodata .rodata.*) }
    .data   : { *(.data .data.*) }
    .bss    : { *(.bss .bss.*) }

    /* 2. Move the location counter far away */
    . = 0x120000000;   /* ≈4.5 GiB */

    .text.far : {
        far-function.o(.text*)
    }
}

Now link with LLVM’s lld (its error messages are a bit clearer):

gcc simple-relocation.o far-function.o -T overflow.lds \
    -o simple-relocation-overflow -fuse-ld=lld

Output:

ld.lld: error: <internal>:(.eh_frame+0x6c):
relocation R_X86_64_PC32 out of range:
5364513724 is not in [-2147483648, 2147483647]; references section '.text'
ld.lld: error: simple-relocation.o:(function main: .text+0xa):
relocation R_X86_64_PLT32 out of range:
5364514572 is not in [-2147483648, 2147483647]; references 'far_function'
>>> referenced by simple-relocation.c
>>> defined in far-function.o

The linker reports a relocation overflow because the required displacement does not fit into a signed 32‑bit field.

Dealing with the Barrier

When we hit this problem we have several options, which fall under the broader topic of code models. The appropriate solution depends on whether we are accessing:

Data (static variables, constants)
Code (functions, jump targets)

A great, in‑depth discussion of these techniques can be found in the blog post “Relocation overflow and code models” by @maskray.

TL;DR

The x86‑64 CALL/JMP instructions use a 32‑bit signed relative offset, limiting direct jumps to ±2 GiB.
Massive static binaries can easily exceed this limit, causing relocation overflow errors at link time.
Solutions involve choosing a different code model (small, medium, or large), using indirect jumps or trampolines, or linking dynamically so that every call target stays within reach.

Understanding and working around the 2 GiB relocation barrier is essential when dealing with the mega‑codebases that produce binaries tens of gigabytes in size.

@maskray is the author of lld.

The simplest solution, however, is to use -mcmodel=large, which replaces each relative CALL with an indirect CALL through a register that holds the target’s 64‑bit absolute address.

# Link the objects built with the default code model (this fails with a relocation overflow)
gcc simple-relocation.o far-function.o -T overflow.lds -o simple-relocation-overflow

# Compile with the large code model
gcc -c simple-relocation.c -o simple-relocation.o -mcmodel=large -fno-asynchronous-unwind-tables

# Link again
gcc simple-relocation.o far-function.o -T overflow.lds -o simple-relocation-overflow

# Run
./simple-relocation-overflow

Note
I needed to add -fno-asynchronous-unwind-tables for this demonstration. It suppresses the .eh_frame unwind data, whose own PC‑relative relocations might otherwise overflow as well.

What does the disassembly look like now?

objdump -dr simple-relocation-overflow

0000000120000000 <far_function>:
  120000000: 55                    push   %rbp
  120000001: 48 89 e5              mov    %rsp,%rbp
  120000004: 90                    nop
  120000005: 5d                    pop    %rbp
  120000006: c3                    ret

00000000004000e6 <main>:
  4000e6: 55                    push   %rbp
  4000e7: 48 89 e5              mov    %rsp,%rbp
  4000ea: b8 00 00 00 00        mov    $0x0,%eax
  4000ef: 48 ba 00 00 00 20 01  movabs $0x120000000,%rdx
  4000f6: 00 00 00 
  4000f9: ff d2                 call   *%rdx
  4000fb: b8 00 00 00 00        mov    $0x0,%eax
  400100: 5d                    pop    %rbp
  400101: c3                    ret

There is no longer a single CALL instruction; it has become a MOVABS followed by an indirect CALL 😲. This changes the size of each call from 5 bytes (1‑byte opcode + 4‑byte relative offset) to a whopping 12 bytes (2‑byte MOVABS prefix and opcode + 8‑byte absolute address + 2‑byte indirect CALL).

Notable downsides

Instruction bloat – We’ve gone from 5 bytes per call to 12. In a binary with millions of call sites, this can add up quickly.
Register pressure – We’ve burned a general‑purpose register (%rdx) to perform the jump.

Caution
I had a lot of trouble building a benchmark that demonstrated a lower IPC (instructions per cycle) for the large code model, so you’ll have to take my word for it. 🤷

We would like to keep our small code model. What other strategies can we pursue?

More to come in subsequent writings.
