Published: December 29, 2025 at 12:35 AM EST
Source: Hacker News

The 2 GiB “Relocation Barrier” – Why Massive Binaries Break on x86‑64

While pursuing my PhD, I kept running into the same problem when submitting academic articles: I had built solutions to problems that only appear, and only pay off, at dramatic scale.
Responses to my submissions often claimed such problems did not exist. I had observed them first-hand during my time in industry (e.g., at Google), but I couldn’t cite them!

One problem that is only present in these mega‑codebases is massive binaries.
What’s the largest ELF binary you’ve ever seen? I have observed binaries beyond 25 GiB, including debug symbols. How is this possible?

These companies prefer to statically link their services to speed up startup and simplify deployment. Statically including all code in some of the world’s largest codebases is a recipe for massive binaries.

Similar to the sound barrier, there is a point at which code size becomes problematic and we must rethink how we link and build code. For x86‑64, that point is the 2 GiB “Relocation Barrier.”

Why 2 GiB? 🤔

Let’s take a look at how position‑independent code is put together.

A simple example

/* simple-relocation.c */
extern void far_function(void);

int main(void) {
    far_function();
    return 0;
}

Compile it:

gcc -c simple-relocation.c -o simple-relocation.o

Inspect the object file with objdump:

> objdump -dr simple-relocation.o

0000000000000000 <main>:
   0: 55                    push   %rbp
   1: 48 89 e5              mov    %rsp,%rbp
   4: b8 00 00 00 00        mov    $0x0,%eax
   9: e8 00 00 00 00        call   e <main+0xe>
        a: R_X86_64_PLT32   far_function-0x4
   e: b8 00 00 00 00        mov    $0x0,%eax
  13: 5d                    pop    %rbp
  14: c3                    ret

  • e8 is the CALL opcode (takes a 32‑bit signed relative offset).
  • The operand is currently 00 00 00 00 because the actual address isn’t known yet.
  • objdump shows a relocation entry that the linker must fix later.

Note
The -0x4 addend is needed because the CPU measures the displacement from the address of the next instruction, which is 4 bytes past the start of the operand that the relocation patches.

Show the relocation with readelf:

readelf -r simple-relocation.o

Relocation section '.rela.text' at offset 0x170 contains 1 entry:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
00000000000a  000400000004 R_X86_64_PLT32    0000000000000000 far_function - 4

The entry tells the linker that the 4‑byte operand at offset 0x0a (the start of the CALL’s immediate) must be patched with the displacement to far_function: the symbol’s address, plus the addend (-4), minus the address of the field being patched.
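The Info column packs two fields into one 64‑bit value. Here is a minimal sketch of how it splits, assuming the standard ELF64 layout (symbol index in the upper 32 bits, relocation type in the lower 32 bits). The function names and the RELOC_PLT32 constant are mine, chosen for illustration. The value 4 for R_X86_64_PLT32 comes from the x86‑64 psABI.

```c
#include <stdint.h>

/* Relocation type number for R_X86_64_PLT32 in the x86-64 psABI.
 * The enum name is illustrative, not taken from any header. */
enum { RELOC_PLT32 = 4 };

/* Upper 32 bits of r_info: index into the symbol table. */
uint32_t rela_symbol_index(uint64_t r_info)
{
    return (uint32_t)(r_info >> 32);
}

/* Lower 32 bits of r_info: relocation type. */
uint32_t rela_type(uint64_t r_info)
{
    return (uint32_t)(r_info & 0xffffffffu);
}
```

Feeding in the Info value from the readelf output, 0x000400000004, gives symbol index 4 and type 4, which matches the R_X86_64_PLT32 shown in the Type column.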

Adding the callee

/* far-function.c */
void far_function(void) {
}

Compile and link:

gcc -c far-function.c -o far-function.o
gcc simple-relocation.o far-function.o -o simple-relocation

Inspect the final executable:

> objdump -dr simple-relocation

0000000000401106 <main>:
 401106: 55                    push   %rbp
 401107: 48 89 e5              mov    %rsp,%rbp
 40110a: b8 00 00 00 00        mov    $0x0,%eax
 40110f: e8 07 00 00 00        call   40111b <far_function>
 401114: b8 00 00 00 00        mov    $0x0,%eax
 401119: 5d                    pop    %rbp
 40111a: c3                    ret

000000000040111b <far_function>:
 40111b: 55                    push   %rbp
 40111c: 48 89 e5              mov    %rsp,%rbp
 40111f: 90                    nop
 401120: 5d                    pop    %rbp
 401121: c3                    ret

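The linker has calculated the relative offset and patched the CALL instruction: the operand is now 07 00 00 00, because far_function (0x40111b) starts 7 bytes after the instruction that follows the call (0x401114).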
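The same arithmetic can be checked by hand. This is a small sketch of the usual PC‑relative formula S + A − P, on the assumption that the PLT32 relocation is resolved like a plain PC‑relative one when the callee is linked into the same executable. The function names are hypothetical and the addresses are copied from the disassembly above.

```c
#include <stdint.h>

/* Value stored in the patched field for a PC-relative relocation:
 *   s: address of the target symbol
 *   a: addend from the relocation entry
 *   p: address of the 4-byte field being patched */
int64_t pc_relative_value(uint64_t s, int64_t a, uint64_t p)
{
    return (int64_t)(s - p) + a;
}

/* Plug in the numbers from the linked executable. */
int64_t example_displacement(void)
{
    uint64_t s = 0x40111b;     /* far_function */
    uint64_t p = 0x40110f + 1; /* operand starts one byte after the e8 opcode */
    return pc_relative_value(s, -4, p);
}
```

With these inputs the result is 7, the same value the linker wrote into the operand.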

The 2 GiB Barrier

The CALL opcode (e8) only takes a 32‑bit signed displacement, i.e. it can reach ±2 GiB (‑2³¹ … +2³¹‑1).
Thus a callsite can jump at most about 2 GiB forward or backward. This limit is the “2 GiB Barrier.”
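The limit is easy to express in code. The sketch below uses illustrative names and addresses of my choosing: it computes the displacement a 5‑byte CALL would need and tests whether it fits in a signed 32‑bit field.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when a displacement can be encoded in the 4-byte operand of e8. */
bool fits_rel32(int64_t displacement)
{
    return displacement >= INT32_MIN && displacement <= INT32_MAX;
}

/* Displacement from the end of a 5-byte CALL at call_addr to target. */
int64_t call_displacement(uint64_t call_addr, uint64_t target)
{
    return (int64_t)(target - (call_addr + 5));
}
```

For the call at 0x40110f targeting 0x40111b the displacement is 7 and fits easily. A hypothetical target at 0x120000000 reached from 0x400000 would need roughly 4.5 GiB and does not fit.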

What happens when the target is farther away?

We can force the linker to place far_function far from main using a linker script.

/* overflow.lds */
SECTIONS
{
    /* 1. Standard low‑address sections */
    . = 0x400000;

    .text : {
        simple-relocation.o(.text.*)
    }
    .rodata : { *(.rodata .rodata.*) }
    .data   : { *(.data .data.*) }
    .bss    : { *(.bss .bss.*) }

    /* 2. Move the location counter far away for the “far” island */
    . = 0x120000000;   /* ≈ 4.5 GiB */

    .text.far : {
        far-function.o(.text*)
    }
}

Now link with LLVM’s lld (its error messages are a bit clearer):

gcc simple-relocation.o far-function.o -T overflow.lds -o simple-relocation-overflow -fuse-ld=lld

Result:

ld.lld: error: <internal>:(.eh_frame+0x6c):
relocation R_X86_64_PC32 out of range:
5364513724 is not in [-2147483648, 2147483647]; references section '.text'
ld.lld: error: simple-relocation.o:(function main: .text+0xa):
relocation R_X86_64_PLT32 out of range:
5364514572 is not in [-2147483648, 2147483647]; references 'far_function'
>>> referenced by simple-relocation.c
>>> defined in far-function.o

The linker reports a relocation overflow because the required displacement does not fit into a signed 32‑bit field.

How to Deal with the Barrier?

This is a whole other subject involving code models (small, kernel, medium, large) and the distinction between code and data references.
In short:

| Situation | Typical solution |
| --- | --- |
| Calls or jumps > 2 GiB away | Use indirect calls/jumps (e.g., via a register, or via a PLT entry that is itself within range), or compile with the large code model (-mcmodel=large). |
| Accessing static data > 2 GiB away | Use the medium or large code model, which can load the full 64‑bit address into a register before the access; plain RIP‑relative addressing has the same ±2 GiB reach. |
| Mixing static and dynamic linking | Rely on the PLT/GOT indirections that the linker generates for calls into shared libraries. |

A great, in‑depth discussion of these topics can be found in the blog post “Relocation overflow and code models” by @maskray:

https://maskray.me/blog/2023-05-14-relocation-overflow-and-code-models

(If the link is broken, search for the title and author.)

Take‑aways

  • Static linking of gigantic codebases can easily produce binaries that exceed the 2 GiB reach of a single relative jump.
  • The x86-64 CALL/JMP instructions use a signed 32‑bit displacement, limiting direct jumps to ±2 GiB.
  • When the linker cannot fit a displacement into that range, you get a relocation overflow.
  • The usual remedies are indirect calls/jumps, different code models, or dynamic linking (PLT/GOT).

Understanding this “relocation barrier” is essential when designing build systems for mega‑binaries (tens of gigabytes) and for deciding whether static linking is truly the right choice for a given organization.

Using -mcmodel=large to Avoid Relocation Overflows

The simplest solution, however, is to compile with -mcmodel=large, which replaces each relative CALL with an indirect call through a register that holds the full 64‑bit absolute address.

# Recompile the caller with the large code model
gcc -c simple-relocation.c -o simple-relocation.o -mcmodel=large -fno-asynchronous-unwind-tables

# Link with the same linker script as before
gcc simple-relocation.o far-function.o -T overflow.lds -o simple-relocation-overflow

# Run
./simple-relocation-overflow

Note
-fno-asynchronous-unwind-tables suppresses the unwind‑table (.eh_frame) data for this object file. Its PC‑relative relocations could otherwise overflow in this demonstration, as the R_X86_64_PC32 error in the lld output shows.

Disassembly After Switching to -mcmodel=large

objdump -dr simple-relocation-overflow

0000000120000000 <far_function>:
  120000000: 55                    push   %rbp
  120000001: 48 89 e5              mov    %rsp,%rbp
  120000004: 90                    nop
  120000005: 5d                    pop    %rbp
  120000006: c3                    ret

00000000004000e6 <main>:
  4000e6: 55                    push   %rbp
  4000e7: 48 89 e5              mov    %rsp,%rbp
  4000ea: b8 00 00 00 00        mov    $0x0,%eax
  4000ef: 48 ba 00 00 00 20 01  movabs $0x120000000,%rdx
  4000f6: 00 00 00 
  4000f9: ff d2                 call   *%rdx
  4000fb: b8 00 00 00 00        mov    $0x0,%eax
  400100: 5d                    pop    %rbp
  400101: c3                    ret

The single CALL instruction has been replaced by a MOVABS followed by an indirect CALL. The call sequence grew from 5 bytes (opcode + 4‑byte relative offset) to 12 bytes (2 bytes of REX.W prefix and MOVABS opcode + 8‑byte absolute address + 2‑byte CALL).
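To make the byte count concrete, here is a small sketch that assembles the same 12‑byte sequence by hand. The function name is made up for illustration, and the register is hard‑coded to %rdx to match the disassembly. A compiler may of course pick a different register.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes `movabs $target,%rdx ; call *%rdx` into out and returns its length. */
size_t encode_far_call(uint64_t target, uint8_t out[12])
{
    size_t n = 0;
    out[n++] = 0x48;            /* REX.W prefix: 64-bit operand size */
    out[n++] = 0xba;            /* mov imm64 into %rdx (b8 + register number 2) */
    for (int i = 0; i < 8; i++) /* 8-byte immediate, little-endian */
        out[n++] = (uint8_t)(target >> (8 * i));
    out[n++] = 0xff;            /* opcode group that includes indirect CALL */
    out[n++] = 0xd2;            /* ModRM: /2 selects CALL, register operand %rdx */
    return n;
}
```

Called with 0x120000000, it produces 48 ba 00 00 00 20 01 00 00 00 ff d2, the bytes shown at 4000ef through 4000fa above.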

Downsides of the Large Code Model

  • Instruction bloat – each call now occupies 12 bytes instead of 5. In a binary with many call sites, this can increase code size noticeably.
  • Register pressure – an extra general‑purpose register (%rdx in the example) is consumed to hold the absolute address.
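For a rough sense of the first point, this back‑of‑the‑envelope sketch multiplies the 7 extra bytes per call by a call‑site count. The function name and the count are hypothetical, and the estimate ignores everything else the large code model changes (data accesses, for instance).

```c
#include <stdint.h>

/* Extra bytes spent on calls alone: 12-byte sequence versus 5-byte CALL. */
uint64_t large_model_call_overhead(uint64_t call_sites)
{
    const uint64_t small_call = 5;  /* e8 + rel32 */
    const uint64_t large_call = 12; /* movabs + call *%reg */
    return call_sites * (large_call - small_call);
}
```

One million call sites would add about 7 MB of instruction bytes by this estimate.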

Caution
I had difficulty constructing a benchmark that showed a measurable drop in IPC (instructions per cycle) with the large code model. Take my word for it that the impact can be non‑trivial. 🤷

Keeping the Small Code Model

If you want to stay with the small code model, you’ll need to explore alternative strategies (e.g., reorganising sections, using trampolines, or splitting the binary). More ideas will be covered in future posts.
