Binaries
Source: Hacker News
The 2 GiB “Relocation Barrier” – Why Massive Binaries Break on x86‑64
A problem I ran into while pursuing my PhD and submitting academic articles was that I had built solutions to problems that required dramatic scale to be effective and worthwhile.
Responses to my publication submissions often claimed such problems did not exist. I had observed them during my time in industry (e.g., at Google), but I couldn’t cite them!
One problem that is only present in these mega‑codebases is massive binaries.
What’s the largest ELF binary you’ve ever seen? I have observed binaries beyond 25 GiB, including debug symbols. How is this possible?
These companies prefer to statically link their services to speed up startup and simplify deployment. Statically including all code in some of the world’s largest codebases is a recipe for massive binaries.
Similar to the sound barrier, there is a point at which code size becomes problematic and we must rethink how we link and build code. For x86‑64, that point is the 2 GiB “Relocation Barrier.”
Why 2 GiB? 🤔
Let’s take a look at how position‑independent code is put together.
A simple example
/* simple-relocation.c */
extern void far_function();
int main(void) {
far_function();
return 0;
}
Compile it:
gcc -c simple-relocation.c -o simple-relocation.o
Inspect the object file with objdump:
> objdump -dr simple-relocation.o
0000000000000000 <main>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: b8 00 00 00 00 mov $0x0,%eax
9: e8 00 00 00 00 call e <main+0xe>
a: R_X86_64_PLT32 far_function-0x4
e: b8 00 00 00 00 mov $0x0,%eax
13: 5d pop %rbp
14: c3 ret
- e8 is the CALL opcode (takes a 32‑bit signed relative offset).
- The operand is currently 00 00 00 00 because the actual address isn’t known yet. objdump shows a relocation entry that the linker must fix later.
Note
The -0x4 is needed because the offset is relative to the instruction pointer after it has advanced past the 4‑byte operand.
Show the relocation with readelf:
readelf -r simple-relocation.o -d
Relocation section '.rela.text' at offset 0x170 contains 1 entry:
Offset Info Type Sym. Value Sym. Name + Addend
00000000000a 000400000004 R_X86_64_PLT32 0000000000000000 far_function - 4
The entry tells the linker that the 4‑byte operand at offset 0x0a (the start of the CALL’s immediate) must be patched with the relative displacement to far_function: the symbol’s address plus the addend (-4), minus the address of the operand itself.
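The Info column in the readelf output packs two fields into one 64‑bit value. As a minimal sketch (the helper names rela_sym and rela_type are mine, not part of any ELF header), this is how 000400000004 splits into a symbol‑table index and a relocation type:

```c
#include <stdint.h>

/* Relocation type number of R_X86_64_PLT32 in the x86-64 psABI. */
#define R_X86_64_PLT32 4u

/* Upper 32 bits of r_info: index into the symbol table. */
uint32_t rela_sym(uint64_t r_info) {
    return (uint32_t)(r_info >> 32);
}

/* Lower 32 bits of r_info: relocation type. */
uint32_t rela_type(uint64_t r_info) {
    return (uint32_t)(r_info & 0xffffffffu);
}
```

For the entry above, rela_sym(0x000400000004) yields symbol index 4 (far_function in this object file) and rela_type yields 4, which is R_X86_64_PLT32.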
Adding the callee
/* far-function.c */
void far_function(void) {
}
Compile and link:
gcc -c far-function.c -o far-function.o
gcc simple-relocation.o far-function.o -o simple-relocation
Inspect the final executable:
> objdump -dr simple-relocation
0000000000401106 <main>:
401106: 55 push %rbp
401107: 48 89 e5 mov %rsp,%rbp
40110a: b8 00 00 00 00 mov $0x0,%eax
40110f: e8 07 00 00 00 call 40111b <far_function>
401114: b8 00 00 00 00 mov $0x0,%eax
401119: 5d pop %rbp
40111a: c3 ret
000000000040111b <far_function>:
40111b: 55 push %rbp
40111c: 48 89 e5 mov %rsp,%rbp
40111f: 90 nop
401120: 5d pop %rbp
401121: c3 ret
The linker has calculated the relative offset (0x07) and patched the CALL instruction.
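To make the arithmetic concrete, here is a small sketch of the computation using the addresses from the disassembly above. The function names are hypothetical; the formula (symbol address + addend - address of the patched operand) is the usual one for PC‑relative relocations once the call resolves directly to the function.

```c
#include <stdint.h>

/* Value of a PC-relative relocation: S + A - P, computed in 64 bits
   so that an out-of-range result is still visible to the caller. */
int64_t pc32_value(uint64_t sym_addr, int64_t addend, uint64_t place) {
    return (int64_t)(sym_addr - place) + addend;
}

/* Store v as 4 little-endian bytes, the way the CALL operand is patched. */
void encode_le32(int32_t v, uint8_t out[4]) {
    uint32_t u = (uint32_t)v;
    for (int i = 0; i < 4; i++) {
        out[i] = (uint8_t)(u >> (8 * i));
    }
}
```

With far_function at 0x40111b, an addend of -4, and the operand at 0x401110 (one byte past the e8 opcode at 0x40110f), pc32_value returns 0x7, and encode_le32 produces the bytes 07 00 00 00 seen after e8.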
The 2 GiB Barrier
The CALL opcode (e8) only takes a 32‑bit signed displacement, i.e. it can reach ±2 GiB (‑2³¹ … +2³¹‑1).
Thus a callsite can jump at most about 2 GiB forward or backward. This limit is the “2 GiB Barrier.”
What happens when the target is farther away?
We can force the linker to place far_function far from main using a linker script.
/* overflow.lds */
SECTIONS
{
/* 1. Standard low‑address sections */
. = 0x400000;
.text : {
simple-relocation.o(.text.*)
}
.rodata : { *(.rodata .rodata.*) }
.data : { *(.data .data.*) }
.bss : { *(.bss .bss.*) }
/* 2. Move the location counter far away for the “far” island */
. = 0x120000000; /* ≈ 4.5 GiB */
.text.far : {
far-function.o(.text*)
}
}
Now link with LLVM’s lld (its error messages are a bit clearer):
gcc simple-relocation.o far-function.o -T overflow.lds -o simple-relocation-overflow -fuse-ld=lld
Result:
ld.lld: error: <internal>:(.eh_frame+0x6c):
relocation R_X86_64_PC32 out of range:
5364513724 is not in [-2147483648, 2147483647]; references section '.text'
ld.lld: error: simple-relocation.o:(function main: .text+0xa):
relocation R_X86_64_PLT32 out of range:
5364514572 is not in [-2147483648, 2147483647]; references 'far_function'
>>> referenced by simple-relocation.c
>>> defined in far-function.o
The linker reports a relocation overflow because the required displacement does not fit into a signed 32‑bit field.
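The check behind that message is simple. A minimal sketch (fits_rel32 is a made‑up name, not lld’s actual code) of the test a linker has to perform before writing the value into the 4‑byte field:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if value can be stored in a signed 32-bit displacement,
   i.e. lies in [-2147483648, 2147483647]. */
bool fits_rel32(int64_t value) {
    return value >= INT32_MIN && value <= INT32_MAX;
}
```

The displacement of 7 from the earlier link passes this check, while the 5364514572 reported by lld does not, so the linker refuses to truncate it and stops with an error.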
How to Deal with the Barrier?
This is a whole other subject involving code models (small, kernel, medium, large) and the distinction between code and data references.
In short:
| Situation | Typical solution |
|---|---|
| Calls or jumps > 2 GiB away | Use indirect calls/jumps (e.g., via a register, or via a PLT entry that is itself within range) or compile with a large code model (-mcmodel=large). |
| Accessing static data > 2 GiB away | Compile with the medium (for large data objects) or large code model so the 64‑bit address is loaded into a register first; plain RIP‑relative addressing has the same ±2 GiB limit. |
| Mixing static and dynamic linking | Rely on the PLT/GOT indirections that the linker generates and the dynamic linker fills in at load time. |
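As a rough illustration of the first row, a call through a function pointer is one way to get an indirect call from plain C. This is only a sketch: the names are hypothetical, and the pointer variable itself still has to live within reach of the code that loads it.

```c
/* Stand-in for a callee that might end up more than 2 GiB away. */
int far_function_impl(void) {
    return 42;
}

/* The pointer stores a full 64-bit address. volatile keeps the compiler
   from folding the call back into a direct, rel32-encoded CALL. */
int (*volatile far_function_ptr)(void) = far_function_impl;

int call_far(void) {
    return far_function_ptr(); /* load the pointer, then call indirectly */
}
```

The pointer’s initial value is filled in through a 64‑bit absolute relocation rather than a 32‑bit relative one, so the distance between call_far and far_function_impl no longer matters for this call site.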
A great, in‑depth discussion of these topics can be found in the blog post “Relocation overflow and code models” by @maskray:
https://maskray.me/blog/2023-05-14-relocation-overflow-and-code-models
(If the link is broken, search for the title and author.)
Take‑aways
- Static linking of gigantic codebases can easily produce binaries that exceed the 2 GiB reach of a single relative jump.
- The x86-64 CALL/JMP instructions use a signed 32‑bit displacement, limiting direct jumps to ±2 GiB.
- When the linker cannot fit a displacement into that range, you get a relocation overflow.
- The usual remedies are indirect calls/jumps, different code models, or dynamic linking (PLT/GOT).
Understanding this “relocation barrier” is essential when designing build systems for mega‑binaries (tens of gigabytes) and for deciding whether static linking is truly the right choice for a given organization.
Using -mcmodel=large to Avoid Relocation Overflows
The simplest solution, however, is to compile with -mcmodel=large, which replaces each relative CALL with an indirect call through a register that holds the 64‑bit absolute address.
# Compile with the large code model
gcc -c simple-relocation.c -o simple-relocation.o -mcmodel=large -fno-asynchronous-unwind-tables
# Link again with the same linker script
gcc simple-relocation.o far-function.o -T overflow.lds -o simple-relocation-overflow
# Run
./simple-relocation-overflow
Note
-fno-asynchronous-unwind-tables is required to disable extra unwind‑table data that could otherwise cause an overflow in this demonstration.
Disassembly After Switching to -mcmodel=large
objdump -dr simple-relocation-overflow
0000000120000000 <far_function>:
120000000: 55 push %rbp
120000001: 48 89 e5 mov %rsp,%rbp
120000004: 90 nop
120000005: 5d pop %rbp
120000006: c3 ret
00000000004000e6 <main>:
4000e6: 55 push %rbp
4000e7: 48 89 e5 mov %rsp,%rbp
4000ea: b8 00 00 00 00 mov $0x0,%eax
4000ef: 48 ba 00 00 00 20 01 movabs $0x120000000,%rdx
4000f6: 00 00 00
4000f9: ff d2 call *%rdx
4000fb: b8 00 00 00 00 mov $0x0,%eax
400100: 5d pop %rbp
400101: c3 ret
The single CALL instruction has been replaced by a MOVABS followed by an indirect CALL through %rdx. The call sequence grew from 5 bytes (opcode + 4‑byte relative offset) to 12 bytes (2‑byte MOVABS opcode + 8‑byte absolute address + 2‑byte CALL).
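To see where the 12 bytes come from, here is a sketch that assembles the sequence by hand. emit_far_call_rdx is a hypothetical helper, and it hard‑codes %rdx only because that is the register the compiler picked in this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes `movabs $target,%rdx; call *%rdx` into out, which must have
   room for at least 12 bytes. Returns the number of bytes written. */
size_t emit_far_call_rdx(uint64_t target, uint8_t *out) {
    size_t n = 0;
    out[n++] = 0x48;                 /* REX.W prefix: 64-bit operand */
    out[n++] = 0xba;                 /* MOV imm64 into rdx (0xb8 + 2) */
    for (int i = 0; i < 8; i++) {    /* absolute address, little-endian */
        out[n++] = (uint8_t)(target >> (8 * i));
    }
    out[n++] = 0xff;                 /* opcode for indirect CALL */
    out[n++] = 0xd2;                 /* ModRM: register-direct, /2, rdx */
    return n;
}
```

Called with 0x120000000, it produces 48 ba 00 00 00 20 01 00 00 00 ff d2, the same bytes objdump shows at 4000ef through 4000fa, for a total of 12 bytes versus 5 for the direct CALL.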
Downsides of the Large Code Model
- Instruction bloat – each call now occupies 12 bytes instead of 5. In a binary with many call sites, this can increase code size noticeably.
- Register pressure – an extra general‑purpose register (%rdx in the example) is consumed to hold the absolute address.
Caution
I had difficulty constructing a benchmark that showed a measurable drop in IPC (instructions per cycle) for the large mcmodel. Take my word for it that the impact can be non‑trivial. 🤷
Keeping the Small Code Model
If you want to stay with the small code model, you’ll need to explore alternative strategies (e.g., reorganising sections, using trampolines, or splitting the binary). More ideas will be covered in future posts.