From ARM Assembly to Machine Code: A Bare-Metal Primer
Source: Dev.to
ARMv7 programmer‑visible registers
ARMv7 exposes sixteen 32‑bit core registers (R0–R15), three of which have dedicated roles, plus the status registers CPSR and SPSR.
| Register | Role |
|---|---|
| R0–R12 | General‑purpose data, parameters, temporaries |
| R13 (SP) | Stack Pointer |
| R14 (LR) | Link Register |
| R15 (PC) | Program Counter |
| CPSR | Current Program Status Register |
| SPSR | Saved Program Status Register |
The architecture also defines multiple processor modes (User, IRQ, FIQ, Supervisor, …). Some registers (e.g., SP and LR) are banked across modes, allowing fast exception entry without saving the full register set.
From assembly source to machine code
An ARM processor executes 32‑bit instruction words fetched from memory. The assembler translates human‑readable mnemonics into these words according to the ARM Instruction Set Architecture (ISA). For this primer we assume:
- All instructions are 32 bits wide and word‑aligned.
- The PC value seen by an instruction equals the address of the current instruction + 8 bytes.
A minimal startup example
// startup.s
ldr r2, str1 @ load literal into r2
b . @ infinite loop
str1: .word 0xDEADBEEF
This tiny program:
- Loads the 32‑bit constant 0xDEADBEEF into r2.
- Enters an intentional infinite loop.
- Places the literal value in memory.
How labels and offsets are resolved
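Because str1 and the instructions that reference it live in the same section, the assembler can resolve the PC‑relative offsets on its own; no linker relocations are needed. The addresses below are offsets from the start of that section.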
| Address | Source | Machine code |
|---|---|---|
| 0x00 | ldr r2, str1 | 0xE59F2000 |
| 0x04 | b . | 0xEAFFFFFE |
| 0x08 | str1: .word 0xDEADBEEF | 0xDEADBEEF |
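To make the address column concrete, here is a minimal Python sketch of how a first pass might assign addresses and record labels. It is not how GNU as is implemented; `STATEMENTS` and `assign_addresses` are hypothetical names, and the sketch assumes every statement occupies exactly 4 bytes starting at address 0x00.

```python
# Hypothetical mini "first pass": give every statement an address and
# remember where each label lands. Assumes 4 bytes per statement.
STATEMENTS = [
    (None, "ldr r2, str1"),
    (None, "b ."),
    ("str1", ".word 0xDEADBEEF"),
]


def assign_addresses(statements, base=0x00, size=4):
    """Return (listing, symbols) for fixed-size statements starting at base."""
    listing = []
    symbols = {}
    address = base
    for label, text in statements:
        if label is not None:
            symbols[label] = address  # the label names the current address
        listing.append((address, text))
        address += size
    return listing, symbols


listing, symbols = assign_addresses(STATEMENTS)
for address, text in listing:
    print(f"0x{address:02X}  {text}")
print(f"str1 = 0x{symbols['str1']:02X}")
```

With the label address known, the offsets in the next sections are plain subtraction.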
Encoding ldr r2, str1
The LDR Rd, [PC, #offset] form uses PC‑relative addressing.
- PC during execution = instruction address + 8 → 0x00 + 0x08 = 0x08
- Target address = address of str1 = 0x08
- Required offset = 0x08 - 0x08 = 0
| Field | Bits | Value (binary) | Meaning |
|---|---|---|---|
| Condition | 31‑28 | 1110 (E) | Always |
| Opcode | 27‑20 | 01011001 | LDR, immediate offset (P=1, U=1, B=0, W=0, L=1) |
| Base register (Rn) | 19‑16 | 1111 (PC) | Address is computed from PC |
| Destination (Rd) | 15‑12 | 0010 (R2) | Loaded value goes to R2 |
| Offset | 11‑0 | 000000000000 | 0 bytes |
Final instruction word: 0xE59F2000
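The field table can be reproduced with a few shifts. The following Python sketch is an illustration only; `encode_ldr_literal` is a hypothetical helper that assumes ARM state, an immediate offset with P=1 and W=0, and the PC = instruction address + 8 rule stated earlier.

```python
def encode_ldr_literal(rd, instr_addr, target_addr, cond=0b1110):
    """Encode ARM-state `LDR Rd, [PC, #offset]` (immediate offset, no writeback)."""
    if not 0 <= rd <= 15:
        raise ValueError("rd must be a register number 0..15")
    pc = instr_addr + 8                  # PC reads as instruction address + 8
    offset = target_addr - pc
    u_bit = 1 if offset >= 0 else 0      # U=1 adds the offset, U=0 subtracts it
    imm12 = abs(offset)
    if imm12 > 0xFFF:
        raise ValueError("offset does not fit in 12 bits")
    return (
        (cond << 28)        # condition, 1110 = always
        | (0b01 << 26)      # single data transfer
        | (0 << 25)         # I=0: offset is an immediate
        | (1 << 24)         # P=1: offset applied before the access
        | (u_bit << 23)     # U: add or subtract the offset
        | (0 << 22)         # B=0: word access
        | (0 << 21)         # W=0: no base writeback
        | (1 << 20)         # L=1: load
        | (0b1111 << 16)    # Rn = PC
        | (rd << 12)        # Rd
        | imm12             # 12-bit offset magnitude
    )


print(f"0x{encode_ldr_literal(2, 0x00, 0x08):08X}")  # prints 0xE59F2000
```

A literal placed before the instruction would produce a negative offset, which this encoding expresses by clearing the U bit rather than by two's complement.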
Encoding b .
Branch instructions also use PC‑relative addressing.
- Instruction address = 0x04
- Effective PC = 0x04 + 0x08 = 0x0C
- Target address (.) = 0x04
- Byte offset = 0x04 - 0x0C = -8 → in words: -8 / 4 = -2
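The 24‑bit two’s‑complement representation of -2 is 0xFFFFFE. It fills bits 23‑0; bits 31‑24 hold 0xEA (condition 1110 = always, followed by 1010 for a branch without link).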
Final instruction word: 0xEAFFFFFE
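The same arithmetic, written out as a short Python sketch. `encode_branch` is a hypothetical helper, not a toolchain API, and it assumes an ARM-state B or BL with a word-aligned target.

```python
def encode_branch(instr_addr, target_addr, link=False, cond=0b1110):
    """Encode an ARM-state B (or BL when link=True) from instr_addr to target_addr."""
    pc = instr_addr + 8                      # PC reads as instruction address + 8
    byte_offset = target_addr - pc
    if byte_offset % 4:
        raise ValueError("branch target must be word-aligned")
    word_offset = byte_offset // 4
    if not -(1 << 23) <= word_offset < (1 << 23):
        raise ValueError("offset does not fit in 24 bits")
    imm24 = word_offset & 0xFFFFFF           # 24-bit two's complement
    return (cond << 28) | (0b101 << 25) | (int(link) << 24) | imm24


print(f"0x{encode_branch(0x04, 0x04):08X}")  # prints 0xEAFFFFFE
```

Masking with 0xFFFFFF is what turns -2 into 0xFFFFFE; the upper byte comes entirely from the condition and opcode bits.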
Why b . becomes an infinite loop
During execution:
- PC value read by the branch = 0x0C.
- Decoded offset = -2 words → -8 bytes.
- Branch target = 0x0C + (-8) = 0x04, i.e., the address of the branch itself.
Thus control returns to the same instruction repeatedly, forming a tight infinite loop that consumes no stack, registers, or additional memory.
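The decode direction can be sketched the same way. This is a simplified model in Python, not a real emulator: `branch_target` is a hypothetical helper that only handles the 24‑bit offset of an ARM-state branch and ignores the condition field.

```python
def branch_target(instr_addr, word):
    """Return the address an ARM-state B/BL located at instr_addr jumps to."""
    imm24 = word & 0xFFFFFF
    if imm24 & 0x800000:              # sign bit of the 24-bit field is set
        imm24 -= 1 << 24              # sign-extend to a negative Python int
    return (instr_addr + 8 + imm24 * 4) & 0xFFFFFFFF


pc = 0x04
for _ in range(3):                    # follow the branch a few times
    pc = branch_target(pc, 0xEAFFFFFE)
    print(f"next pc = 0x{pc:02X}")    # prints next pc = 0x04 each time
```

Every iteration lands back on 0x04, which is the loop described above.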
Why this pattern is used in bare‑metal code
- Halting when an unrecoverable error occurs.
- Waiting for a debugger to attach.
- Placeholder during early bring‑up of hardware.
- Explicit end‑of‑program marker in minimal examples.
Encoding .word 0xDEADBEEF
.word is an assembler directive, not an executable instruction.
- It reserves 4 bytes at the current location.
- The literal 0xDEADBEEF is emitted verbatim.
- The CPU treats it as data unless execution jumps into that region.
How instructions appear in memory (little‑endian)
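In little‑endian ARM, each 32‑bit word is stored with its least significant byte at the lowest address, so a byte‑by‑byte dump shows the word reversed. For the first instruction, 0xE59F2000: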
| Address offset | Byte |
|---|---|
| +0 | 00 |
| +1 | 20 |
| +2 | 9F |
| +3 | E5 |
The same ordering applies to all instructions and data words.
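As a quick check of the byte table, Python's standard `struct` module can pack a word in little‑endian order. `word_to_le_bytes` is a hypothetical helper name used only for this illustration.

```python
import struct


def word_to_le_bytes(word):
    """Pack a 32-bit word as 4 bytes, least significant byte first."""
    return struct.pack("<I", word)   # "<" = little-endian, "I" = unsigned 32-bit


for offset, byte in enumerate(word_to_le_bytes(0xE59F2000)):
    print(f"+{offset}  {byte:02X}")
```

The four printed bytes are 00, 20, 9F, E5, matching the table row by row.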
From source to raw binary
# Assemble and link
arm-none-eabi-as -o startup.o startup.s
arm-none-eabi-ld -o first-hang.elf startup.o
arm-none-eabi-objcopy -O binary first-hang.elf first-hang.bin
# Hex dump of the final binary
hexdump -C first-hang.bin
# 00000000 00 20 9f e5 fe ff ff ea ef be ad de |. ..........|
# 0000000c
# Interpreted as 32‑bit words (little‑endian)
xxd -e first-hang.bin
# 00000000: e59f2000 eafffffe deadbeef
The output matches the expected encoding derived earlier.
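The comparison can also be done programmatically. The sketch below, in Python, takes the 12 bytes shown in the hex dump (typed in as a literal rather than read from first-hang.bin) and regroups them into little‑endian words; `DUMP` and `le_words` are hypothetical names.

```python
import struct

# The 12 bytes from the hexdump output above, in file order.
DUMP = bytes.fromhex("00209fe5feffffeaefbeadde")


def le_words(data):
    """Split data into 32-bit little-endian words."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    count = len(data) // 4
    return list(struct.unpack(f"<{count}I", data))


words = le_words(DUMP)
print(" ".join(f"{w:08x}" for w in words))  # prints e59f2000 eafffffe deadbeef
```

To check a real build, the literal could be replaced with the contents of the file read in binary mode.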