Decoding Exception Entry & Exit on ARM Cortex-Mx
Source: Dev.to
Introduction – Why This Post Exists
Interrupt handling on ARM Cortex‑Mx looks simple on paper, but it becomes confusing the moment you open a debugger:
- The PC value changes mysteriously.
- Registers appear in stack memory.
- LR holds strange values like 0xFFFFFFFD.
- Some registers never show up on the stack.
This post breaks down what the hardware actually does, what the compiler does, and what the debugger hides, using real debugging screenshots and memory inspection.
Before diving into stack dumps and registers, it’s important to understand one thing:
The core decides when an interrupt is taken, saves a fixed architectural context, and switches modes and stacks automatically. Software only comes into play after that.
What is STIR?
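- STIR (Software Trigger Interrupt Register, NVIC_STIR at address 0xE000EF00) lets software request an interrupt by writing that interrupt's IRQ number to it.
- The write does not jump to the ISR; it only sets the pending bit for that interrupt.
- In that respect STIR behaves exactly like a hardware interrupt line going high: it merely marks the interrupt as pending.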
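A host‑side model makes the "pending only" behavior concrete. This is not driver code: `nvic_model_t` and `nvic_stir_write` are hypothetical names, and the model represents only IRQ0–IRQ31 as plain variables. On real hardware the equivalent is a single store of the IRQ number to NVIC_STIR (0xE000EF00), which CMSIS exposes as `NVIC->STIR = irq;`.

```c
#include <stdint.h>

/* Hypothetical model of NVIC state for IRQ0..IRQ31 only.
 * On hardware these are the ISER0, ISPR0 and IABR0 registers. */
typedef struct {
    uint32_t iser;  /* enable bits  */
    uint32_t ispr;  /* pending bits */
    uint32_t iabr;  /* active bits  */
} nvic_model_t;

/* Model of a write to STIR: bits [8:0] carry the interrupt number.
 * The only effect is setting the pending bit; no branch happens here. */
static void nvic_stir_write(nvic_model_t *nvic, uint32_t value)
{
    uint32_t irq = value & 0x1FFu;
    if (irq < 32u) {             /* this model only covers IRQ0..31 */
        nvic->ispr |= 1u << irq;
    }
}
```

After `nvic_stir_write(&n, 3)`, only bit 3 of the modeled pending register is set; the active bit stays clear until the core actually enters the handler.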
Step‑by‑Step Exception Entry
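- NVIC sets the pending bit for the interrupt in the NVIC_ISPR (Interrupt Set‑Pending) register.
- If the interrupt is enabled (NVIC_ISER) and its priority is high enough, the CPU completes the currently executing instruction.
- The core performs stacking (pushing R0–R3, R12, LR, PC, and xPSR onto the current stack) and vector fetching (reading the handler address from the vector table).
- The CPU:
  - Switches to Handler mode.
  - Sets the Active bit in NVIC_IABR.
  - Clears the Pending bit.
  - Loads LR with an EXC_RETURN value.
- The ISR starts executing.
- Inside the handler, all stack operations use MSP. (The frame itself is pushed onto whichever stack the interrupted code was using, MSP or PSP.)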
The stacking of registers, mode switch, and vector fetch are performed internally by the core between two instructions, which is why these steps are not visible during source‑level single‑step debugging.
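The entry steps can be sketched as a simplified C model of the core. Everything here is an assumption for illustration: `core_t`, `exception_entry`, and the 128 KiB RAM array are hypothetical; the interrupted code is assumed to run in Thread mode on MSP, with no FPU and an 8‑byte‑aligned SP; and only IRQ0–IRQ31 are handled. External interrupt n uses vector table entry 16 + n.

```c
#include <stdint.h>

#define RAM_BASE  0x20000000u
#define RAM_WORDS 0x8000u            /* models 128 KiB: 0x20000000..0x2001FFFF */

static uint32_t ram[RAM_WORDS];

static void mem_write(uint32_t addr, uint32_t value) { ram[(addr - RAM_BASE) / 4u] = value; }
static uint32_t mem_read(uint32_t addr) { return ram[(addr - RAM_BASE) / 4u]; }

/* Hypothetical, simplified core state. */
typedef struct {
    uint32_t r[13];                  /* R0..R12 */
    uint32_t sp, lr, pc, xpsr;       /* pc = address of the next instruction to execute */
    int handler_mode;                /* 0 = Thread, 1 = Handler */
    uint32_t ispr, iabr;             /* NVIC pending/active bits for IRQ0..31 */
} core_t;

/* Model of exception entry for external interrupt 'irq' (0..31). */
static void exception_entry(core_t *c, uint32_t irq, const uint32_t *vector_table)
{
    /* 1. Stacking: 8-word basic frame, R0 at the lowest address.
     *    A real core may write the words in a different internal order;
     *    only the final layout is fixed. */
    const uint32_t frame[8] = { c->r[0], c->r[1], c->r[2], c->r[3],
                                c->r[12], c->lr, c->pc, c->xpsr };
    c->sp -= 32u;                    /* full-descending: 8 words x 4 bytes */
    for (uint32_t i = 0; i < 8u; i++) {
        mem_write(c->sp + 4u * i, frame[i]);
    }

    /* 2. Mode switch and NVIC bookkeeping. */
    c->handler_mode = 1;
    c->ispr &= ~(1u << irq);         /* no longer pending */
    c->iabr |=  (1u << irq);         /* now active */

    /* 3. LR = EXC_RETURN: return to Thread mode, MSP, no FP context. */
    c->lr = 0xFFFFFFF9u;

    /* 4. Vector fetch: entry 16 + irq; bit 0 (Thumb) is not part of the PC. */
    c->pc = vector_table[16u + irq] & ~1u;

    /* IPSR field (xPSR[8:0]) now holds the exception number. */
    c->xpsr = (c->xpsr & ~0x1FFu) | (16u + irq);
}
```

Starting from SP = 0x2001FFE8 and R0 = 0x0A, a call such as `exception_entry(&c, 5, vt)` leaves SP at 0x2001FFC8 with R0 stored at that address and xPSR at 0x2001FFE4, the same layout inspected later in this post.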
Common Misconception When Writing to STIR
| Misconception | Reality |
|---|---|
| “The write to STIR causes a direct jump to the ISR.” | The write only sets the pending bit. |
| “The interrupt is taken immediately after the write.” | The core must finish the current instruction first. |
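| “Pending = taken.” | Pending means requested. The interrupt is taken only when it is also enabled, not masked, and of sufficiently high priority; taken means the core has entered the ISR. |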
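The difference between pending and taken can be written as a predicate. This is a deliberately simplified sketch with hypothetical names: it ignores BASEPRI, FAULTMASK, and priority grouping, and compares raw priority numbers, where a lower number means a higher priority. Thread code with no active exception is modeled with an execution priority of 256, lower than every configurable priority.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model: lower number = higher priority. */
static bool irq_is_taken(bool pending, bool enabled, uint8_t irq_prio,
                         int current_exec_prio, bool primask)
{
    if (!pending || !enabled) {
        return false;                  /* pending alone is not enough */
    }
    if (primask) {
        return false;                  /* PRIMASK = 1 masks configurable-priority interrupts */
    }
    return irq_prio < current_exec_prio;  /* must preempt whatever is running now */
}
```

Even when this predicate is true, the core still takes the interrupt only at the next point where it can stop the current instruction stream.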
After Setting the Pending Bit
- The Cortex‑M core always completes the currently executing instruction.
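- Interrupts are normally recognized only at instruction boundaries. Multi‑cycle load/store‑multiple instructions (LDM, STM, PUSH, POP) are the main exception: the core may interrupt them part‑way and continue or restart them afterwards, which the architecture also defines precisely.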
- This guarantees precise and deterministic program execution.
Consequences
- The Program Counter (PC) continues to update for the instruction that was already in progress.
- The debugger may highlight the next C statement in the source view, making it look as if execution is proceeding normally.
- Only after the current instruction finishes does exception entry occur.
Debugger Example
In the debugger session used for this post, execution is stopped at a printf statement:
- The interrupt is still pending.
- The CPU finishes the current instruction and the PC updates; only then does exception entry occur.
Once exception entry has completed:
- The processor is executing the Interrupt Service Routine (ISR) in Handler mode.
- PC has been loaded from the vector table.
- LR contains an EXC_RETURN value.
- MSP is active.
- The interrupt is no longer pending, and the corresponding NVIC active bit is set.
Inspecting the Stack Frame
When the ISR runs we can look at the stack memory to see what context the processor automatically saved.
Registers automatically stacked
xPSR, PC, LR, R12, R3, R2, R1, R0
Initial Stack Pointer
- Before the interrupt was serviced, SP = 0x2001FFE8.
- The stack is Full Descending (grows toward lower addresses; SP always points to the last stacked item).
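A full‑descending push decrements SP first and then stores at the new address, so SP always points at the last item written. A minimal sketch, assuming a hypothetical 256‑byte window of stack memory at 0x2001FF00 modeled as an array:

```c
#include <stdint.h>

/* Hypothetical 64-word (256-byte) stack window: 0x2001FF00..0x2001FFFF. */
#define STACK_WINDOW_BASE 0x2001FF00u
static uint32_t stack_mem[64];

/* Full-descending push: decrement SP, then store at the new SP. */
static void push_word(uint32_t *sp, uint32_t value)
{
    *sp -= 4u;
    stack_mem[(*sp - STACK_WINDOW_BASE) / 4u] = value;
}
```

Eight such pushes starting from 0x2001FFE8 end at 0x2001FFC8 (8 × 4 = 32 bytes).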
Hardware Stacking Sequence
| Step | SP after decrement | Register stored |
|---|---|---|
| 1 | 0x2001FFE4 | xPSR |
| 2 | 0x2001FFE0 | PC |
| 3 | 0x2001FFDC | LR |
| 4 | 0x2001FFD8 | R12 |
| 5 | 0x2001FFD4 | R3 |
| 6 | 0x2001FFD0 | R2 |
| 7 | 0x2001FFCC | R1 |
| 8 | 0x2001FFC8 | R0 |
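- After exception entry, SP = 0x2001FFC8, pointing to the last stacked register (R0). The frame is 8 words (32 bytes), so SP moved from 0x2001FFE8 to 0x2001FFE8 - 0x20 = 0x2001FFC8.
- The table shows a conceptual push order. The core may write the words in a different internal order, but the final layout (R0 at the lowest address, xPSR at the highest) is fixed by the architecture.
- Example: R0 = 0x0A, verified in both the register view and at memory address 0x2001FFC8.
- The value at 0x2001FFE4 corresponds to xPSR, confirming the layout matches the ARM Cortex‑M specification.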
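Because the layout is fixed, the frame can be described as a C struct and laid over a memory dump. In this sketch `exc_frame_t`, `frame_from_dump`, and `frame_looks_valid` are hypothetical names, and only the basic 8‑word frame (no FP context) is covered. The validity check relies on bit 24 of the stacked xPSR, the Thumb (T) bit, which should always be 1 on Cortex‑M.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Basic Cortex-M exception frame, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} exc_frame_t;

_Static_assert(sizeof(exc_frame_t) == 32, "basic frame is 8 words");
_Static_assert(offsetof(exc_frame_t, xpsr) == 28, "xPSR sits at SP + 0x1C");

/* Build a frame from 8 words read (e.g. in a debugger) starting at the post-entry SP. */
static exc_frame_t frame_from_dump(const uint32_t words[8])
{
    exc_frame_t f;
    memcpy(&f, words, sizeof f);
    return f;
}

/* Sanity check: the stacked xPSR must have the Thumb bit (bit 24) set. */
static int frame_looks_valid(const exc_frame_t *f)
{
    return (int)((f->xpsr >> 24) & 1u);
}
```

On the target, the same struct could be overlaid directly on the stacked SP (for example by casting the value 0x2001FFC8 from this post to `const exc_frame_t *`) to read the saved registers by name.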
Why Do We Only See R0–R3, R12, LR, PC, and xPSR on the Stack?
At first glance it looks like something is missing, but ARM deliberately treats registers differently based on who is responsible for preserving them.
Volatile (Caller‑Saved) Registers
| Register | Typical Use |
|---|---|
| R0–R3, R12 | Function arguments, temporary calculations, short‑lived values |
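- Under the ARM procedure call standard (AAPCS), any function may overwrite these registers without restoring them.
- An ISR compiled as an ordinary C function may therefore clobber them, and the interrupted code had no chance to save them because it never made a call.
- Hence the Cortex‑M core saves them automatically during exception entry (together with LR, PC, and xPSR), which is what lets an ISR be a plain C function.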
Non‑Volatile (Callee‑Saved) Registers
| Register | Typical Use |
|---|---|
| R4–R11 | Local variables, loop counters, pointers, structures, values that must survive across many instructions |
- The called function (here, the ISR) is responsible for preserving these registers.
- The compiler will generate code to push/pop R4–R11 only if the ISR actually uses them.
- If the ISR doesn’t need them, they are never pushed, saving stack space and time.
Why ARM Designed It This Way
- Low interrupt latency – minimal work is done automatically.
- Minimal stack usage – only the essential registers are saved.
- Predictable timing – the hardware‑defined stack frame is fixed and fast.
- Fast entry and exit – on Cortex‑M3/M4, interrupt entry takes 12 cycles with zero‑wait‑state memory.
- Compiler flexibility – the compiler handles the rest, pushing only what the ISR really needs.
TL;DR
- STIR → pending bit (no immediate jump).
- Core finishes current instruction, then performs exception entry (hardware stacking, mode switch, vector fetch).
- Hardware automatically saves R0‑R3, R12, LR, PC, xPSR.
- Software (compiler) saves R4‑R11 only if required.
- The debugger may hide these internal steps, making the flow appear odd, but the sequence is deterministic and documented in the ARM Cortex‑M architecture manual.
Exception Return on Cortex‑Mx
Why hardware doesn’t save all registers every time
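If the hardware also saved R4–R11 on every exception, each entry would push eight more words (64 bytes instead of 32) and each return would pop them again, adding latency and stack usage even for ISRs that never touch those registers. That would make Cortex‑Mx less suitable for real‑time systems.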
What is EXC_RETURN?
- A special value placed in LR (link register) during exception entry.
- In Handler mode, loading this value into PC (with BX, POP, or LDR) triggers an exception return instead of a normal branch.
- It is not a normal return address; it tells the processor how to return from the exception.
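Typical return instructions that use the value in LR:
- `BX LR`: branch directly to the EXC_RETURN value held in LR.
- `POP {PC}`: used when the handler pushed LR on entry (for example, because it calls other functions).
- `LDR PC, [addr]`: load EXC_RETURN from memory into PC.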
Important note: in a normal function call, LR holds the return address. On exception entry, LR instead holds EXC_RETURN, and the real return address is the PC value saved in the stacked frame.
EXC_RETURN Encoding
On ARMv6‑M and ARMv7‑M, all EXC_RETURN values have bits [31:5] set to 1.
Only bits [4:0] describe the return behavior, and the processor decodes them automatically.
| Bit | Description | Value / Meaning |
|---|---|---|
| [31:5] | EXC_RETURN signature | All 1s → identifies an exception return value |
| 4 | Floating‑point context | 1 → no FP context stacked; 0 → FP context stacked (only if an FPU is present) |
| 3 | Return mode | 1 → return to Thread mode; 0 → return to Handler mode |
| 2 | Stack pointer selection | 1 → use PSP (Process Stack Pointer); 0 → use MSP (Main Stack Pointer) |
| 1 | Reserved | Always 0 |
| 0 | Reserved | Always 1 |
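The encoding can be decoded mechanically. This sketch follows the ARMv7‑M layout in the table (ARMv8‑M adds further fields and is not covered), uses hypothetical names, and does not reject reserved combinations such as Handler mode with PSP. The frame size reflects that a basic frame is 8 words (32 bytes), while an extended frame with FP context (S0–S15, FPSCR, and one reserved word added) is 26 words (104 bytes).

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool to_thread_mode;   /* bit 3 = 1 */
    bool uses_psp;         /* bit 2 = 1 */
    bool fp_frame;         /* bit 4 = 0 means FP context was stacked */
    uint32_t frame_bytes;  /* size of the frame to unstack */
} exc_return_info_t;

/* True if 'lr' looks like an ARMv7-M EXC_RETURN value:
 * bits [31:5] all ones, bit 1 = 0, bit 0 = 1. */
static bool is_exc_return(uint32_t lr)
{
    return (lr & 0xFFFFFFE0u) == 0xFFFFFFE0u && (lr & 0x3u) == 0x1u;
}

static exc_return_info_t decode_exc_return(uint32_t lr)
{
    exc_return_info_t info;
    info.fp_frame       = ((lr >> 4) & 1u) == 0u;
    info.to_thread_mode = ((lr >> 3) & 1u) != 0u;
    info.uses_psp       = ((lr >> 2) & 1u) != 0u;
    info.frame_bytes    = info.fp_frame ? 104u : 32u;
    return info;
}
```

For example, 0xFFFFFFFD (the value mentioned in the introduction) decodes to Thread mode, PSP, and a basic 32‑byte frame; 0xFFFFFFF9 is Thread mode on MSP; and 0xFFFFFFF1 is a return to Handler mode, as happens when a nested exception finishes.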
What Happens During Exception Entry
- The Cortex‑M processor performs stacking and vector fetching in hardware.
- The Program Counter (PC) appears to change suddenly – you don’t see the individual stacking steps.
- The memory view shows the saved registers, even though the source view does not.
In short, the debugger shows the result of exception entry, not the hardware steps that caused it.
How I Verified the Behavior
| Method | Observation |
|---|---|
| Writing to STIR to trigger interrupts | Confirmed hardware‑initiated exception entry |
| Debugger register view | Registers are saved correctly |
| Stack memory inspection | Only the fixed exception frame is saved; SP moves exactly 32 bytes |
| Compiler output inspection | R4–R11 are saved only when required |
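The "exactly 32 bytes" observation holds because the starting SP (0x2001FFE8) was already 8‑byte aligned. When 8‑byte stack alignment is enforced on exception entry (CCR.STKALIGN = 1, enabled by default on many Cortex‑M implementations), a core entering with SP at an address that is 4 mod 8 inserts one padding word and records that in bit 9 of the stacked xPSR. A sketch of the resulting SP arithmetic for the basic frame, using a hypothetical function name:

```c
#include <stdint.h>

/* Expected SP after stacking a basic 32-byte frame with 8-byte alignment
 * enforced: the frame start is rounded down to a multiple of 8.
 * SP is always word aligned, so clearing bits [2:0] only affects bit 2. */
static uint32_t sp_after_entry(uint32_t sp_before)
{
    return (sp_before - 32u) & ~7u;
}
```

With SP = 0x2001FFE8 the delta is 32 bytes, as observed in this post; starting from 0x2001FFE4 it would be 36 bytes.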
Bottom line: EXC_RETURN is a compact, hardware‑generated token that tells the Cortex‑M core how to unwind an exception (which stack pointer to use, which mode to return to, and whether floating‑point state is present). The processor handles all low‑level stacking/unstacking automatically, and the debugger reflects the final stacked state rather than each individual hardware step.