Inside the CPU: A Complete Guide to the Instruction Execution Cycle and How Data Is Retrieved from RAM
Source: Dev.to
What Is the CPU Execution Cycle?
The CPU (Central Processing Unit) processes instructions using a repeated loop called the Instruction Cycle or Fetch–Decode–Execute (FDE) cycle. Every instruction—whether it’s ADD, MOV, LOAD, a function call, or a branch—passes through this cycle.
The three major phases are:
- Fetch – Get the instruction from memory (RAM or cache).
- Decode – Understand what the instruction means.
- Execute – Perform the operation (ALU math, memory access, branch, etc.).
This cycle repeats billions of times per second.
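The loop above can be sketched in a few lines of Python. This is a toy accumulator machine, not any real ISA; the opcode names (LOADI, ADDI, HALT) and the tuple instruction format are invented for illustration.

```python
# A minimal sketch of the fetch-decode-execute loop for a toy machine
# whose instructions are (opcode, operand) tuples.

def run(program):
    pc = 0    # Program Counter: index of the next instruction
    acc = 0   # single accumulator register for this toy machine
    while True:
        instr = program[pc]        # Fetch: read the instruction at PC
        pc += 1                    # PC advances past the fetched instruction
        opcode, operand = instr    # Decode: split into opcode and operand
        if opcode == "LOADI":      # Execute: load an immediate value
            acc = operand
        elif opcode == "ADDI":     # Execute: add an immediate value
            acc += operand
        elif opcode == "HALT":
            return acc

# Loads 2, adds 3, halts with 5 in the accumulator
result = run([("LOADI", 2), ("ADDI", 3), ("HALT", 0)])
```

Real hardware does this in parallel circuitry, of course, but the control flow is the same: one instruction at a time, fetch then decode then execute, forever.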
CPU Architecture at a Glance
Registers
Small, extremely fast storage units inside the CPU:
- PC (Program Counter) – Holds the address of the next instruction.
- IR (Instruction Register) – Holds the instruction being decoded/executed.
- MAR (Memory Address Register) – Holds memory addresses.
- MDR (Memory Data Register) – Holds data being transferred to/from memory.
- General Registers – AX, BX, RAX, RBX, etc., depending on architecture.
ALU (Arithmetic Logic Unit)
Performs math (add, subtract, multiply) and logical operations (AND, OR, XOR).
Control Unit
Directs the entire execution cycle, orchestrating fetch, decode, and execute.
Cache
Very fast memory layers (L1, L2, L3) that speed up instruction and data access.
RAM (Main Memory)
Where instructions and program data reside while a program runs.
Step‑by‑Step Guide to the Instruction Cycle
Step 1: Fetch
- PC → MAR – The Program Counter address is copied into the Memory Address Register.
- CPU requests memory read – The address is placed on the address bus.
- Memory returns the instruction → MDR – Data from memory goes into the Memory Data Register.
- MDR → IR – The fetched instruction is placed in the Instruction Register.
- PC increments – PC = PC + size_of_instruction (typically +1, +4, etc.).
At this point the CPU has loaded the instruction and is ready to decode.
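The five micro-steps above can be modeled directly, one register transfer per line. This sketch assumes a word-addressed memory (so the PC increments by 1) and uses a plain dictionary to stand in for RAM.

```python
# A sketch of the fetch phase: PC -> MAR, memory -> MDR, MDR -> IR, PC++.

def fetch(cpu, memory):
    cpu["MAR"] = cpu["PC"]           # 1. PC -> MAR
    cpu["MDR"] = memory[cpu["MAR"]]  # 2-3. read request; memory -> MDR
    cpu["IR"] = cpu["MDR"]           # 4. MDR -> IR
    cpu["PC"] += 1                   # 5. PC = PC + size_of_instruction
    return cpu["IR"]

cpu = {"PC": 0x100, "MAR": 0, "MDR": 0, "IR": None}
memory = {0x100: "ADD R1, [0x2000]"}
fetched = fetch(cpu, memory)  # IR now holds the instruction; PC is 0x101
```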
Step 2: Decode
The Control Unit interprets the instruction:
- Opcode – Operation (add, move, jump).
- Operands – Register numbers, memory addresses, constants.
- Addressing modes – Direct, indirect, immediate, indexed.
No data processing happens yet—only interpretation.
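Decoding is essentially bitfield extraction. The sketch below splits a 32-bit word into opcode, register, and immediate fields; the 8/8/16-bit layout is an invented encoding for illustration, not any real ISA's format.

```python
# A sketch of decode: pull fixed-width fields out of a 32-bit instruction.

def decode(word):
    opcode = (word >> 24) & 0xFF  # top 8 bits: the operation
    reg    = (word >> 16) & 0xFF  # next 8 bits: register number
    imm    = word & 0xFFFF        # low 16 bits: address or constant
    return opcode, reg, imm

# 0x01 = hypothetical ADD opcode, register 1, address 0x2000
opcode, reg, imm = decode(0x01012000)
```

Note that nothing is computed here, exactly as the article says: the control unit only figures out *what* to do; the ALU and memory do the work in the next phase.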
Step 3: Execute
The CPU performs the action, which can take several forms:
- Arithmetic / Logical Operation – The ALU carries out the operation; the result is stored in a register.
- Memory Read (LOAD) –
  - Effective address is calculated.
  - Address placed into MAR.
  - Control unit sends a read request to memory.
  - Data moves over the data bus into MDR.
  - MDR → target register.
- Memory Write (STORE) – Data placed into MDR, address into MAR, then memory is signaled to write.
- Branch – If the condition is true, PC is updated to a new address; otherwise, PC continues normally.
- I/O – Control signals activate I/O controllers.
After execution, the cycle restarts with the next instruction.
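The execute phase is a dispatch on the decoded opcode. This sketch assumes toy dictionaries for the register file and memory; the opcode names (including BEQZ, "branch if equal to zero") are illustrative, not a real instruction set.

```python
# A sketch of execute: dispatch to an ALU op, a memory access, or a branch.

def execute(opcode, reg, addr, regs, memory):
    if opcode == "LOAD":          # memory read: data lands in a register
        regs[reg] = memory[addr]
    elif opcode == "STORE":       # memory write: register value to memory
        memory[addr] = regs[reg]
    elif opcode == "ADD":         # ALU op: result stored back in a register
        regs[reg] = regs[reg] + memory[addr]
    elif opcode == "BEQZ":        # branch: update PC only if condition holds
        if regs[reg] == 0:
            regs["PC"] = addr

regs = {"R1": 5, "PC": 0x100}
memory = {0x2000: 7}
execute("ADD", "R1", 0x2000, regs, memory)  # R1 becomes 5 + 7 = 12
```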
How the CPU Actually Gets Data from RAM
Step 1: Cache Check
The CPU first looks in the cache hierarchy (L1 → L2 → L3).
- Cache hit – Data is returned within a few nanoseconds.
- Cache miss – The CPU must fetch the data from RAM, which is slower.
Step 2: Send Address to RAM
The address is placed on the Address Bus via the MAR.
Step 3: RAM Responds
RAM places the requested data on the Data Bus.
Step 4: Data Arrives in MDR
The CPU captures the data into the Memory Data Register.
Step 5: Transfer to Target Register
Example assembly:
LOAD R1, [0x5000]
The final step is MDR → R1.
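The whole read path (cache check, RAM fallback, MDR, target register) can be sketched as below. The dictionaries standing in for the cache and RAM, and the fill-on-miss policy, are bare-bones placeholders; real caches track lines, tags, and eviction.

```python
# A sketch of the five-step memory read path for LOAD R1, [0x5000].

def load(addr, target, regs, cache, ram):
    if addr in cache:          # Step 1: cache hit
        data = cache[addr]
    else:                      # cache miss: Steps 2-3, go out to RAM
        data = ram[addr]
        cache[addr] = data     # fill the cache for next time
    regs["MDR"] = data         # Step 4: data arrives in MDR
    regs[target] = regs["MDR"] # Step 5: MDR -> target register

regs, cache, ram = {"MDR": 0}, {}, {0x5000: 42}
load(0x5000, "R1", regs, cache, ram)  # miss: fetched from RAM, cached
load(0x5000, "R1", regs, cache, ram)  # hit: served from the cache
```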
Putting It All Together: A Real Example
ADD R1, [0x2000]
- Fetch – Retrieve the instruction at the address in PC.
- Decode – Identify the opcode ADD and operands (register R1 + memory location 0x2000).
- Execute –
  - Calculate the effective address 0x2000; MAR = 0x2000.
  - RAM → MDR (memory read).
  - ALU adds R1 + MDR.
  - Store the result back into R1.
All of this occurs within a few CPU cycles.
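Tying the three phases together, the worked example can be traced end to end. This sketch reuses toy register/memory dictionaries; encoding the instruction as a tuple is an illustration, not a real machine format.

```python
# A sketch tracing ADD R1, [0x2000] through one full fetch-decode-execute cycle.

def step(cpu, memory):
    # Fetch: PC -> MAR, memory -> MDR -> IR, PC++
    cpu["MAR"] = cpu["PC"]
    cpu["MDR"] = memory[cpu["MAR"]]
    cpu["IR"] = cpu["MDR"]
    cpu["PC"] += 1
    # Decode: split IR into opcode and operands
    opcode, reg, addr = cpu["IR"]
    # Execute: effective address -> MAR, memory read -> MDR, ALU add
    if opcode == "ADD":
        cpu["MAR"] = addr
        cpu["MDR"] = memory[cpu["MAR"]]
        cpu[reg] = cpu[reg] + cpu["MDR"]

cpu = {"PC": 0x100, "MAR": 0, "MDR": 0, "IR": None, "R1": 5}
memory = {0x100: ("ADD", "R1", 0x2000), 0x2000: 7}
step(cpu, memory)  # R1 = 5 + 7 = 12, PC has advanced to 0x101
```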
Pipeline and Parallelism (Modern CPUs)
Modern CPUs improve throughput with techniques such as:
- Instruction pipelining – Overlapping fetch, decode, and execute stages for multiple instructions.
- Superscalar execution – Issuing several instructions per clock cycle.
- Out‑of‑order execution – Reordering instructions to avoid stalls.
- Branch prediction – Guessing the outcome of branches to keep the pipeline full.
These optimizations enable billions of operations per second.
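A quick back-of-the-envelope calculation shows why pipelining matters: with S stages and N instructions, an ideal pipeline finishes in S + (N − 1) cycles instead of S × N. This ignores stalls, hazards, and branch mispredictions, so real speedups are smaller.

```python
# A sketch of ideal pipeline throughput versus sequential execution.

def cycles_unpipelined(n_instructions, n_stages):
    # Each instruction occupies the whole machine for n_stages cycles.
    return n_instructions * n_stages

def cycles_pipelined(n_instructions, n_stages):
    # First instruction fills the pipeline; then one completes per cycle.
    return n_stages + (n_instructions - 1)

# 100 instructions through a 5-stage pipeline:
# 500 cycles sequentially versus 104 pipelined
```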
Summary
Fetch: PC → MAR → Memory → MDR → IR → PC++
Decode: Control Unit interprets opcode and operands.
Execute:
- ALU operations
- Memory Load/Store
- Branching
- I/O operations
Memory Access: Always passes through the cache → (on a miss) RAM → MDR → register.
Understanding this cycle provides a deep view of how your code runs at the hardware level, essential for system design, assembly programming, compiler construction, and performance engineering.