Inside the CPU: A Complete Guide to the Instruction Execution Cycle and How Data Is Retrieved from RAM
Source: Dev.to
What Is the CPU Execution Cycle?
The CPU (Central Processing Unit) processes instructions using a repeated loop called the Instruction Cycle or Fetch–Decode–Execute (FDE) cycle. Every instruction—whether it’s ADD, MOV, LOAD, a function call, or a branch—passes through this cycle.
The three major phases are:
- Fetch – Get the instruction from memory (RAM or cache).
- Decode – Understand what the instruction means.
- Execute – Perform the operation (ALU math, memory access, branch, etc.).
This cycle repeats billions of times per second.
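The loop above can be sketched in a few lines of Python. This is a toy accumulator machine, not any real ISA; the opcode names (LOADI, ADDI, HALT) and the tuple instruction format are invented for illustration.

```python
# A minimal sketch of the fetch-decode-execute loop for a toy machine
# whose instructions are (opcode, operand) tuples.

def run(program):
    pc = 0    # Program Counter: index of the next instruction
    acc = 0   # single accumulator register for this toy machine
    while True:
        instr = program[pc]        # Fetch: read the instruction at PC
        pc += 1                    # PC advances past the fetched instruction
        opcode, operand = instr    # Decode: split into opcode and operand
        if opcode == "LOADI":      # Execute: load an immediate value
            acc = operand
        elif opcode == "ADDI":     # Execute: add an immediate value
            acc += operand
        elif opcode == "HALT":
            return acc

# Loads 2, adds 3, halts with 5 in the accumulator
result = run([("LOADI", 2), ("ADDI", 3), ("HALT", 0)])
```

Real hardware does this in parallel circuitry, of course, but the control flow is the same: one instruction at a time, fetch then decode then execute, forever.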
CPU Architecture at a Glance
Registers
Small, extremely fast storage units inside the CPU:
- PC (Program Counter) – Holds the address of the next instruction.
- IR (Instruction Register) – Holds the instruction being decoded/executed.
- MAR (Memory Address Register) – Holds memory addresses.
- MDR (Memory Data Register) – Holds data being transferred to/from memory.
- General Registers – AX, BX, RAX, RBX, etc., depending on architecture.
ALU (Arithmetic Logic Unit)
Performs math (add, subtract, multiply) and logical operations (AND, OR, XOR).
Control Unit
Directs the entire execution cycle, orchestrating fetch, decode, and execute.
Cache
Very fast memory layers (L1, L2, L3) that speed up instruction and data access.
RAM (Main Memory)
Where instructions and program data reside while a program runs.
Step‑by‑Step Guide to the Instruction Cycle
Step 1: Fetch
- PC → MAR – The Program Counter address is copied into the Memory Address Register.
- CPU requests memory read – The address is placed on the address bus.
- Memory returns the instruction → MDR – Data from memory goes into the Memory Data Register.
- MDR → IR – The fetched instruction is placed in the Instruction Register.
- PC increments – PC = PC + size_of_instruction (typically +1, +4, etc.).
At this point the CPU has loaded the instruction and is ready to decode.
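The five micro-steps above can be modeled directly, one register transfer per line. This sketch assumes a word-addressed memory (so the PC increments by 1) and uses a plain dictionary to stand in for RAM.

```python
# A sketch of the fetch phase: PC -> MAR, memory -> MDR, MDR -> IR, PC++.

def fetch(cpu, memory):
    cpu["MAR"] = cpu["PC"]           # 1. PC -> MAR
    cpu["MDR"] = memory[cpu["MAR"]]  # 2-3. read request; memory -> MDR
    cpu["IR"] = cpu["MDR"]           # 4. MDR -> IR
    cpu["PC"] += 1                   # 5. PC = PC + size_of_instruction
    return cpu["IR"]

cpu = {"PC": 0x100, "MAR": 0, "MDR": 0, "IR": None}
memory = {0x100: "ADD R1, [0x2000]"}
fetched = fetch(cpu, memory)  # IR now holds the instruction; PC is 0x101
```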
Step 2: Decode
The Control Unit interprets the instruction:
- Opcode – Operation (add, move, jump).
- Operands – Register numbers, memory addresses, constants.
- Addressing modes – Direct, indirect, immediate, indexed.
No data processing happens yet—only interpretation.
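Decoding is essentially bitfield extraction. The sketch below splits a 32-bit word into opcode, register, and immediate fields; the 8/8/16-bit layout is an invented encoding for illustration, not any real ISA's format.

```python
# A sketch of decode: pull fixed-width fields out of a 32-bit instruction.

def decode(word):
    opcode = (word >> 24) & 0xFF  # top 8 bits: the operation
    reg    = (word >> 16) & 0xFF  # next 8 bits: register number
    imm    = word & 0xFFFF        # low 16 bits: address or constant
    return opcode, reg, imm

# 0x01 = hypothetical ADD opcode, register 1, address 0x2000
opcode, reg, imm = decode(0x01012000)
```

Note that nothing is computed here, exactly as the article says: the control unit only figures out *what* to do; the ALU and memory do the work in the next phase.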
Step 3: Execute
The CPU performs the action, which can take several forms:
- Arithmetic / Logical Operation – The ALU carries out the operation; the result is stored in a register.
- Memory Read (LOAD) –
  - Effective address is calculated.
  - Address placed into MAR.
  - Control unit sends a read request to memory.
  - Data moves over the data bus into MDR.
  - MDR → target register.
- Memory Write (STORE) – Data placed into MDR, address into MAR, then memory is signaled to write.
- Branch – If the condition is true, PC is updated to a new address; otherwise, PC continues normally.
- I/O – Control signals activate I/O controllers.
After execution, the cycle restarts with the next instruction.
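The execute phase is a dispatch on the decoded opcode. This sketch assumes toy dictionaries for the register file and memory; the opcode names (including BEQZ, "branch if equal to zero") are illustrative, not a real instruction set.

```python
# A sketch of execute: dispatch to an ALU op, a memory access, or a branch.

def execute(opcode, reg, addr, regs, memory):
    if opcode == "LOAD":          # memory read: data lands in a register
        regs[reg] = memory[addr]
    elif opcode == "STORE":       # memory write: register value to memory
        memory[addr] = regs[reg]
    elif opcode == "ADD":         # ALU op: result stored back in a register
        regs[reg] = regs[reg] + memory[addr]
    elif opcode == "BEQZ":        # branch: update PC only if condition holds
        if regs[reg] == 0:
            regs["PC"] = addr

regs = {"R1": 5, "PC": 0x100}
memory = {0x2000: 7}
execute("ADD", "R1", 0x2000, regs, memory)  # R1 becomes 5 + 7 = 12
```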
How the CPU Actually Gets Data from RAM
Step 1: Cache Check
The CPU first looks in the cache hierarchy (L1 → L2 → L3).
- Cache hit – Data is returned within a few nanoseconds.
- Cache miss – The CPU must fetch the data from RAM, which is slower.
Step 2: Send Address to RAM
The address is placed on the Address Bus via the MAR.
Step 3: RAM Responds
RAM places the requested data on the Data Bus.
Step 4: Data Arrives in MDR
The CPU captures the data into the Memory Data Register.
Step 5: Transfer to Target Register
Example assembly:
LOAD R1, [0x5000]
The final step is MDR → R1.
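The whole read path (cache check, RAM fallback, MDR, target register) can be sketched as below. The dictionaries standing in for the cache and RAM, and the fill-on-miss policy, are bare-bones placeholders; real caches track lines, tags, and eviction.

```python
# A sketch of the five-step memory read path for LOAD R1, [0x5000].

def load(addr, target, regs, cache, ram):
    if addr in cache:          # Step 1: cache hit
        data = cache[addr]
    else:                      # cache miss: Steps 2-3, go out to RAM
        data = ram[addr]
        cache[addr] = data     # fill the cache for next time
    regs["MDR"] = data         # Step 4: data arrives in MDR
    regs[target] = regs["MDR"] # Step 5: MDR -> target register

regs, cache, ram = {"MDR": 0}, {}, {0x5000: 42}
load(0x5000, "R1", regs, cache, ram)  # miss: fetched from RAM, cached
load(0x5000, "R1", regs, cache, ram)  # hit: served from the cache
```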
Putting It All Together: A Real Example
ADD R1, [0x2000]
- Fetch – Retrieve the instruction at the address in PC.
- Decode – Identify the opcode ADD and operands (register R1 + memory location 0x2000).
- Execute –
  - Calculate the effective address 0x2000; MAR = 0x2000.
  - RAM → MDR (memory read).
  - ALU adds R1 + MDR.
  - Store the result back into R1.
All of this occurs within a few CPU cycles.
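Tying the three phases together, the worked example can be traced end to end. This sketch reuses toy register/memory dictionaries; encoding the instruction as a tuple is an illustration, not a real machine format.

```python
# A sketch tracing ADD R1, [0x2000] through one full fetch-decode-execute cycle.

def step(cpu, memory):
    # Fetch: PC -> MAR, memory -> MDR -> IR, PC++
    cpu["MAR"] = cpu["PC"]
    cpu["MDR"] = memory[cpu["MAR"]]
    cpu["IR"] = cpu["MDR"]
    cpu["PC"] += 1
    # Decode: split IR into opcode and operands
    opcode, reg, addr = cpu["IR"]
    # Execute: effective address -> MAR, memory read -> MDR, ALU add
    if opcode == "ADD":
        cpu["MAR"] = addr
        cpu["MDR"] = memory[cpu["MAR"]]
        cpu[reg] = cpu[reg] + cpu["MDR"]

cpu = {"PC": 0x100, "MAR": 0, "MDR": 0, "IR": None, "R1": 5}
memory = {0x100: ("ADD", "R1", 0x2000), 0x2000: 7}
step(cpu, memory)  # R1 = 5 + 7 = 12, PC has advanced to 0x101
```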
Pipeline and Parallelism (Modern CPUs)
Modern CPUs improve throughput with techniques such as:
- Instruction pipelining – Overlapping fetch, decode, and execute stages for multiple instructions.
- Superscalar execution – Issuing several instructions per clock cycle.
- Out‑of‑order execution – Reordering instructions to avoid stalls.
- Branch prediction – Guessing the outcome of branches to keep the pipeline full.
These optimizations enable billions of operations per second.
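A quick back-of-the-envelope calculation shows why pipelining matters: with S stages and N instructions, an ideal pipeline finishes in S + (N − 1) cycles instead of S × N. This ignores stalls, hazards, and branch mispredictions, so real speedups are smaller.

```python
# A sketch of ideal pipeline throughput versus sequential execution.

def cycles_unpipelined(n_instructions, n_stages):
    # Each instruction occupies the whole machine for n_stages cycles.
    return n_instructions * n_stages

def cycles_pipelined(n_instructions, n_stages):
    # First instruction fills the pipeline; then one completes per cycle.
    return n_stages + (n_instructions - 1)

# 100 instructions through a 5-stage pipeline:
# 500 cycles sequentially versus 104 pipelined
```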
Summary
Fetch: PC → MAR → Memory → MDR → IR → PC++
Decode: Control Unit interprets opcode and operands.
Execute:
- ALU operations
- Memory Load/Store
- Branching
- I/O operations
Memory Access: Always passes through the cache → (on a miss) RAM → MDR → register.
Understanding this cycle provides a deep view of how your code runs at the hardware level, essential for system design, assembly programming, compiler construction, and performance engineering.