A Separate Stack for Each Thread

Published: January 10, 2026 at 03:31 AM EST
7 min read
Source: Dev.to

Core Concept

Each thread needs its own stack, but threads within a process share code, data, and the heap. The OS kernel orchestrates everything.

What is a Process?

A process is a running program isolated from other processes.
When you open Chrome, Spotify, etc., the OS creates a process. It starts with one thread (the main thread) and has its own memory space. The kernel creates a Process Control Block (PCB) to track everything about the process.

What is a Thread?

A thread is an execution path within a process.
Multiple threads can exist in one process and share code and data, but each thread must have its own stack. The kernel creates a Thread Control Block (TCB) for each thread to track its state.

Process Memory Layout

| Segment | Contents | Shared? | Notes |
|---|---|---|---|
| Code | Compiled program instructions | ✅ All threads | Read-only, fixed size |
| Data | Global variables | ✅ All threads | Fixed size, initialized |
| Heap | Dynamic memory (malloc/new) | ✅ All threads | Grows upward, managed by programmer |
| Stack | Local variables, function calls | ❌ Per thread | Grows downward, isolated per thread |

Why Separate Stacks?

If two threads shared one stack, their function calls would collide and corrupt each other’s data.

Thread A calls a function → creates a stack frame.
Thread B calls a function → adds its frame to the same stack → frames overlap → data is overwritten.

When Thread B returns and pops its frame, Thread A’s data becomes corrupted.

Solution: each thread gets its own stack for its function calls and local variables. This allows threads to execute different functions simultaneously without interference.

Stack Frames and LIFO

Stack frames follow LIFO (Last‑In‑First‑Out) order:

main() → functionA() → functionB()
  1. main() calls functionA() → a new frame is pushed.
  2. functionA() calls functionB() → another frame is pushed.
  3. functionB() returns → its frame is popped, revealing functionA()’s frame.
  4. functionA() returns → its frame is popped, returning to main().

Each thread has its own stack, so multiple threads can call the same functions simultaneously without interfering with each other’s frames.

Thread Creation: Main vs. Sub‑threads

  • When a process starts, the OS automatically creates one thread – the main thread – which begins execution at main().
  • The main thread can create additional threads with pthread_create(). These are called sub‑threads.
  • After creation, all threads are peers in the eyes of the kernel; the main thread has no special scheduling authority. (One practical caveat: if the main thread returns from main(), the whole process exits — it must call pthread_exit() if sub-threads should keep running.)

Thread Hierarchy

Process (created by OS)

Main Thread (created automatically by kernel)
    ├─→ Sub‑thread 1 (created by main thread)
    ├─→ Sub‑thread 2 (created by any thread)
    └─→ Sub‑thread 3 (created by any thread)

Each sub‑thread receives:

  • its own stack (allocated by the kernel)
  • its own TCB (Thread Control Block)
  • shared access to the process’s code, data, and heap

Sub‑threads can also create more sub‑threads if needed.

Thread hierarchy diagram

Music Player Example

| Thread | Responsibility |
|---|---|
| Main thread | Handles UI and user interactions |
| Audio thread | Decodes and plays audio continuously |
| I/O thread | Loads songs from the file system |
| Timer thread | Updates the playback timer display |

All threads share the player’s code and data (song list, settings, etc.) but each has its own stack for local variables and function execution. The kernel rapidly switches between them, giving each a time slice, which makes the tasks appear to run simultaneously.

The OS Kernel: Central Orchestrator

The kernel is the central manager of the operating system. It handles all resource allocation and coordination for processes and threads.

Kernel diagram

Process Management

  • The kernel creates a Process Control Block (PCB) for every process.
  • The PCB stores the process ID, state (running, waiting, ready, etc.), memory layout, file descriptors, signal handlers, and other metadata.
  • This information lets the kernel manage, isolate, and secure each process.

Thread Management

  • For each thread, the kernel creates a Thread Control Block (TCB).
  • The TCB holds the thread ID, state, CPU registers (program counter, stack pointer), stack address and size, thread‑local storage, and scheduling information.
  • The kernel uses the TCB to schedule and manage threads correctly.

Memory Management

  • The kernel allocates memory for thread stacks using system calls such as mmap().
  • It reserves virtual address space for each stack, ensuring that stacks do not overlap and that each thread has a safe area for its frames.

Thread Stack

The kernel creates guard pages at the stack boundaries to detect stack‑overflow conditions.
When a thread actually uses its stack, the kernel allocates physical memory on demand.

CPU Scheduling

The kernel decides which thread runs on which CPU core and for how long.
It uses scheduling algorithms to fairly distribute CPU time among all threads.
When a thread’s time slice expires or it needs to wait for I/O, the kernel performs a context switch.

Context Switching

During a context switch the kernel:

  1. Saves the current thread’s CPU state (Program Counter, Stack Pointer, and all CPU registers) into the thread’s TCB.
  2. Loads the next thread’s saved state from its TCB into the CPU registers.
  3. Resumes execution from the restored Program Counter, effectively transferring control to the next thread.

The resumed thread continues as if it had never been interrupted.

Default Stack Size by OS

(These values refer to virtual address space, not physical RAM.)

| OS / Runtime | Default Stack Size | Remarks |
|---|---|---|
| Linux | 8 MB per thread | Reserved virtual space; physical RAM is allocated only as needed. |
| Windows | 1 MB per thread | Balances stability with resource usage. |
| macOS | 512 KB per thread | More conservative with memory. |
| Java | 1 MB per thread | Managed by the JVM. |
| Go | ≈ 2 KB per goroutine | Goroutine stacks grow/shrink dynamically; the Go runtime, not the OS, manages them. |

Why Different Sizes?

  • Linux (8 MB) – Supports deep recursion and large local variables in complex applications.
  • Windows (1 MB) – Provides a compromise between stability and memory consumption.
  • macOS (512 KB) – Reflects a design philosophy focused on resource efficiency.
  • Go (≈ 2 KB) – Tiny stacks work because goroutines are scheduled in user space by the Go runtime, which grows each stack on demand and switches between goroutines without a kernel context switch.

Virtual vs. Physical Memory

When the kernel reserves 8 MB of stack space for a thread, it only reserves virtual address space.
Physical RAM is allocated on demand (demand paging).

Example: If a thread uses only 100 KB of its 8 MB stack, only ~100 KB of RAM is actually consumed; the remaining 7.9 MB stays unused.

This approach lets the kernel allocate many stacks without wasting memory, even on systems with hundreds or thousands of threads.

Kernel Data Structures

Process Control Block (PCB)

  • Process ID
  • Process state
  • Memory layout (code, data, heap, stacks)
  • File descriptor table
  • Signal handlers
  • List of all threads in the process

The PCB lets the kernel manage a process’s lifecycle and resources.

Thread Control Block (TCB)

  • Thread ID
  • Thread state
  • CPU registers (PC, SP, etc.)
  • Address and size of the thread’s user‑mode stack
  • Thread‑local storage information
  • Scheduling priority
  • Pointer to the kernel stack

The TCB is used during context switches to save and restore thread state.

Kernel Stack vs. User Stack

| Stack | Purpose | Location |
|---|---|---|
| User-mode stack | Holds local variables, function parameters, return addresses for user code. | Process's virtual address space (e.g., 8 MB on Linux). |
| Kernel-mode stack | Used by the kernel while handling system calls, interrupts, etc. | Kernel memory (protected from user code). |

When a thread invokes a system call (e.g., read() or write()), the CPU switches to kernel mode, loads the thread’s kernel‑stack pointer, and executes the kernel code on that stack. After the call finishes, the CPU returns to user mode and restores the user‑mode stack pointer.

Key Takeaways

  • Processes are isolated – each has its own protected memory space.
  • Threads share resources – code, data, and heap are common to all threads in a process.
  • Each thread has its own stack – the only per‑thread memory region inside a process.
  • Main thread is special only initially – it starts at main(), but all threads become peers after creation.
  • Thread creation – performed via pthread_create() (or equivalent APIs).
  • Kernel allocates all stacks – default sizes vary by OS (8 MB Linux, 1 MB Windows, 512 KB macOS).
  • Kernel manages scheduling – decides which thread runs, for how long, and on which CPU core.
  • Context switching enables multitasking – rapid switches give the illusion of parallel execution on a single core.
  • Stack frames implement LIFO – function calls push frames; returns pop them.
  • Demand paging saves RAM – virtual stack space consumes physical memory only when actually used.
  • TCBs track all thread state – the kernel uses TCBs to manage threads throughout their lifetimes.

Understanding This Matters

Understanding how threads work at the operating system level is fundamental to computer science. It reveals how your code actually executes at the hardware level. You’re not just using threading APIs—you’re understanding the real mechanisms that make concurrent programming possible. This foundation is essential for building efficient, safe multithreaded applications and for grasping more advanced concurrency concepts like goroutines in Go.
