The Engine Under the Hood: Go’s GMP, Java’s Locks, and Erlang’s Heaps
Source: Dev.to
Introduction
As backend engineers we often treat concurrency as a black box: we write go func(){} or spawn() and expect magic. Understanding how the runtime schedules these tasks separates a senior engineer from an architect.
The GMP Scheduler
Go’s scheduler follows the G‑M‑P model:
| Component | Description |
|---|---|
| G (Goroutine) | Lightweight user‑space thread (starts at ~2 KB stack). Holds the instruction pointer and stack. |
| M (Machine) | An OS thread managed by the kernel. It is the actual worker that executes CPU instructions. |
| P (Processor) | A logical token that owns a local run queue and a portion of the memory cache. An M must hold a P to execute a G. |
- Rule: An M must have a P to run a G.
- P = logical cores: By default GOMAXPROCS equals the number of logical CPUs, which caps parallelism (goroutines executing at the same instant) while leaving the number of concurrent goroutines effectively unlimited.
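To see the parallelism limit on a given machine, a minimal sketch along these lines can help. The helper name schedulerLimits is made up for illustration; runtime.GOMAXPROCS and runtime.NumCPU are standard library calls.

```go
package main

import (
	"fmt"
	"runtime"
)

// schedulerLimits (hypothetical helper) reports how many Ps the runtime
// will use and how many logical CPUs it detected. Passing 0 to
// runtime.GOMAXPROCS queries the current value without changing it.
func schedulerLimits() (procs, cpus int) {
	return runtime.GOMAXPROCS(0), runtime.NumCPU()
}

func main() {
	procs, cpus := schedulerLimits()
	fmt.Printf("Ps (GOMAXPROCS): %d, logical CPUs: %d\n", procs, cpus)
}
```

The two numbers usually match, but they can differ when the GOMAXPROCS environment variable is set or the process runs under a CPU limit.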
When is a G created?
A goroutine is created whenever you call go func(){}. It is allocated in user space by the Go runtime, costs ~2 KB, and is placed on the local run queue of the current P.
When is an M created?
The runtime keeps the M count low, spawning a new OS thread only when:
- A goroutine makes a blocking system call (e.g., CGO, heavy file I/O) that cannot be handled asynchronously.
- The current M gets stuck inside the OS kernel.
- Other Ps are waiting for work but no M is available (creating a new M is expensive, ~1–2 MB).
The Watcher: sysmon and SIGURG
What is sysmon?
sysmon (system monitor) is a special runtime thread that does not hold a P and runs on a dedicated M. It wakes up periodically (20 µs – 10 ms) to enforce fairness.
How preemption works
Since Go 1.14, the runtime uses signals to preempt long‑running goroutines (asynchronous preemption):
- sysmon scans all Ps. If it finds a goroutine that has run on a processor for > 10 ms, it sends a SIGURG to the M executing that goroutine.
- Why SIGURG?
  - Out‑of‑band: rarely used by modern apps, so it doesn’t clash with user signals.
  - Non‑destructive: unlike SIGINT, it does not terminate the process.
  - Libc‑safe: safe for programs that use CGO.
- The OS interrupts the M; Go’s signal handler injects a call to asyncPreempt onto the goroutine’s stack.
- The goroutine yields, is moved to the global run queue, and the P picks a new G to run.
Model Comparison: “Communicate by Sharing Memory” vs. “Share Memory by Communicating”
Go / Java: Shared Heap
All threads share the same heap. Data is passed by mutating shared objects.
Failure Mode (Java Example)
// Java: Explicit Locking (The Bottleneck)
class Counter {
    private int count = 0;

    // synchronized lets only one thread at a time run this method on a given
    // Counter; under contention the JVM may park waiting threads (context switch)
    public synchronized void increment() {
        count++;
    }
}
- Race conditions: Forgetting synchronized leads to lost updates and corrupted data.
- Performance: A contended lock can park threads in the OS, costing thousands of cycles per handoff.
- Deadlocks: Circular waiting can freeze the application.
Erlang: Private Heaps
Each process has its own heap, eliminating “noisy neighbor” effects.
Why Erlang Is “Better” (Bank Example)
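-module(bank_server).
-behaviour(gen_server).
-export([init/1, handle_call/3, handle_cast/2, trigger_crash/0]).

%% 1. The Safe Bank Process
init([]) -> {ok, 100}. %% Balance is $100

%% Minimal callbacks (illustrative) so the module compiles as a gen_server
handle_call(balance, _From, Balance) -> {reply, Balance, Balance}.
handle_cast(_Msg, Balance) -> {noreply, Balance}.

%% 2. The Dangerous Crash Process
trigger_crash() ->
    spawn(fun() ->
        %% A. This builds a huge list (100 million integers) on a PRIVATE heap
        _CrashList = lists:seq(1, 100000000),
        %% B. Crashes immediately (badarith)
        1 / 0
    end).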
- Allocation: The spawned process builds a list of 100 million integers (on the order of a gigabyte) on its private heap. In Java or Go the same allocation would land on the shared heap, where it adds GC pressure that every thread pays for.
- Crash: The process dies (divide‑by‑zero).
- Cleanup: The Erlang VM simply discards the private heap.
- Zero GC cost: No need to scan the memory of other processes.
- Zero impact: The bank_server continues handling the $100 balance with microsecond latency, unaffected by the crash.
Final Takeaway
- Java’s shared‑memory model places a heavy correctness burden on engineers, making large‑scale concurrency harder to reason about.
- Erlang excels in reliability because private heaps prevent “noisy neighbors” from affecting the whole system.
- Go offers a pragmatic middle ground: it uses a shared heap for raw speed (no data copying) while encouraging CSP‑style communication (channels) to avoid the complexity of explicit locks.