Designing for Sub-Microsecond Latency
Source: Dev.to
Lessons from Building a Minimal Execution Engine
Most frameworks optimize for throughput, developer velocity, or horizontal scalability. When you care about tail latency, determinism, and sub‑microsecond critical paths, those abstractions often become liabilities. I built SubMicro Execution Engine to explore what happens when latency — not features — is the primary design constraint. Below are a few practical lessons that shaped the system.
Latency Lives in the Edges, Not the Core Logic
The actual “work” a system performs is rarely the bottleneck. Latency hides in:
- memory allocation
- cache‑line contention
- branch misprediction
- scheduler handoffs
- synchronization primitives
Key practices:
- Keeping hot paths allocation‑free
- Favoring flat, cache‑friendly data layouts
- Avoiding implicit synchronization
- Designing execution flows that fit in L1/L2 cache
If you can’t draw the hot path from memory, you don’t control latency.
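As a minimal sketch of these practices (the `Order` and `OrderRing` names are hypothetical, not from the engine itself): a fixed-capacity ring buffer whose storage is preallocated in a flat array, so the push/pop hot path never touches the heap and entries stay contiguous in cache.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: a fixed-capacity ring buffer for one producer.
// All storage is preallocated, so the hot path (push/pop) never
// allocates; Order is a flat struct, so entries sit contiguously
// in cache rather than behind pointers.
struct Order {
    std::uint64_t id;
    std::int64_t  price;   // fixed-point, e.g. price * 10^4
    std::uint32_t qty;
};

template <std::size_t Capacity>
class OrderRing {
    static_assert((Capacity & (Capacity - 1)) == 0, "power-of-two capacity");
public:
    bool push(const Order& o) {
        if (head_ - tail_ == Capacity) return false;   // full: reject, don't grow
        buf_[head_ & (Capacity - 1)] = o;
        ++head_;
        return true;
    }
    bool pop(Order& out) {
        if (head_ == tail_) return false;              // empty
        out = buf_[tail_ & (Capacity - 1)];
        ++tail_;
        return true;
    }
private:
    std::array<Order, Capacity> buf_{};   // flat, contiguous, preallocated
    std::uint64_t head_ = 0;
    std::uint64_t tail_ = 0;
};
```

Rejecting a push when full, instead of growing, is the point: the worst case is a branch, never an allocation.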
Determinism Beats Raw Throughput
A system that does 1M ops/sec sometimes is less useful than one that does 200k ops/sec always.
Design choices were guided by:
- stable execution order
- predictable scheduling
- minimal dynamic behavior in hot paths
This trades peak throughput for tight latency distributions, which matter far more in real‑time and trading‑style systems.
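A sketch of what "stable execution order" can mean in practice (the `Engine` layout here is an illustration, not the project's actual design): tasks live in fixed slots filled at startup and run in the same slot order every tick, so there is no work stealing, no priority churn, and no dynamic dispatch on the hot path.

```cpp
#include <array>
#include <cstddef>

// Hypothetical sketch of deterministic scheduling: tasks occupy fixed
// slots and run in the same order on every tick. The task table is
// populated at startup and never mutated on the hot path, so both
// execution order and branch behavior are predictable.
constexpr std::size_t kSlots = 4;

struct Engine {
    using Task = void (*)(int&);            // plain function pointers, no std::function
    std::array<Task, kSlots> slots{};       // fixed task table
    int state = 0;

    void tick() {
        // Always iterate slot 0..kSlots-1: a stable, branch-predictable order.
        for (std::size_t i = 0; i < kSlots; ++i)
            if (slots[i]) slots[i](state);
    }
};
```

Because slot order is fixed, two tasks whose effects don't commute (say, an add and a multiply) still produce the same result on every tick, which is exactly the property dynamic schedulers give up.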
Abstractions Have a Cost — Measure Them Ruthlessly
Abstractions aren’t bad, but unmeasured abstractions are dangerous.
In low‑latency systems:
- virtual dispatch can cost more than the logic itself
- generic containers hide memory access patterns
- “clean” interfaces often fragment the execution path
Instead, aim for:
- explicit control over execution
- visible data movement
- simple, inspectable components
Code clarity is preserved by removing layers, not adding them.
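To make the virtual-dispatch point concrete, here is the same handler written both ways (the handler names are illustrative, not from the engine): the virtual call goes through a vtable pointer and can defeat inlining, while the template version resolves the callee at compile time and leaves the execution path fully visible to the compiler.

```cpp
#include <cstdint>

// Virtual dispatch: an indirect call through a vtable pointer.
// The compiler generally cannot inline across it, and the indirect
// branch itself can cost more than trivial handler logic.
struct HandlerBase {
    virtual std::int64_t on_tick(std::int64_t px) = 0;
    virtual ~HandlerBase() = default;
};
struct SpreadHandler : HandlerBase {
    std::int64_t on_tick(std::int64_t px) override { return px + 1; }
};

// Static dispatch: the handler type is a template parameter, so the
// call is direct, inlinable, and the data flow stays explicit.
template <typename H>
std::int64_t run_tick(H& h, std::int64_t px) { return h.on_tick(px); }

struct InlineSpreadHandler {
    std::int64_t on_tick(std::int64_t px) { return px + 1; }
};
```

Both produce the same result; the difference is whether the dispatch cost and the inlining opportunity are visible, which is something only a benchmark of your actual hot path can settle.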
Scheduling Is a Latency Feature
Schedulers decide when work happens — which is as important as what happens.
Design considerations include:
- minimal context switching
- optional busy‑polling strategies
- execution models that avoid OS interference in hot paths
The goal is to keep execution close to the CPU, not bouncing between queues and threads.
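A minimal sketch of the busy-polling idea (the `mailbox` variable is a stand-in, not the engine's API): instead of blocking on a condition variable and handing control to the OS scheduler, the consumer spins on an atomic, so work starts within nanoseconds of being published. The trade is a fully occupied core.

```cpp
#include <atomic>
#include <thread>

// Hypothetical sketch of a busy-polling consumer. Blocking primitives
// (mutex + condvar) involve the kernel and a wakeup; spinning keeps
// the thread on-CPU so the handoff latency is a cache-coherence hop,
// not a scheduler decision.
std::atomic<int> mailbox{0};   // 0 = empty, otherwise the payload

int busy_poll_once() {
    int v;
    // Spin until a value appears; acquire pairs with the producer's release.
    while ((v = mailbox.load(std::memory_order_acquire)) == 0) {
        // On x86 one would typically issue a PAUSE hint here;
        // yield() is the portable (but heavier) fallback.
        std::this_thread::yield();
    }
    mailbox.store(0, std::memory_order_release);
    return v;
}
```

In a real deployment the spinning thread is usually pinned to an isolated core so the OS never migrates or preempts it mid-spin.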
Measure the Tail, Not the Average
Average latency lies.
The engine is designed with the assumption that:
- p99 and p99.9 matter more than the mean
- occasional spikes break real‑time systems
- instrumentation must be lightweight enough for production use
If you don’t measure the tail, you are optimizing blind.
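A small sketch of why the tail matters (nearest-rank percentile over raw samples; a production engine would use a histogram, but the arithmetic is the same): a workload that is 100 ns 90% of the time and 10 µs the other 10% has a mean around 1.1 µs, a median of 100 ns, and a p99 of 10 µs. Only the percentiles tell you which number your users actually see.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch: nearest-rank percentile over recorded latency
// samples (in nanoseconds). Sorting a copy is fine offline; a
// production path would feed a preallocated histogram instead.
long long percentile(std::vector<long long> samples, double p) {
    std::sort(samples.begin(), samples.end());
    std::size_t idx = static_cast<std::size_t>(p * (samples.size() - 1));
    return samples[idx];
}
```

Reading p50 alongside p99/p99.9 immediately shows whether a change helped the common case, the stalls, or both; the mean blends them into a number that describes neither.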
Closing Thoughts
This project is intentionally minimal. It is not a framework; it is an exploration of how far you can push latency control when every design decision answers one question:
Does this reduce or increase unpredictability?
Repository: submicro-execution-engine
Demo site: https://submicro.krishnabajpai.me/