Designing for Sub-Microsecond Latency

Published: December 30, 2025 at 11:03 AM EST
2 min read
Source: Dev.to

Lessons from Building a Minimal Execution Engine

Most frameworks optimize for throughput, developer velocity, or horizontal scalability. When you care about tail latency, determinism, and sub‑microsecond critical paths, those abstractions often become liabilities. I built SubMicro Execution Engine to explore what happens when latency — not features — is the primary design constraint. Below are a few practical lessons that shaped the system.

Latency Lives in the Edges, Not the Core Logic

The actual “work” a system performs is rarely the bottleneck. Latency hides in:

  • memory allocation
  • cache‑line contention
  • branch misprediction
  • scheduler handoffs
  • synchronization primitives

Key practices:

  • Keeping hot paths allocation‑free
  • Favoring flat, cache‑friendly data layouts
  • Avoiding implicit synchronization
  • Designing execution flows that fit in L1/L2 cache

If you can’t draw the hot path from memory, you don’t control latency.

Determinism Beats Raw Throughput

A system that does 1 M ops/sec sometimes is less useful than one that does 200 k ops/sec always.

Design choices were guided by:

  • stable execution order
  • predictable scheduling
  • minimal dynamic behavior in hot paths

This trades peak throughput for tight latency distributions, which matter far more in real‑time and trading‑style systems.

Abstractions Have a Cost — Measure Them Ruthlessly

Abstractions aren’t bad, but unmeasured abstractions are dangerous.

In low‑latency systems:

  • virtual dispatch can cost more than the logic itself
  • generic containers hide memory access patterns
  • “clean” interfaces often fragment the execution path

Instead, aim for:

  • explicit control over execution
  • visible data movement
  • simple, inspectable components

Code clarity is preserved by removing layers, not adding them.

Scheduling Is a Latency Feature

Schedulers decide when work happens — which is as important as what happens.

Design considerations include:

  • minimal context switching
  • optional busy‑polling strategies
  • execution models that avoid OS interference in hot paths

The goal is to keep execution close to the CPU, not bouncing between queues and threads.

Measure the Tail, Not the Average

Average latency lies.

The engine is designed with the assumption that:

  • p99 and p99.9 matter more than the mean
  • occasional spikes break real‑time systems
  • instrumentation must be lightweight enough for production use

If you don’t measure the tail, you are optimizing blind.

Closing Thoughts

This project is intentionally minimal. It is not a framework; it is an exploration of how far you can push latency control when every design decision answers one question:

Does this reduce or increase unpredictability?

Repository: submicro-execution-engine
Demo site: https://submicro.krishnabajpai.me/
