When Stability Improves Performance (Threading)

Published: May 9, 2026 at 01:33 PM EDT
Source: Dev.to


TokenGate Overview

TokenGate is a token‑managed concurrency system. Decorated functions return tokens instead of executing immediately. Tokens are admitted through a wrapped decorator, routed to per‑core mailboxes by weight class, and executed on thread‑pool workers.

The system separates async coordination from threaded execution:

  • The async event loop manages routing and coordination of tokens.
  • The thread pool handles the actual execution.
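The split described above can be sketched with the standard library alone. This is a minimal illustration, not TokenGate's code: the function names `square` and `run_all` and the pool size are assumptions chosen for the example.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def square(n: int) -> int:
    # Stand-in for blocking work that runs on a pool thread.
    return n * n


async def run_all(values: list[int]) -> list[int]:
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # The event loop only schedules and awaits; the pool executes.
        futures = [loop.run_in_executor(pool, square, v) for v in values]
        return await asyncio.gather(*futures)


results = asyncio.run(run_all([1, 2, 3, 4]))
print(results)
```

The coroutine never runs `square` itself. It hands each call to the pool and awaits the results, which keeps the event loop free for coordination.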

Weight Classification and Core Assignment

Each task is assigned a weight that determines the core range it may run on:

Weight    Core Range
HEAVY     All cores
MEDIUM    Core 2+
LIGHT     Core 3+

Within a range, a staggered position counter distributes tokens across workers in FIFO order, which lets retry tokens interleave with new ones.
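One way to picture the weight-to-range mapping is a per-class counter that walks the eligible cores in order. The sketch below is an assumption-heavy illustration: the class `PositionRouter`, the four-core machine, the 1-indexed core numbering, and the plain round-robin reading of "staggered position counter" are all hypothetical and may differ from what TokenGate does.

```python
from itertools import count

NUM_CORES = 4  # assumed machine size for the example

# First eligible core per weight class, 1-indexed (assumed numbering).
FIRST_CORE = {"heavy": 1, "medium": 2, "light": 3}


class PositionRouter:
    """Hypothetical round-robin router; not TokenGate's actual code."""

    def __init__(self, num_cores: int = NUM_CORES) -> None:
        self.num_cores = num_cores
        # One independent position counter per weight class.
        self._counters = {weight: count() for weight in FIRST_CORE}

    def core_range(self, weight: str) -> range:
        return range(FIRST_CORE[weight], self.num_cores + 1)

    def route(self, weight: str) -> int:
        cores = self.core_range(weight)
        position = next(self._counters[weight])
        # Wrap around so tokens are spread over the range in arrival order.
        return cores[position % len(cores)]


router = PositionRouter()
medium_cores = [router.route("medium") for _ in range(5)]
light_cores = [router.route("light") for _ in range(3)]
print(medium_cores, light_cores)
```

With four cores, MEDIUM tokens cycle over cores 2 to 4 and LIGHT tokens over cores 3 and 4, while HEAVY tokens may use any core.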

Sticky Tokens

Tokens that share the same (operation_type, args) key are pinned to the core that first receives them, which preserves data locality. When the first such token arrives, sticky_registry.mark() creates a sticky anchor, and later tokens with the same key are grouped with it automatically.

@task_token_guard(
    operation_type="my_op",
    tags={"weight": "medium", "sticky_anchor": "my_domain"},
)
def my_operation(n: int) -> int:
    ...
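The pinning rule itself is small enough to sketch. The class below is a hypothetical stand-in: the real signature of sticky_registry.mark() is not given in this post, so the parameters here (`operation_type`, `args`, `candidate_core`) are assumptions made for illustration.

```python
class StickyRegistry:
    """Hypothetical stand-in for a sticky registry; names are assumptions."""

    def __init__(self) -> None:
        # Maps (operation_type, args) to the core that first received it.
        self._pinned: dict[tuple, int] = {}

    def mark(self, operation_type: str, args: tuple, candidate_core: int) -> int:
        key = (operation_type, args)
        # setdefault stores the candidate only if the key is new,
        # so the first core to receive a key keeps it.
        return self._pinned.setdefault(key, candidate_core)


registry = StickyRegistry()
first = registry.mark("my_op", (7,), candidate_core=2)
repeat = registry.mark("my_op", (7,), candidate_core=4)
other = registry.mark("my_op", (8,), candidate_core=4)
print(first, repeat, other)
```

The second call with the same key ignores its candidate core and returns the original pin, while a different argument tuple gets its own anchor.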

Proactive Domain Anchoring

A second, proactive layer anchors an entire call chain to a domain before any child token is routed. This ensures that all related data is processed on the same core.

When a lead token is decorated with external_calls, a SHA‑256 seed is generated:

seed = SHA256(token_id + ":" + freeze(external_calls))

The resulting 64‑character hex digest is pinned to whichever core the lead token lands on. All child tokens spawned by the lead inherit the seed and therefore execute on the same core.
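The seed formula can be reproduced with hashlib. The post does not define `freeze`, so the version below (sorted call names joined by commas) is an assumption, as are the helper name `make_seed`, the example token ID, and the `seed_to_core` lookup table.

```python
import hashlib


def freeze(external_calls: list[str]) -> str:
    # Assumed canonical form: sorted names joined by commas, so the
    # same set of calls always yields the same string.
    return ",".join(sorted(external_calls))


def make_seed(token_id: str, external_calls: list[str]) -> str:
    payload = f"{token_id}:{freeze(external_calls)}"
    # hexdigest() of SHA-256 is always 64 hexadecimal characters.
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


seed = make_seed("token-001", ["child_op"])

# Hypothetical lookup: the seed is pinned to the core the lead landed on,
# and a child carrying the same seed resolves to that core.
seed_to_core: dict[str, int] = {seed: 2}
child_core = seed_to_core[seed]
print(len(seed), child_core)
```

Because the token ID is part of the hashed payload, two lead tokens with the same external_calls still get distinct seeds.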

@task_token_guard(
    operation_type="lead_op",
    tags={"weight": "medium", "external_calls": ["child_op"]},
)
def lead_operation(n: int) -> list:
    # child_op is another decorated operation, defined elsewhere.
    # Children inherit the seed and land on the same core.
    return [child_op(n + i) for i in range(4)]

The pending lead‑operation count increments for each external call at creation time and decrements on completion. When it reaches zero, the seed is released.
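This lifecycle behaves like a reference count. The sketch below assumes a hypothetical `SeedTracker` class with a lock around its state; none of these names come from TokenGate.

```python
import threading
from typing import Optional


class SeedTracker:
    """Hypothetical reference counter for anchored seeds."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending: dict[str, int] = {}
        self._core: dict[str, int] = {}

    def anchor(self, seed: str, core: int) -> None:
        with self._lock:
            self._core[seed] = core
            self._pending[seed] = 0

    def call_created(self, seed: str) -> None:
        # One increment per external call, at creation time.
        with self._lock:
            self._pending[seed] += 1

    def call_completed(self, seed: str) -> bool:
        """Return True when this completion released the seed."""
        with self._lock:
            self._pending[seed] -= 1
            if self._pending[seed] > 0:
                return False
            # Count reached zero: drop the seed and its core pin.
            del self._pending[seed]
            del self._core[seed]
            return True

    def core_for(self, seed: str) -> Optional[int]:
        with self._lock:
            return self._core.get(seed)


tracker = SeedTracker()
tracker.anchor("seed-a", core=2)
for _ in range(4):
    tracker.call_created("seed-a")
released = [tracker.call_completed("seed-a") for _ in range(4)]
print(released, tracker.core_for("seed-a"))
```

Only the last completion releases the seed, after which the core pin is gone and later tokens would route normally.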

Performance Results

The benchmark ran 15 doubling waves, processing 131,068 tokens in total. The table below shows tokens per second, average latency, and overlap factor for each wave.

Wave   Tokens   Tok/s    Lat (ms)   Overlap
1      4        1386.2   0.721      1.44×
2      8        2391.2   0.418      2.48×
3      16       2744.8   0.364      4.82×
4      32       2812.7   0.356      11.32×
5      64       2880.0   0.347      22.01×
6      128      2907.6   0.344      29.78×
7      256      2846.8   0.351      37.98×
8      512      2811.5   0.356      41.81×
9      1024     2813.9   0.355      44.18×
10     2048     2644.3   0.378      44.86× (peak)
11     4096     2816.3   0.355      38.34×
12     8192     2819.9   0.355      32.64×
13     16384    2765.0   0.362      27.92× (sustained)
14     32768    2707.7   0.369      24.96×
15     65536    2789.5   0.358      24.21×

Zero failures. Average latency: 0.386 ms/token.
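The headline numbers can be cross-checked from the per-wave figures. The sketch below assumes the waves start at 4 tokens and double each time, and copies the latency column from the table. The reported 0.386 ms appears to be the unweighted mean across waves, which is an inference from the arithmetic rather than something the post states.

```python
# 15 doubling waves starting at 4 tokens: 4, 8, ..., 65536.
tokens_per_wave = [4 * 2**i for i in range(15)]
total_tokens = sum(tokens_per_wave)

# Per-wave latency in ms, copied from the table.
latencies_ms = [
    0.721, 0.418, 0.364, 0.356, 0.347,
    0.344, 0.351, 0.356, 0.355, 0.378,
    0.355, 0.355, 0.362, 0.369, 0.358,
]

# Unweighted mean: every wave counts equally.
mean_latency = round(sum(latencies_ms) / len(latencies_ms), 3)

# Token-weighted mean: large waves dominate, small slow waves barely count.
weighted_latency = (
    sum(t * l for t, l in zip(tokens_per_wave, latencies_ms)) / total_tokens
)

# Latency is the reciprocal of throughput: 1000 ms / (tokens per second).
wave1_latency = round(1000 / 1386.2, 3)

print(total_tokens, mean_latency, wave1_latency)
```

The token-weighted mean comes out lower than the unweighted one, because the two slow warm-up waves account for only 12 of the 131,068 tokens.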

The previous routing approach capped overlap at ~17× after saturation, limited by cross‑domain traffic. Domain anchoring eliminates that waste: tokens that belong together stay together, keeping cache lines warm and reducing cross‑core traffic. Consequently, the overlap ceiling rises: in this run overlap peaks at 44.86× in wave 10 and settles near 24× in the largest waves, while latency stays between 0.34 and 0.38 ms from wave 3 onward.

Getting Involved

If you run concurrent Python workloads (task queues, async pipelines, or any system whose related operations currently route freely), feel free to try TokenGate and share your observations.

  • The sticky registry and hash conductor are opt‑in; existing code routes normally.
  • Hashed domain anchoring and sticky tokens are slated as the first “production‑ready” features.

Cases of Interest

  • Unexpected concurrency ceilings without obvious cause.
  • High‑throughput pipelines where data locality could improve performance.

The repository is public; issues and observations are welcome:

GitHub repository


TokenGate represents nearly 4,000 hours of hobbyist development. As the project moves toward a business offering, feedback from the community is greatly appreciated.
