When Stability Improves Performance (Threading)
Source: Dev.to
TokenGate Overview
TokenGate is a token‑managed concurrency system. Decorated functions return tokens instead of executing immediately. Tokens are admitted through a wrapped decorator, routed to per‑core mailboxes by weight class, and executed on thread‑pool workers.
The system separates async coordination from threaded execution:
- The async event loop manages routing and coordination of tokens.
- The thread pool handles the actual execution.
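The split above can be sketched with nothing but the standard library. This is a minimal illustration, not TokenGate's implementation: the `Token` class, the `deferred` decorator, and `run_tokens` are hypothetical names invented here to show a decorated call returning a token while the event loop hands execution to thread-pool workers.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Token:
    """Hypothetical stand-in for a token: a call that has not run yet."""
    func: Callable[..., Any]
    args: tuple


def deferred(func: Callable[..., Any]) -> Callable[..., Token]:
    """Hypothetical decorator: calling the function yields a Token, not a result."""
    def wrapper(*args: Any) -> Token:
        return Token(func, args)
    return wrapper


@deferred
def square(n: int) -> int:
    return n * n


async def run_tokens(tokens: list[Token], workers: int = 4) -> list[Any]:
    """Coordinate on the event loop; execute on thread-pool workers."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [loop.run_in_executor(pool, t.func, *t.args) for t in tokens]
        # gather preserves submission order in its result list
        return await asyncio.gather(*futures)


results = asyncio.run(run_tokens([square(i) for i in range(5)]))
print(results)
```

Calling `square(3)` here produces a `Token` rather than `9`; the value only exists once a worker thread has run the token.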
Weight Classification and Core Assignment
Each task is assigned a weight that determines the core range it may run on:
| Weight | Core Range |
|---|---|
| HEAVY | All cores |
| MEDIUM | Core 2+ |
| LIGHT | Core 3+ |
Within a range, a staggered position counter distributes tokens across workers in FIFO order, allowing interleaved retry tokens.
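One way to picture the routing rule is a per-weight counter that walks the eligible cores in order. The sketch below rests on several assumptions that the text does not confirm: cores are numbered from 1, "Core 2+" means core 2 through the last core, and the staggered counter is simplified to plain round-robin. `StaggeredRouter` and its methods are hypothetical names.

```python
from itertools import count

# First eligible core per weight class, taken from the table above
# (assumes cores are numbered from 1).
FIRST_CORE = {"heavy": 1, "medium": 2, "light": 3}


class StaggeredRouter:
    """Hypothetical router with one position counter per weight class."""

    def __init__(self, num_cores: int) -> None:
        self.num_cores = num_cores
        self._counters = {weight: count() for weight in FIRST_CORE}

    def eligible_cores(self, weight: str) -> list[int]:
        return list(range(FIRST_CORE[weight], self.num_cores + 1))

    def route(self, weight: str) -> int:
        cores = self.eligible_cores(weight)
        position = next(self._counters[weight])  # advances on every token
        return cores[position % len(cores)]


router = StaggeredRouter(num_cores=4)
light_cores = [router.route("light") for _ in range(4)]
medium_cores = [router.route("medium") for _ in range(4)]
print(light_cores, medium_cores)
```

On a four-core machine under these assumptions, light tokens alternate between cores 3 and 4, while medium tokens cycle through cores 2, 3, and 4.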
Sticky Tokens
Tokens that share the same (operation_type, args) keys are pinned to the core that first receives them, preserving data locality. When a token arrives, sticky_registry.mark() creates a sticky anchor that automatically groups related parts.
@task_token_guard(
    operation_type="my_op",
    tags={"weight": "medium", "sticky_anchor": "my_domain"},
)
def my_operation(n: int) -> int:
    ...
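The pinning behaviour can be illustrated with a small first-writer-wins map. This is a sketch under assumptions, not the real registry: the `StickyRegistry` class and the `mark(key, candidate_core)` signature are invented for illustration, since the article names `sticky_registry.mark()` without showing its parameters.

```python
from typing import Hashable


class StickyRegistry:
    """Hypothetical registry: the first core to receive a key keeps it."""

    def __init__(self) -> None:
        self._anchors: dict[Hashable, int] = {}

    def mark(self, key: Hashable, candidate_core: int) -> int:
        # setdefault stores candidate_core only if the key is new,
        # then returns whichever core is recorded for that key.
        return self._anchors.setdefault(key, candidate_core)


registry = StickyRegistry()
key = ("my_op", (42,))  # (operation_type, args)

first = registry.mark(key, candidate_core=2)
repeat = registry.mark(key, candidate_core=5)  # router proposed a different core
other = registry.mark(("my_op", (7,)), candidate_core=5)
print(first, repeat, other)
```

The second call with the same key ignores the newly proposed core and returns the original one, which is the locality property the section describes.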
Proactive Domain Anchoring
A second, proactive layer anchors an entire call chain to a domain before any child token is routed. This ensures that all related data is processed on the same core.
When a lead token is decorated with external_calls, a SHA‑256 seed is generated:
seed = SHA256(token_id + ":" + freeze(external_calls))
The resulting 64-character hex digest is the seed, and it is pinned to whichever core the lead token lands on. All spawned child tokens inherit the seed and therefore execute on the same core.
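The seed formula can be reproduced with `hashlib`. In this sketch, `freeze` is assumed to be a canonical, order-independent serialization of the call names; the article does not define it, so the version here is a guess made for illustration. `make_seed`, `seed_core`, and `core_for_child` are likewise hypothetical names.

```python
import hashlib


def freeze(external_calls: list[str]) -> str:
    """Assumed canonical form: sorted names joined by commas."""
    return ",".join(sorted(external_calls))


def make_seed(token_id: str, external_calls: list[str]) -> str:
    payload = token_id + ":" + freeze(external_calls)
    # hexdigest() of SHA-256 is always 64 hexadecimal characters
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Seed -> core mapping, filled in when the lead token is routed.
seed_core: dict[str, int] = {}


def core_for_child(inherited_seed: str) -> int:
    """Children look up the core recorded for the seed they inherited."""
    return seed_core[inherited_seed]


seed = make_seed("token-0001", ["child_op"])
seed_core[seed] = 3  # example value: the core the lead token landed on
print(len(seed), core_for_child(seed))
```

Because the digest depends only on the token ID and the frozen call list, every child that carries the same seed resolves to the same core without consulting the router again.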
@task_token_guard(
    operation_type="lead_op",
    tags={"weight": "medium", "external_calls": ["child_op"]},
)
def lead_operation(n: int) -> list:
    # Children inherit the seed and land on the same core
    return [child_op(n + i) for i in range(4)]
The pending lead‑operation count increments for each external call at creation time and decrements on completion. When it reaches zero, the seed is released.
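The release rule behaves like a reference count. The following sketch assumes one count per declared external call and uses a lock because completions may arrive from different worker threads; `SeedLease` and `complete_one` are hypothetical names, not part of TokenGate.

```python
import threading


class SeedLease:
    """Hypothetical reference count that keeps a seed pinned while work is pending."""

    def __init__(self, seed: str, external_calls: list[str]) -> None:
        self.seed = seed
        self._lock = threading.Lock()
        # One pending unit per external call, counted at creation time.
        self._pending = len(external_calls)
        self.released = self._pending == 0

    def complete_one(self) -> bool:
        """Record one completion; return True once the seed is released."""
        with self._lock:
            if self._pending > 0:
                self._pending -= 1
            if self._pending == 0:
                self.released = True
            return self.released


lease = SeedLease("example-seed", ["child_a", "child_b"])
after_first = lease.complete_one()
after_second = lease.complete_one()
print(after_first, after_second)
```

The seed stays pinned after the first completion and is released only when the last outstanding call finishes.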
Performance Results
The benchmark ran 15 doubling waves, processing 131,068 tokens in total. The table below shows tokens per second, average latency, and overlap factor for each wave.
| Wave | Tokens | Tok/s | Lat (ms) | Overlap |
|---|---|---|---|---|
| 1 | 4 | 1386.2 | 0.721 | 1.44× |
| 2 | 8 | 2391.2 | 0.418 | 2.48× |
| 3 | 16 | 2744.8 | 0.364 | 4.82× |
| 4 | 32 | 2812.7 | 0.356 | 11.32× |
| 5 | 64 | 2880.0 | 0.347 | 22.01× |
| 6 | 128 | 2907.6 | 0.344 | 29.78× |
| 7 | 256 | 2846.8 | 0.351 | 37.98× |
| 8 | 512 | 2811.5 | 0.356 | 41.81× |
| 9 | 1024 | 2813.9 | 0.355 | 44.18× |
| 10 | 2048 | 2644.3 | 0.378 | 44.86× (peak) |
| 11 | 4096 | 2816.3 | 0.355 | 38.34× |
| 12 | 8192 | 2819.9 | 0.355 | 32.64× |
| 13 | 16384 | 2765.0 | 0.362 | 27.92× (sustained) |
| 14 | 32768 | 2707.7 | 0.369 | 24.96× |
| 15 | 65536 | 2789.5 | 0.358 | 24.21× |
No tokens failed. Latency averaged 0.386 ms per token across the 15 waves.
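The headline figures can be cross-checked from the table. In the published numbers, the latency column appears to be the reciprocal of throughput (1000 / tokens per second), and the 0.386 ms average matches the unweighted mean over the 15 waves. Both are observations about the data, not a statement of how the benchmark computes them.

```python
# Token counts double each wave: 4, 8, ..., 65536
tokens = [4 * 2**i for i in range(15)]

# Tok/s column from the table, waves 1 through 15
tok_per_s = [1386.2, 2391.2, 2744.8, 2812.7, 2880.0, 2907.6, 2846.8, 2811.5,
             2813.9, 2644.3, 2816.3, 2819.9, 2765.0, 2707.7, 2789.5]

total_tokens = sum(tokens)
latency_ms = [1000.0 / rate for rate in tok_per_s]   # ms per token
mean_latency_ms = sum(latency_ms) / len(latency_ms)  # unweighted across waves

print(total_tokens)
print(round(mean_latency_ms, 3))
```

Read this way, the latency column is the effective time per token at a given throughput, not the time an individual token spends in flight.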
The previous routing approach capped overlap at ~17× after saturation, limited by cross-domain traffic. Domain anchoring eliminates that waste: tokens that belong together stay together, keeping cache lines warm and reducing cross-core traffic. Consequently, the overlap ceiling rises to a peak of 44.86× in wave 10 and settles at roughly 24× to 28× in the largest waves, while latency stays flat at about 0.35 to 0.38 ms per token from wave 3 onward.
Getting Involved
If you run concurrent Python workloads—task queues, async pipelines, or any system with related operations that currently route freely—feel free to try TokenGate and share your observations.
- The sticky registry and hash conductor are opt‑in; existing code routes normally.
- Hashed domain anchoring and sticky tokens are slated as the first “production‑ready” features.
Cases of Interest
- Unexpected concurrency ceilings without obvious cause.
- High‑throughput pipelines where data locality could improve performance.
The repository is public; issues and observations are welcome.
TokenGate represents nearly 4,000 hours of hobbyist development. As the project moves toward a business offering, feedback from the community is greatly appreciated.