1m Tokens (& WebSocket)

Published: March 19, 2026 at 05:32 PM EDT
3 min read
Source: Dev.to

Greetings, readers. I made a threading engine with many optimizations (including ML) and per-operation WebSocket task controls.

Even when computing a slow-converging series like the Leibniz π series across 1 million token executions, every task resolved as expected in ~200 seconds.

```python
# ── LAYER 0: TERM TOKENS ──────────────────────────────────────────────────────
@task_token_guard(operation_type='pi_term', tags={'weight': 'light'})
def compute_pi_term(n: int) -> str:
    """
    Compute a single Leibniz term: (-1)^n / (2n + 1)
    Returns as string to preserve Decimal precision across token boundary.
    Light weight — 1,000,000 of these fire simultaneously.
    """
    getcontext().prec = DECIMAL_PRECISION
    sign = Decimal(-1) ** n
    term = sign / Decimal(2 * n + 1)
    return str(term)

# ── LAYER 1: CHUNK TOKENS ─────────────────────────────────────────────────────
@task_token_guard(operation_type='pi_chunk', tags={'weight': 'light'})
def sum_chunk(term_strings: List[str]) -> str:
    """
    Sum a batch of Leibniz terms.
    Receives resolved term strings from Layer 0 tokens.
    Light weight — 1,000 of these, each summing 1,000 terms.
    """
    getcontext().prec = DECIMAL_PRECISION
    total = sum(Decimal(t) for t in term_strings)
    return str(total)

# ── LAYER 2: PARTIAL TOKENS ───────────────────────────────────────────────────
@task_token_guard(operation_type='pi_partial', tags={'weight': 'medium'})
def sum_partial(chunk_strings: List[str]) -> str:
    """
    Sum a batch of chunk sums.
    Receives resolved chunk strings from Layer 1 tokens.
    Medium weight — 10 of these, each summing 100 chunks.
    """
    getcontext().prec = DECIMAL_PRECISION
    total = sum(Decimal(c) for c in chunk_strings)
    return str(total)
```

The Leibniz series was chosen intentionally because it is one of the slowest-converging π series; it needs ~10 million terms for 7 correct digits. That makes it a good stress test: maximum token volume, minimum mathematical payoff.
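As a plain-Python reference, the three layers above reduce to the following sketch. It runs the same term/chunk/partial pipeline sequentially (TokenGate would fan each layer out as tokens); the decorators are omitted, and the small `N_TERMS` and `CHUNK_SIZE` here are demo values, not the article's 1,000,000-token run:

```python
from decimal import Decimal, getcontext
from typing import List

DECIMAL_PRECISION = 50
N_TERMS = 10_000      # demo size; the article runs 1,000,000 terms
CHUNK_SIZE = 100

def compute_pi_term(n: int) -> str:
    """Layer 0: a single Leibniz term (-1)^n / (2n + 1), as a string."""
    getcontext().prec = DECIMAL_PRECISION
    return str(Decimal(-1) ** n / Decimal(2 * n + 1))

def sum_chunk(term_strings: List[str]) -> str:
    """Layer 1: sum a batch of resolved term strings."""
    getcontext().prec = DECIMAL_PRECISION
    return str(sum(Decimal(t) for t in term_strings))

def sum_partial(chunk_strings: List[str]) -> str:
    """Layer 2: sum a batch of chunk sums."""
    getcontext().prec = DECIMAL_PRECISION
    return str(sum(Decimal(c) for c in chunk_strings))

# Drive the three layers in order; π = 4 × the Leibniz sum.
terms = [compute_pi_term(n) for n in range(N_TERMS)]
chunks = [sum_chunk(terms[i:i + CHUNK_SIZE])
          for i in range(0, N_TERMS, CHUNK_SIZE)]
pi_estimate = 4 * Decimal(sum_partial(chunks))
print(pi_estimate)
```

With 10,000 terms the estimate is only good to a few decimal places, which is exactly the slow convergence the stress test exploits.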

Performance chart 1

(Note: 64 workers with SMT enabled is only ~7% faster on a 7800X3D; more workers doesn't always mean more throughput, especially for micro-ops where execution port contention becomes the real ceiling.)

Performance chart 2

Tokens move through async admission and resolve on pinned workers: CPU-heavy tasks stay on core 1, light tasks distribute across the rest. Failure nets, duplication safety, and WebSocket controls prevent runaway tasks at the process level.

Take a look at the repo:

TokenGate

Welcome to the TokenGate repository.

What it is

A small experimental system for routing decorated synchronous functions through a token‑managed concurrency model. It is intended to operate as its own concurrency workflow rather than alongside normal threading patterns.

What it is not

It is not presented as production code.

Overview

TokenGate is an exploration of token‑managed concurrency: a concept for coordinating async orchestration with thread‑backed work in a structured way. This repository is a proof of concept, not a finished product. It is experimental, still evolving, and shared in the spirit of exploration.
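The general shape of that coordination, independent of TokenGate's internals, is an async orchestrator awaiting synchronous functions that run on worker threads. A minimal sketch using only the standard library:

```python
import asyncio

def blocking_work(n: int) -> int:
    """A synchronous, CPU-ish task that must not block the event loop."""
    return sum(i * i for i in range(n))

async def orchestrate() -> list:
    # Admit tasks asynchronously; each resolves on a worker thread.
    return await asyncio.gather(
        asyncio.to_thread(blocking_work, 10),
        asyncio.to_thread(blocking_work, 100),
    )

results = asyncio.run(orchestrate())
print(results)
```

TokenGate layers admission guards, weight-based routing, and WebSocket controls on top of this basic async-orchestrates-sync pattern.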

If you’d like the fuller overview, please start here:

If anything here is useful, interesting, or sparks an idea, that already makes this project worthwhile.

How to Use (Two Versions, Two Decorators)

Note

Do not attempt to decorate an async function. The token decorator uses asyncio, but the decorated function itself should be synchronous.
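One way such a guard could enforce this rule (a hypothetical sketch named `task_token_guard_sketch`; the real decorator's behavior may differ) is to reject coroutine functions at decoration time:

```python
import asyncio
import functools
from typing import Any, Callable

def task_token_guard_sketch(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Illustrative check only: refuse to decorate async functions."""
    if asyncio.iscoroutinefunction(fn):
        raise TypeError(f"{fn.__name__} is async; decorate synchronous functions only")

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        return fn(*args, **kwargs)
    return wrapper

@task_token_guard_sketch
def ok(x: int) -> int:          # synchronous: passes through
    return x + 1

rejected = False
try:
    @task_token_guard_sketch
    async def bad() -> None:    # async: rejected at decoration time
        ...
except TypeError:
    rejected = True

print(ok(1), rejected)
```

Failing fast at decoration time surfaces the mistake at import, long before the token pipeline runs.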
