The 185-Microsecond Type Hint

Published: March 2, 2026 at 05:04 PM EST
6 min read

Source: Hacker News

How a “trivial” change yielded a 13× throughput increase

We recently released an open‑source Clojure implementation of Roughtime, a protocol for secure time synchronization with cryptographic proof.

What Roughtime does

  1. Client request – The client sends a random nonce to the server.
  2. Server response – The server replies with a signed certificate that includes:
    • the original nonce
    • a timestamp proving the response occurred after the request.
  3. Chaining – Responses can be chained together, providing provable ordering.
  4. Reliability check – If any server’s timestamps conflict with the established ordering, that server is cryptographically “outed” as unreliable.
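As a minimal sketch of step 1, a client request is essentially a cryptographically random nonce that the server must echo back; `rand-nonce` and `echoes-nonce?` below are hypothetical helpers for illustration, not part of our library's API:

```clojure
(import '(java.security SecureRandom))

;; Hypothetical helper: a Roughtime nonce is just cryptographically
;; random bytes (64 in Google's protocol, 32 in the IETF drafts).
(defn rand-nonce ^bytes [^long n]
  (let [buf (byte-array n)]
    (.nextBytes (SecureRandom.) buf)
    buf))

;; A valid response must echo the exact nonce it was sent.
(defn echoes-nonce? [^bytes sent ^bytes echoed]
  (java.util.Arrays/equals sent echoed))
```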

The rest of this post describes the specific optimisation that turned a modest implementation into a 13× faster one.

The Heavy Lifting

A single request to our server triggers a surprising amount of work:

1. Queueing

  1. Validation – The incoming request is validated.
  2. Received queue – After validation it enters a received queue.
  3. Batcher – The batcher pulls items from the received queue and creates batches that are dispatched to one of four worker queues.
  4. Worker processing – Each worker queue:
    • Decodes every request in the batch.
    • Groups requests into sub‑batches by protocol version.
    • Generates a response for each sub‑batch.
  5. Sender queue – Responses are placed in a sender queue, which un‑batches them and sends each reply back to the requesting client.
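The batcher stage above can be sketched with plain java.util.concurrent queues; the real implementation's channel types and batching policy may well differ:

```clojure
(import '(java.util.concurrent ArrayBlockingQueue TimeUnit))

;; Validated requests land here (sketch of the "received queue").
(def received-q (ArrayBlockingQueue. 1024))

;; Sketch of the batcher: block briefly for the first item, then drain
;; up to (max-batch - 1) more without waiting, forming one batch.
(defn next-batch [^ArrayBlockingQueue q max-batch]
  (when-let [first-req (.poll q 10 TimeUnit/MILLISECONDS)]
    (let [rest-reqs (java.util.ArrayList.)]
      (.drainTo q rest-reqs (dec max-batch))
      (into [first-req] rest-reqs))))
```

The validator would call `(.put received-q req)`, and each worker would repeatedly take `(next-batch received-q 64)` off the front.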

2. Protocol Compatibility

We support the entire evolution of the protocol—from Google’s original specification through all fifteen IETF drafts, i.e., sixteen versions. Consequently, the codebase contains version‑specific logic scattered throughout:

Aspect          | Variation by version
Version tags    | Different identifiers
Padding schemes | Multiple algorithms
Tag labels      | Varying formats
Hash sizes      | 128‑, 256‑, 512‑bit options
Packet layouts  | Distinct field orders

In many places, maintaining compatibility took precedence over elegance or raw performance.
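One way to keep that version-specific logic manageable is to dispatch on the protocol version. The multimethod sketch below uses made-up version keywords and sizes to show the pattern; it is not the library's actual dispatch table:

```clojure
;; Illustrative only: the keywords and sizes are invented to show the
;; dispatch pattern; the real code covers all sixteen versions.
(defmulti nonce-size
  "Nonce length in bytes for a given protocol version."
  identity)

(defmethod nonce-size :google        [_] 64)
(defmethod nonce-size :ietf-draft-08 [_] 32)
(defmethod nonce-size :default       [_] 32)
```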

3. Recursive Merkle Trees

Each batch is folded into a Merkle tree built with SHA‑512. This requires recursive hashing from the leaves up to the root, making it a purely CPU‑bound operation.
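A minimal version of that fold looks like the following; it assumes a power-of-two batch and uses the 0x00/0x01 leaf/node domain-separation prefixes the Roughtime spec calls for, but it is a sketch rather than our production code:

```clojure
(import '(java.security MessageDigest))

(defn sha-512 ^bytes [& chunks]
  (let [md (MessageDigest/getInstance "SHA-512")]
    (doseq [^bytes c chunks]
      (.update md c))
    (.digest md)))

;; Recursive fold from leaves to root. Assumes the batch size is a
;; power of two (ours is 64). Leaves are prefixed with 0x00 and
;; interior nodes with 0x01 for domain separation.
(defn merkle-root ^bytes [leaves]
  (loop [level (mapv #(sha-512 (byte-array [0x00]) %) leaves)]
    (if (= 1 (count level))
      (first level)
      (recur (mapv (fn [[l r]] (sha-512 (byte-array [0x01]) l r))
                   (partition 2 level))))))
```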

4. Ed25519 Signatures

Finally, every response is signed with Ed25519. Public‑key signatures are computationally expensive and typically dominate the overall latency of the system.
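Since JDK 15, this step needs no external crypto library; a sketch of the primitive (our server's actual signing code wraps more context around it):

```clojure
(import '(java.security KeyPairGenerator Signature))

;; Ed25519 ships with the JDK since Java 15 under the "Ed25519" name.
(def key-pair
  (.generateKeyPair (KeyPairGenerator/getInstance "Ed25519")))

(defn sign ^bytes [^bytes msg]
  (let [sig (Signature/getInstance "Ed25519")]
    (.initSign sig (.getPrivate key-pair))
    (.update sig msg)
    (.sign sig)))

(defn verified? [^bytes msg ^bytes signature]
  (let [sig (Signature/getInstance "Ed25519")]
    (.initVerify sig (.getPublic key-pair))
    (.update sig msg)
    (.verify sig signature)))
```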

The “Sluggish” Server

Given all that complexity, along with the fact that I’m using a high‑level dynamic programming language, I wasn’t surprised when my initial benchmarks showed the server responding in 200 µs.

I ran a profiler expecting to see SHA‑512 or Ed25519 dominating. Instead, nearly 90 % of the runtime was attributed to the most mundane line in the entire library:

(defn encode-rt-message [msg-map]
  (let [sorted-entries (sort-tags msg-map)
        tag-bytes      (mapv #(tag/tag->bytes (key %)) sorted-entries)
        val-bytes      (mapv #(tag/pad4 (val %)) sorted-entries)

        ;; THE BOTTLENECK:
        val-lens       (mapv alength val-bytes)

        ...])

This line is arguably the most trivial part of the codebase. It iterates over roughly five byte arrays and asks, “How long are you?”—and that’s it.

Yet this single line accounted for almost the entire request time.

The Fix

I wrapped alength in an anonymous function so I could add a type hint:

;; BEFORE (~31 µs)
(mapv alength val-bytes)

;; AFTER (~4 µs)
(mapv (fn [^bytes v] (alength v)) val-bytes)

Profiling the code with and without the type hint shows that the encoding time dropped from 31 µs to 4 µs.

Why Was (mapv alength …) So Slow?

Clojure emitted no reflection warnings when I ran my tests; the code is perfectly legal.
But mapv is a higher‑order function. It receives alength as a generic IFn object and calls invoke() on it for every item. This means that:

  • The compiler cannot inline the operation because the function is passed as a value.
  • alength itself must perform a runtime check (RT.alength) to ensure the argument is an array.
  • Finally, it calls java.lang.reflect.Array.getLength.

The overhead of dynamic dispatch, runtime type checking, and reflection adds up in a tight loop!
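You can see the effect in isolation at a REPL; the exact timings depend on your machine, but the hinted version is consistently several times faster while producing identical results:

```clojure
(let [arrs (vec (repeatedly 5 #(byte-array 100)))]
  ;; Generic path: alength is passed as a value and resolved at runtime.
  (time (dotimes [_ 100000] (mapv alength arrs)))
  ;; Hinted path: ^bytes lets the compiler emit an arraylength instruction.
  (time (dotimes [_ 100000] (mapv (fn [^bytes v] (alength v)) arrs))))
```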

What Changed?

(fn [^bytes v] (alength v))

When the function is written this way, the compiler has enough static information to emit a single arraylength bytecode instruction. We replaced a complex chain of method calls with one CPU instruction.

End‑to‑End Benchmark

To verify the impact, I ran a full end‑to‑end benchmark using Criterium’s quick-benchmark.

Test conditions

  • CPU: Apple M2
  • Workers: 4 parallel workers
  • Merkle batch size: 64
  • Crypto: Full (SHA‑512 + Ed25519)

Results

                  | Responses / sec | µs / response / core
Without type hint | 19 959          | 200.4
With type hint    | 264 316         | 15.1

That’s a 13 × throughput increase from a single type hint.

Visual comparison

Server throughput before and after the fix. The x‑axis shows the batch size on a logarithmic scale; the y‑axis shows the response rate.

Why Did the Speedup Get Larger?

In isolated tests the improvement was ~8×, and by Amdahl’s law we would expect a substantially smaller improvement in the real system. Surprisingly, the speedup grew to ~13× under load.

Working hypothesis

When multiple workers invoke the same reflective, non‑inlinable call site, the JVM cannot optimize it effectively. Removing that reflective barrier lets the JIT inline the call, allowing the code to parallelize cleanly.

Result

  • Better scaling under load
  • Higher overall speedup than predicted by Amdahl’s law

Further investigation is needed to confirm the hypothesis, but the observed behavior aligns with reduced contention in the reflective call path.

The Lesson

I learned that, when optimizing Clojure code, “no reflection warnings” is not always the end of the story. When you pass low‑level primitives through higher‑order interfaces, you may accidentally force the runtime back onto generic (and slower) paths. The compiler needs enough information to emit primitive bytecode.

In this case, the code I thought was complex—the crypto, Merkle trees, and protocol gymnastics—was fine. It was the “trivial” line that killed performance.

Without a profiler, I would never, ever have suspected it.

