How Boredom + A New PC Led to a 64x Faster Prime Sieve in Rust (The 'Dopamine optimization' Story)

Published: (February 15, 2026 at 11:21 PM EST)
3 min read
Source: Dev.to

Source: Dev.to

The Origin Story: It Started With a “New Toy”

About six months ago, I was sitting at my desk with a serious case of coder’s block. I didn’t want to grind through existing bugs; I needed a win. A dopamine hit.

I also happened to be staring at a new (to me) monster of a workstation: 4.3 GHz i7, 64 GB RAM, Quadro GPU. It was a beast, and I realized I had never actually pushed it. I wanted to see the metal glow. I wanted to make the fans spin.

So I set a challenge: write the absolute fastest, most CPU‑punishing code possible in C++. Math is heavy, so I landed on prime generation. I found a paper on optimizing the Sieve of Eratosthenes, dug into SIMD intrinsics (AVX2/512), and spent a frantic weekend obsessing over cache lines and cycle counts. I managed to beat the standard library implementations, watched my CPU hit 100 % usage… and then, just like that, the dopamine faded.

“Cool. Next.”

I shelved the code and forgot about it.

The Flashback

Fast forward to three days ago. I was compiling a Rust project, watching the dependencies scroll by:

Compiling rand...
Compiling primes...

It hit me like a brick. I had an entire codebase of highly optimized, battle‑tested prime‑generation logic sitting in a dusty C++ repo. I had already solved the hard parts—the bit‑twiddling, the cache locality, the memory layout.

Why wasn’t I using it in Rust?

The Frenzy: Porting to Rust

The fear of losing the idea before I could implement it kicked in. I launched into a 12‑hour coding fugue state.

The goal wasn’t just a port; it was a reimagining. In C++ I was raw‑dogging pointers. In Rust I wanted that same performance but with safety and better ergonomics.

I ended up with Primer—a crate that is:

  • Bit‑Packed: uses 1 bit per odd number (effectively 0.5 bits per number).
  • Odd‑Only: hard‑codes 2 and ignores even numbers entirely.
  • Intrinsics‑Powered: uses trailing_zeros (compiles to tzcnt on x86) to scan the sieve instantly.
  • Kernighan‑Iterated: skips zeros at the hardware level.

The Result: 64× Faster & 95× Smaller

I didn’t expect the Rust compiler to play this nice, but the results blew me away.

Benchmark (n = 50,000,000)

  • Standard Vec implementation: ~47 MB RAM
  • Primer (segmented buffer): ~32 KB RAM
  • Speed: up to 64× faster than the primes crate in bulk generation.

Why This Matters (Beyond the Speed)

This project wasn’t born from a need for primes. It was born from boredom and a desire to see a computer work.

Prime sieve visualisation

Sometimes the best code doesn’t come from a ticket or a spec sheet. It comes from staring at a blinking cursor at 2 AM, wondering, “I wonder if I can make this go faster…”, and then accidentally building the fastest thing in the room.

Check out the code (and the benchmarks) on GitHub:

👉

(Also, if you are an embedded Rust dev—ESP32/Raspberry Pi—this 32 KB footprint is specifically for you. Go wild.)

0 views
Back to Blog

Related posts

Read more »