How Boredom + A New PC Led to a 64x Faster Prime Sieve in Rust (The 'Dopamine optimization' Story)
Source: Dev.to
The Origin Story: It Started With a “New Toy”
About six months ago, I was sitting at my desk with a serious case of coder’s block. I didn’t want to grind through existing bugs; I needed a win. A dopamine hit.
I also happened to be staring at a new (to me) monster of a workstation: 4.3 GHz i7, 64 GB RAM, Quadro GPU. It was a beast, and I realized I had never actually pushed it. I wanted to see the metal glow. I wanted to make the fans spin.
So I set a challenge: write the absolute fastest, most CPU‑punishing code possible in C++. Math is heavy, so I landed on prime generation. I found a paper on optimizing the Sieve of Eratosthenes, dug into SIMD intrinsics (AVX2/512), and spent a frantic weekend obsessing over cache lines and cycle counts. I managed to beat the standard library implementations, watched my CPU hit 100 % usage… and then, just like that, the dopamine faded.
“Cool. Next.”
I shelved the code and forgot about it.
The Flashback
Fast forward to three days ago. I was compiling a Rust project, watching the dependencies scroll by:
Compiling rand...
Compiling primes...
It hit me like a brick. I had an entire codebase of highly optimized, battle‑tested prime‑generation logic sitting in a dusty C++ repo. I had already solved the hard parts—the bit‑twiddling, the cache locality, the memory layout.
Why wasn’t I using it in Rust?
The Frenzy: Porting to Rust
The fear of losing the idea before I could implement it kicked in. I launched into a 12‑hour coding fugue state.
The goal wasn’t just a port; it was a reimagining. In C++ I was raw‑dogging pointers. In Rust I wanted that same performance but with safety and better ergonomics.
I ended up with Primer—a crate that is:
- Bit‑Packed: uses 1 bit per odd number (effectively 0.5 bits per number).
- Odd‑Only: hard‑codes 2 and ignores even numbers entirely.
- Intrinsics‑Powered: uses
trailing_zeros(compiles totzcnton x86) to scan the sieve instantly. - Kernighan‑Iterated: skips zeros at the hardware level.
The Result: 64× Faster & 95× Smaller
I didn’t expect the Rust compiler to play this nice, but the results blew me away.
Benchmark (n = 50,000,000)
- Standard
Vecimplementation: ~47 MB RAM - Primer (segmented buffer): ~32 KB RAM
- Speed: up to 64× faster than the
primescrate in bulk generation.
Why This Matters (Beyond the Speed)
This project wasn’t born from a need for primes. It was born from boredom and a desire to see a computer work.

Sometimes the best code doesn’t come from a ticket or a spec sheet. It comes from staring at a blinking cursor at 2 AM, wondering, “I wonder if I can make this go faster…”, and then accidentally building the fastest thing in the room.
Check out the code (and the benchmarks) on GitHub:
👉
(Also, if you are an embedded Rust dev—ESP32/Raspberry Pi—this 32 KB footprint is specifically for you. Go wild.)