Building a Redis Clone in Zig—Part 4
The Journey: From Web Development to Systems Programming
I’ve learned a lot building Zedis, a Redis‑compatible in‑memory database, and I encourage you to try the same. If you’re coming from web development like me, building a database can feel intimidating—but it’s incredibly rewarding, and you’ll discover computing fundamentals you never imagined.
Benchmark Overview
I benchmarked Zedis with 1 000 000 requests using 50 concurrent clients. The flame graph below immediately revealed the problem:
| Function | % of Execution Time |
|---|---|
| Parse.parse | 75 % |
| Client.executeCommand | 13 % |
| Command.deinit | 8 % |
For a Redis clone, parsing should be fast and command execution should dominate.
Digging Deeper with perf
The flame graph shows where time is spent, not why. To get more detail I used Linux’s perf tool:
```bash
perf stat -e cache-references,cache-misses,\
L1-dcache-loads,L1-dcache-load-misses,\
dTLB-loads,dTLB-load-misses \
./zedis benchmark --clients 50 --requests 1000000
```
Output
```
Performance counter stats for 'zedis':

    20,718,078,128      cache-references
     1,162,705,808      cache-misses              # 5.61% of all cache refs
    81,268,003,911      L1-dcache-loads
     8,589,113,560      L1-dcache-load-misses     # 10.57% of all L1-dcache accesses
       520,613,776      dTLB-loads
        78,977,433      dTLB-load-misses          # 15.17% of all dTLB cache accesses

      22.936441891 seconds time elapsed
      15.761103000 seconds user
      64.451493000 seconds sys
```
The program spent 15.76 s in user time but 64.45 s in system time over 22.93 s of wall-clock time (user and system time together exceed the elapsed time because work is spread across threads). Spending roughly four times as long in the kernel as in user space means the process is dominated by syscalls on the network path rather than by useful work.
The Parsing Bottleneck
During the migration from Zig 0.14 to 0.15 the reader API changed. Unfamiliar with the new interface, I defaulted to calling readSliceShort with a one-byte buffer, which reads the stream one byte at a time:
```zig
// A one-byte buffer: every call fetches a single byte from the stream.
var b_buf: [1]u8 = undefined;
const bytes_read = try reader.readSliceShort(&b_buf);
```
This is catastrophic for performance: every single byte can incur its own read syscall, plus the per-call function overhead.
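For context, the pattern I had ended up with looked roughly like this (a simplified sketch, not Zedis' exact loop):

```zig
// Sketch: accumulating one protocol line a single byte at a time.
// With an unbuffered stream reader, each iteration can turn into
// its own read() syscall on the socket.
var line_buf: [512]u8 = undefined;
var len: usize = 0;
while (len < line_buf.len) {
    var b_buf: [1]u8 = undefined;
    if (try reader.readSliceShort(&b_buf) == 0) break; // end of stream
    line_buf[len] = b_buf[0];
    len += 1;
    if (b_buf[0] == '\n') break; // found the end of a RESP line
}
```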
Example Redis Protocol Message
```
*3\r\n
$3\r\nSET\r\n
$4\r\nkey1\r\n
$5\r\nvalue\r\n
```
- `*3` – three bulk strings follow
- `$3` – a 3-byte string (SET)
- `$4` – a 4-byte string (key1)
- `$5` – a 5-byte string (value)
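To make the framing concrete, here is a minimal sketch of decoding one bulk-string header (illustrative only, not Zedis' actual parser; parseBulkLen is a hypothetical helper):

```zig
const std = @import("std");

/// Decode the length from a RESP bulk-string header such as "$3\r\n".
/// Expects `line` to include the trailing "\r\n", as returned by an
/// inclusive-delimiter read.
fn parseBulkLen(line: []const u8) !usize {
    if (line.len < 4 or line[0] != '$') return error.InvalidHeader;
    // Drop the leading '$' and the trailing "\r\n", then parse the digits.
    return std.fmt.parseInt(usize, line[1 .. line.len - 2], 10);
}

test "parseBulkLen" {
    try std.testing.expectEqual(@as(usize, 3), try parseBulkLen("$3\r\n"));
}
```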
Buffered Reading – The Fix
Instead of reading byte‑by‑byte, allocate a buffer once and let the reader work with it:
```zig
// An 8 KiB buffer lets the reader pull large chunks off the socket at once.
var reader_buffer: [1024 * 8]u8 = undefined;
var sr = self.connection.stream.reader(&reader_buffer);
const reader = sr.interface();

// Take one full protocol line (including the trailing "\r\n") from the buffer.
const line_with_crlf = reader.takeDelimiterInclusive('\n') catch |err| {
    // A failed read on a client socket is treated as a disconnect.
    if (err == error.ReadFailed) return error.EndOfStream;
    return err;
};
```
Now the parser processes large chunks instead of thousands of tiny reads, which is how high‑performance network servers should operate.
There is still room for further optimization (e.g., parsing directly from the buffer without line‑by‑line handling), but the current approach keeps the code readable.
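Put together, the read loop looks roughly like this (a simplified sketch; handleLine is a hypothetical stand-in for Zedis' actual command dispatch):

```zig
// Sketch: drain complete protocol lines from the buffered reader.
while (true) {
    const line = reader.takeDelimiterInclusive('\n') catch |err| switch (err) {
        error.ReadFailed, error.EndOfStream => break, // client disconnected
        else => return err, // e.g. a line longer than the buffer
    };
    try handleLine(line); // hypothetical: feed the line to the RESP parser
}
```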
Memory Allocation Overhead
Initially I used std.heap.GeneralPurposeAllocator for all allocations. By default it enables many safety checks, stack traces, and bookkeeping features, which add significant overhead. The flame graph showed a lot of time spent in mutex locks inside the allocator.
Switching to a leaner allocator solved most of the problem:
```zig
// Example: using the page allocator
const allocator = std.heap.page_allocator;
```
std.heap.smp_allocator is also an option for multi‑core scenarios.
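To illustrate how small the swap is (a sketch; the dupe call is just a stand-in for the server's real allocations):

```zig
const std = @import("std");

pub fn main() !void {
    // smp_allocator: a general-purpose allocator designed for
    // multi-threaded, release-mode workloads.
    const allocator = std.heap.smp_allocator;

    // Stand-in allocation to show the allocator in use.
    const key = try allocator.dupe(u8, "key1");
    defer allocator.free(key);
    std.debug.print("allocated {s} with smp_allocator\n", .{key});
}
```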
Lessons Learned
- Systems programming demands curiosity about the internals of libraries and runtimes.
- Reading source code (Zig, Redis, TigerBeetle, PostgreSQL) is invaluable.
- Profiling tools (perf, flame graphs) guide you to the real bottlenecks.
- Buffered I/O and lightweight allocators are essential for high-throughput servers.
It’s hard work, but the payoff is huge.
Benchmark Results
The command‑line tool used:
./redis-benchmark -t get,set -n 1000000 -c 50
SET
| Requests | Time (s) | Parallel Clients | Payload | Throughput (req/s) | Avg latency (ms) |
|---|---|---|---|---|---|
| 1 000 000 | 4.25 | 50 | 3 bytes | 235,294.12 | 0.115 |
| 1 000 000 | 4.60 | 50 | 3 bytes | 217,344.06 | 0.121 |
GET
| Requests | Time (s) | Parallel Clients | Payload | Throughput (req/s) | Avg latency (ms) |
|---|---|---|---|---|---|
| 1 000 000 | 4.29 | 50 | 3 bytes | 233,045.92 | 0.113 |
| 1 000 000 | 4.53 | 50 | 3 bytes | 220,799.30 | 0.119 |
Latency summary (ms) for the first SET and GET runs:

| Op | avg | min | p50 | p95 | p99 | max |
|---|---|---|---|---|---|---|
| SET | 0.115 | 0.056 | 0.119 | 0.127 | 0.143 | 2.223 |
| GET | 0.113 | 0.048 | 0.119 | 0.127 | 0.135 | 0.487 |
Zedis is currently 5–8 % slower than Redis on both operations. While it doesn’t yet beat Redis in raw throughput, being within single‑digit percentage points of one of the most optimized in‑memory stores is a solid achievement for a learning project.
Closing Thoughts
I’m pretty satisfied with Zedis’ current performance. This may be the sunset of the project—for now. Stay tuned for future updates as I continue to learn and explore systems programming!
Systems Programming and Database Internals
If you’re a web developer curious about systems programming, I can’t recommend this enough. Start small, make mistakes (you will—I made plenty!), profile them, fix them, and watch your understanding deepen. You don’t need to be a C wizard or have a CS degree—just curiosity and persistence.
Thanks for reading! Subscribe to follow my journey—I’m learning systems programming and sharing everything I discover along the way.
**Zedis Source Code**