Building a Redis Clone in Zig—Part 4
The Journey: From Web Development to Systems Programming
I’ve learned a lot building Zedis, a Redis‑compatible in‑memory database, and I encourage you to try the same. If you’re coming from web development like me, building a database can feel intimidating—but it’s incredibly rewarding, and you’ll discover computing fundamentals you never imagined.
Benchmark Overview
I benchmarked Zedis with 1 000 000 requests using 50 concurrent clients. The flame graph below immediately revealed the problem:
| Function | % of Execution Time |
|---|---|
| Parse.parse | 75 % |
| Client.executeCommand | 13 % |
| Command.deinit | 8 % |
For a Redis clone, parsing should be fast and command execution should dominate.
Digging Deeper with perf
The flame graph shows where time is spent, not why. To get more detail I used Linux’s perf tool:
```bash
perf stat -e cache-references,cache-misses,\
L1-dcache-loads,L1-dcache-load-misses,\
dTLB-loads,dTLB-load-misses \
./zedis benchmark --clients 50 --requests 1000000
```
Output
```
Performance counter stats for 'zedis':

    20,718,078,128      cache-references
     1,162,705,808      cache-misses              # 5.61% of all cache refs
    81,268,003,911      L1-dcache-loads
     8,589,113,560      L1-dcache-load-misses     # 10.57% of all L1-dcache accesses
       520,613,776      dTLB-loads
        78,977,433      dTLB-load-misses          # 15.17% of all dTLB cache accesses

      22.936441891 seconds time elapsed
      15.761103000 seconds user
      64.451493000 seconds sys
```
The program spent 15.76 s in user time but 64.45 s in system time over 22.93 s of wall-clock time (user and system time together exceed the elapsed time because work is spread across threads). Spending roughly four times as long in the kernel as in user space means the process is dominated by syscalls on the network path rather than by useful work.
The Parsing Bottleneck
During the migration from Zig 0.14 to 0.15 the reader API changed. Unfamiliar with the new interface, I defaulted to calling readSliceShort with a one-byte buffer, which reads the stream one byte at a time:
```zig
// A one-byte buffer: every call fetches a single byte from the stream.
var b_buf: [1]u8 = undefined;
const bytes_read = try reader.readSliceShort(&b_buf);
```
This is catastrophic for performance: every single byte can incur its own read syscall, plus the per-call function overhead.
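For context, the pattern I had ended up with looked roughly like this (a simplified sketch, not Zedis' exact loop):

```zig
// Sketch: accumulating one protocol line a single byte at a time.
// With an unbuffered stream reader, each iteration can turn into
// its own read() syscall on the socket.
var line_buf: [512]u8 = undefined;
var len: usize = 0;
while (len < line_buf.len) {
    var b_buf: [1]u8 = undefined;
    if (try reader.readSliceShort(&b_buf) == 0) break; // end of stream
    line_buf[len] = b_buf[0];
    len += 1;
    if (b_buf[0] == '\n') break; // found the end of a RESP line
}
```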
Example Redis Protocol Message
```
*3\r\n
$3\r\nSET\r\n
$4\r\nkey1\r\n
$5\r\nvalue\r\n
```
- `*3` – three bulk strings follow
- `$3` – a 3-byte string (SET)
- `$4` – a 4-byte string (key1)
- `$5` – a 5-byte string (value)
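To make the framing concrete, here is a minimal sketch of decoding one bulk-string header (illustrative only, not Zedis' actual parser; parseBulkLen is a hypothetical helper):

```zig
const std = @import("std");

/// Decode the length from a RESP bulk-string header such as "$3\r\n".
/// Expects `line` to include the trailing "\r\n", as returned by an
/// inclusive-delimiter read.
fn parseBulkLen(line: []const u8) !usize {
    if (line.len < 4 or line[0] != '$') return error.InvalidHeader;
    // Drop the leading '$' and the trailing "\r\n", then parse the digits.
    return std.fmt.parseInt(usize, line[1 .. line.len - 2], 10);
}

test "parseBulkLen" {
    try std.testing.expectEqual(@as(usize, 3), try parseBulkLen("$3\r\n"));
}
```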
Buffered Reading – The Fix
Instead of reading byte‑by‑byte, allocate a buffer once and let the reader work with it:
```zig
// An 8 KiB buffer lets the reader pull large chunks off the socket at once.
var reader_buffer: [1024 * 8]u8 = undefined;
var sr = self.connection.stream.reader(&reader_buffer);
const reader = sr.interface();

// Take one full protocol line (including the trailing "\r\n") from the buffer.
const line_with_crlf = reader.takeDelimiterInclusive('\n') catch |err| {
    // A failed read on a client socket is treated as a disconnect.
    if (err == error.ReadFailed) return error.EndOfStream;
    return err;
};
```
Now the parser processes large chunks instead of thousands of tiny reads, which is how high‑performance network servers should operate.
There is still room for further optimization (e.g., parsing directly from the buffer without line‑by‑line handling), but the current approach keeps the code readable.
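Put together, the read loop looks roughly like this (a simplified sketch; handleLine is a hypothetical stand-in for Zedis' actual command dispatch):

```zig
// Sketch: drain complete protocol lines from the buffered reader.
while (true) {
    const line = reader.takeDelimiterInclusive('\n') catch |err| switch (err) {
        error.ReadFailed, error.EndOfStream => break, // client disconnected
        else => return err, // e.g. a line longer than the buffer
    };
    try handleLine(line); // hypothetical: feed the line to the RESP parser
}
```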
Memory Allocation Overhead
Initially I used std.heap.GeneralPurposeAllocator for all allocations. By default it enables many safety checks, stack traces, and bookkeeping features, which add significant overhead. The flame graph showed a lot of time spent in mutex locks inside the allocator.
Switching to a leaner allocator solved most of the problem:
```zig
// Example: using the page allocator
const allocator = std.heap.page_allocator;
```
std.heap.smp_allocator is also an option for multi‑core scenarios.
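To illustrate how small the swap is (a sketch; the dupe call is just a stand-in for the server's real allocations):

```zig
const std = @import("std");

pub fn main() !void {
    // smp_allocator: a general-purpose allocator designed for
    // multi-threaded, release-mode workloads.
    const allocator = std.heap.smp_allocator;

    // Stand-in allocation to show the allocator in use.
    const key = try allocator.dupe(u8, "key1");
    defer allocator.free(key);
    std.debug.print("allocated {s} with smp_allocator\n", .{key});
}
```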
Lessons Learned
- Systems programming demands curiosity about the internals of libraries and runtimes.
- Reading source code (Zig, Redis, TigerBeetle, PostgreSQL) is invaluable.
- Profiling tools (perf, flame graphs) guide you to the real bottlenecks.
- Buffered I/O and lightweight allocators are essential for high-throughput servers.
It’s hard work, but the payoff is huge.
Benchmark Results
The command‑line tool used:
./redis-benchmark -t get,set -n 1000000 -c 50
SET
| Requests | Time (s) | Parallel Clients | Payload | Throughput (req/s) | Avg latency (ms) |
|---|---|---|---|---|---|
| 1 000 000 | 4.25 | 50 | 3 bytes | 235,294.12 | 0.115 |
| 1 000 000 | 4.60 | 50 | 3 bytes | 217,344.06 | 0.121 |
GET
| Requests | Time (s) | Parallel Clients | Payload | Throughput (req/s) | Avg latency (ms) |
|---|---|---|---|---|---|
| 1 000 000 | 4.29 | 50 | 3 bytes | 233,045.92 | 0.113 |
| 1 000 000 | 4.53 | 50 | 3 bytes | 220,799.30 | 0.119 |
Latency summary (ms) for the first SET and GET runs:

| Op | avg | min | p50 | p95 | p99 | max |
|---|---|---|---|---|---|---|
| SET | 0.115 | 0.056 | 0.119 | 0.127 | 0.143 | 2.223 |
| GET | 0.113 | 0.048 | 0.119 | 0.127 | 0.135 | 0.487 |
Zedis is currently 5–8 % slower than Redis on both operations. While it doesn’t yet beat Redis in raw throughput, being within single‑digit percentage points of one of the most optimized in‑memory stores is a solid achievement for a learning project.
Closing Thoughts
I’m pretty satisfied with Zedis’ current performance. This may be the sunset of the project—for now. Stay tuned for future updates as I continue to learn and explore systems programming!
Systems Programming and Database Internals
If you’re a web developer curious about systems programming, I can’t recommend this enough. Start small, make mistakes (you will—I made plenty!), profile them, fix them, and watch your understanding deepen. You don’t need to be a C wizard or have a CS degree—just curiosity and persistence.
Thanks for reading! Subscribe to follow my journey—I’m learning systems programming and sharing everything I discover along the way.
**Zedis Source Code**