Parallel Programming in Zig: Threads, Shared Memory, and Synchronization

Published: 3 days ago (February 26, 2026 at 04:36 PM EST)

6 min read

Source: Dev.to

Author: [Seven](https://dev.to/11110110011)

# Introduction

This post continues my exploration of low‑level programming and the Zig language. Today, we will explore the fundamental concepts of parallel programming. We'll start by defining what threads are, then move on to spawning them in Zig, and finally, we'll see how to handle shared memory safely using synchronization primitives.

---

## Step 1: The Basic Process

Every program runs as a process with at least one thread. Let's start by creating a simple task and running it in our `main` function. At this stage, everything is sequential.

```zig
const std = @import("std");

pub fn main() !void {
    std.debug.print("Starting main thread...\n", .{});
    task(1);
    std.debug.print("Finished main thread.\n", .{});
}

fn task(id: usize) void {
    std.debug.print("task {} is running\n", .{id});
    var ts = std.posix.timespec{ .sec = 1, .nsec = 0 };
    _ = std.posix.system.nanosleep(&ts, &ts);
}
```

### Output

```text
Starting main thread...
task 1 is running
Finished main thread.
```

## Step 2: Spawning Your First Thread

Now, let’s use `std.Thread.spawn` to run the task on a separate thread of execution. Calling `thread.join()` makes the main thread wait for the worker to finish.

Note: In Zig 0.16, `main` can receive a `std.process.Init` argument, which gives us access to `std.Io`. It is used here to measure elapsed time with `std.Io.Clock`.

```zig
const std = @import("std");

pub fn main(init: std.process.Init) !void {
    const start = std.Io.Clock.now(.real, init.io);
    const thread = try std.Thread.spawn(.{}, task, .{1});
    thread.join(); // This blocks the main thread until the task is done
    const end = std.Io.Clock.now(.real, init.io);
    const duration = start.durationTo(end);
    std.debug.print("Time: {}ms\n", .{duration.toMilliseconds()});
}

fn task(id: usize) void {
    std.debug.print("Task {} is running thread: {} \n", .{ id, std.Thread.getCurrentId() });
    var ts = std.posix.timespec{ .sec = 1, .nsec = 0 };
    _ = std.posix.system.nanosleep(&ts, &ts);
}
```

### Output

```text
Task 1 is running thread: 1134137
Time: 1000ms
```

## Step 3: Running in Parallel

To use your CPU cores effectively, we can spawn multiple threads. We store the thread handles in an array and join them only after all of them have been spawned, so the tasks run at the same time instead of one after another.

Notice that 4 threads, each sleeping for 1 second, still complete in ~1000 ms total. Run one after another, the same four tasks would take about 4000 ms, so the threads really do run side by side.

```zig
const std = @import("std");

pub fn main(init: std.process.Init) !void {
    const start = std.Io.Clock.now(.real, init.io);
    var threads: [4]std.Thread = undefined;
    for (&threads, 0..) |*t, i| {
        t.* = try std.Thread.spawn(.{}, task, .{i});
    }
    for (threads) |t| t.join();
    const end = std.Io.Clock.now(.real, init.io);
    const duration = start.durationTo(end);
    std.debug.print("Time: {}ms\n", .{duration.toMilliseconds()});
}

fn task(id: usize) void {
    std.debug.print("Task {} is running thread: {} \n", .{ id, std.Thread.getCurrentId() });
    var ts = std.posix.timespec{ .sec = 1, .nsec = 0 };
    _ = std.posix.system.nanosleep(&ts, &ts);
}
```

### Output

```text
Task 0 is running thread: 1134350
Task 1 is running thread: 1134351
Task 2 is running thread: 1134352
Task 3 is running thread: 1134353
Time: 1000ms
```

## Step 4: The Shared Memory Problem (Race Condition)

Threads share the same memory space. If multiple threads update the same variable at once without synchronization, they can overwrite each other’s changes. This is a race condition: the final result is inconsistent and unpredictable.

```zig
const std = @import("std");

// This will produce inconsistent results!
pub fn main() !void {
    var arr = [_]i32{ 0, 0, 0 };
    var threads: [5]std.Thread = undefined;
    for (&threads) |*t| {
        t.* = try std.Thread.spawn(.{}, task, .{&arr});
    }
    for (threads) |t| t.join();
    std.debug.print("Result: {any}\n", .{arr});
}

fn task(arr: *[3]i32) void {
    for (0..100_000) |_| {
        for (0..3) |j| arr[j] += 1;
    }
}
```

### Sample Output

```text
Result: { 311264, 289236, 273695 }
```

The expected result would be `{ 500000, 500000, 500000 }` (5 threads × 100 000 iterations), but we get something different, and it changes on every run. That is the race condition in action.
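To see why increments go missing, it helps to spell out what `arr[j] += 1` does: it reads the value, adds one, and writes the result back. The sketch below is an illustration only, not code from the original program. It replays one unlucky interleaving of two threads on a single thread, so the outcome is deterministic; the name `lostUpdate` and the variables inside it are hypothetical.

```zig
const std = @import("std");

// Hypothetical helper: replays, step by step, an interleaving in which
// two "threads" A and B both increment the same shared value.
fn lostUpdate() i32 {
    var shared: i32 = 0;

    const seen_by_a = shared; // A reads 0
    const seen_by_b = shared; // B reads 0, before A has written anything

    shared = seen_by_a + 1; // A writes 1
    shared = seen_by_b + 1; // B also writes 1, overwriting A's update

    return shared;
}

pub fn main() void {
    // Two increments were performed, but only one survives.
    std.debug.print("two increments, final value: {d}\n", .{lostUpdate()});
}
```

Both increments ran, yet the final value is 1 rather than 2. With real threads the same thing happens whenever two of them read the same old value before either writes back, which is why the totals in the sample output fall short of 500000.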

## Step 5: Fixing It with a Mutex and `defer`

To fix the bug, we use a mutex to lock the critical section, the block of code that accesses shared data. Only one thread can hold the lock at a time, so the others wait their turn.

In Zig 0.16, the mutex is `std.Io.Mutex`, and its `lock`/`unlock` methods take the `std.Io` handle as an argument, which is why we also pass `io` down to the task function.

Tip: Lock and unlock inside the loop, wrapping only the minimal critical section. Locking outside the loop would force threads to run one at a time for their entire duration, eliminating any parallelism benefit.

We also use `defer mutex.unlock(io)` immediately after the lock. This ensures the lock is always released when the block exits, even if an error occurs.

```zig
const std = @import("std");

pub fn main(init: std.process.Init) !void {
    var arr = [_]i32{ 0, 0, 0 };
    var mutex = std.Io.Mutex.init;
    var threads: [5]std.Thread = undefined;

    for (&threads) |*t| {
        t.* = try std.Thread.spawn(.{}, task, .{ &arr, &mutex, init.io });
    }
    for (threads) |t| t.join();

    std.debug.print("Result: {any}\n", .{arr});
}

fn task(arr: *[3]i32, mutex: *std.Io.Mutex, io: std.Io) !void {
    for (0..100_000) |_| {
        // Lock only the increment operation
        try mutex.lock(io);
        defer mutex.unlock(io); // runs at the end of each loop iteration
        for (0..3) |j| arr[j] += 1;
    }
}
```

### Expected Output (consistent)

```text
Result: { 500000, 500000, 500000 }
```

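The mutex guarantees that only one thread at a time executes the three increments, so no update is lost. In this small example almost all of the work sits inside the critical section, so the threads mostly take turns. In a real program the lock would protect only the shared update, and the rest of each thread's work would still run in parallel.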

That concludes the walkthrough of basic parallel programming in Zig: creating threads, measuring execution time, exposing race conditions, and safely synchronizing shared state with a mutex. Happy coding!

### Code Example

```zig
const std = @import("std");

pub fn main(init: std.process.Init) !void {
    var arr = [_]i32{ 0, 0, 0 };
    var mutex = std.Io.Mutex.init;
    var threads: [5]std.Thread = undefined;
    for (&threads) |*t| {
        t.* = try std.Thread.spawn(.{}, task, .{ &arr, &mutex, init.io });
    }
    for (threads) |t| t.join();
    std.debug.print("Result: {any}\n", .{arr});
}

fn task(arr: *[3]i32, mutex: *std.Io.Mutex, io: std.Io) !void {
    for (0..100000) |_| {
        {
            try mutex.lock(io);
            defer mutex.unlock(io); // released at the end of this block
            for (0..3) |j| arr[j] += 1;
        }
    }
}
```

### Output

```text
Result: { 500000, 500000, 500000 }
```

Now the result is consistent and correct on every run.

Tip: For simple numeric operations on a single variable, Zig also provides `std.atomic.Value`, which can be more efficient than a mutex because each update is a single atomic operation and no lock is taken.
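As a rough sketch of that tip, the counter array from Step 4 could hold `std.atomic.Value(i32)` elements instead of plain `i32`. This is not code from the original program: the `Counter` alias and the choice of `.monotonic` ordering are assumptions made for this illustration, and it relies on `std.atomic.Value` offering `init`, `fetchAdd`, and `load` as it does in recent Zig releases. It needs neither a mutex nor `std.Io`, so `main` takes no arguments here.

```zig
const std = @import("std");

// Hypothetical alias for this sketch.
const Counter = std.atomic.Value(i32);

fn task(counters: *[3]Counter) void {
    for (0..100_000) |_| {
        for (counters) |*c| {
            // fetchAdd performs the read, add, and write as one indivisible step.
            _ = c.fetchAdd(1, .monotonic);
        }
    }
}

pub fn main() !void {
    var counters = [_]Counter{ Counter.init(0), Counter.init(0), Counter.init(0) };
    var threads: [5]std.Thread = undefined;

    for (&threads) |*t| {
        t.* = try std.Thread.spawn(.{}, task, .{&counters});
    }
    // join() makes every worker's updates visible to the main thread.
    for (threads) |t| t.join();

    std.debug.print("Result: {d} {d} {d}\n", .{
        counters[0].load(.monotonic),
        counters[1].load(.monotonic),
        counters[2].load(.monotonic),
    });
}
```

Each counter should end at 500000 on every run. One difference from the mutex version is worth keeping in mind: each counter is updated atomically on its own, but the three updates are no longer a single unit, so another thread could observe the array with only some of the counters incremented. If the three values must always change together, the mutex is still the right tool.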

## Conclusion

Parallel programming is a powerful tool for building high‑performance software, but it requires a solid understanding of how threads interact. We’ve seen how easy it is to spawn threads in Zig, but also how quickly shared memory can lead to subtle bugs. By using tools like `std.Io.Mutex`, and understanding where to apply them, we can protect our data and ensure our programs remain correct and reliable as they scale across multiple CPU cores.

Written with Zig 0.16.0‑dev.2565+684032671 — All code tested and verified.
