Leveraging io_uring for performant asynchronous Linux applications

Published: 2 months ago (February 8, 2026 at 06:52 PM EST)


Source: Dev.to


author: Sospeter Kinyanjui

Intro

For a long time, the main scalable I/O interface on Linux was epoll, an I/O event‑notification facility that tells an application which file descriptors are ready, so that it can then issue read/write system calls on them.
epoll first appeared in Linux 2.5.44 (2002) and became mainstream with 2.6 (2003). It follows the readiness model and is driven by three system calls: epoll_create, epoll_ctl, and epoll_wait. The kernel reports which descriptors are ready, and the application then performs the I/O itself.

Because the kernel reports only the descriptors that are ready, the cost of waiting scales with the number of active descriptors rather than the number being watched: idle connections cost the same whether you watch 10 of them or 10 000. However, every read or write that follows a notification is still its own system call, which means paying a syscall tax: a transition from user mode to kernel mode and back for each operation.
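The readiness flow described above can be sketched in a few lines of C. This is a minimal illustration, not production code: the function name `epoll_pipe_demo` is made up for this example, a pipe stands in for a network connection, and `epoll_create1` is used as the newer variant of `epoll_create`. Note the two separate system calls: one to learn that the descriptor is ready, and one to actually read from it.

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: returns the number of bytes read from a pipe after
 * epoll reports it readable, or -errno on failure. */
long epoll_pipe_demo(void)
{
    int fds[2];
    long result;

    if (pipe(fds) < 0)
        return -errno;

    int ep = epoll_create1(0);
    if (ep < 0) {
        result = -errno;
        goto close_pipe;
    }

    /* Ask the kernel to tell us when the read end becomes readable. */
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fds[0] };
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev) < 0) {
        result = -errno;
        goto close_all;
    }

    /* Make the read end ready by writing four bytes into the pipe. */
    if (write(fds[1], "ping", 4) != 4) {
        result = -EIO;
        goto close_all;
    }

    /* Syscall 1: which descriptors are ready? (1 second timeout) */
    struct epoll_event ready;
    int n = epoll_wait(ep, &ready, 1, 1000);
    if (n != 1) {
        result = n < 0 ? -errno : -ETIMEDOUT;
        goto close_all;
    }
    assert(ready.data.fd == fds[0]);   /* only one fd was registered */

    /* Syscall 2: the application performs the actual I/O itself. */
    char buf[16];
    ssize_t got = read(ready.data.fd, buf, sizeof buf);
    result = got < 0 ? -errno : got;

close_all:
    close(ep);
close_pipe:
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

With many active connections, the second step repeats once per ready descriptor, which is where the per-operation syscall cost adds up.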

It wasn’t until 2019 that io_uring arrived, providing a Linux kernel interface for truly asynchronous I/O with far fewer system calls.

What is asynchronous execution?

Asynchronous execution is the ability of an application to start a long‑running task and continue executing other work without waiting for that task to finish.

Asynchronous execution makes better use of CPU and I/O resources. While epoll is event‑driven (and thus only an illusion of asynchrony, because the read and write calls themselves are still synchronous), io_uring lets an application queue multiple I/O requests and submit them with a single system call. The kernel then carries them out independently and reports each completion.

Definition and Implementation

io_uring exposes three system calls:

| Call | Purpose |
| --- | --- |
| `io_uring_setup(2)` | Creates the submission queue (SQ) and completion queue (CQ) and returns a file descriptor. It also reports the offsets of the ring fields (`head`, `tail`, `ring_mask`, `ring_entries`) that the application needs in order to map and use the rings. |
| `io_uring_enter(2)` | Tells the kernel “I have placed SQEs in the ring; go process them.” |
| `io_uring_register(2)` | Pre‑registers resources (e.g., buffers, files) with the kernel to avoid per‑request look‑ups. |

`io_uring_setup`

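Allocates a memory region that holds the SQ and CQ structures; the application maps it with mmap so that both sides share it.
User space is the producer of the SQ: it writes entries and advances the tail, while the kernel consumes them.
The kernel is the producer of the CQ: it writes completions and advances the tail, while user space consumes them.
Each ring therefore follows a single‑producer / single‑consumer model, which keeps synchronization cheap.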

`io_uring_enter`

The “engine starter” for the whole operation. Its prototype:

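#include <liburing.h>

int io_uring_enter(unsigned int fd,           /* ring fd returned by io_uring_setup */
                   unsigned int to_submit,    /* number of SQEs to submit */
                   unsigned int min_complete, /* completions to wait for, if requested */
                   unsigned int flags,        /* bitmask of IORING_ENTER_* flags */
                   sigset_t *sig);            /* signal mask used while waiting, or NULL */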

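Calling io_uring_enter notifies the kernel that to_submit SQEs are ready for processing. If flags includes IORING_ENTER_GETEVENTS, the call also waits until at least min_complete completions are available.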
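To make the submission path concrete, here is a hedged sketch that uses the raw system calls through `syscall(2)` instead of liburing. The helper names `map_ring` and `nop_batch` and the ring size of 8 are assumptions made for this example. It queues `n` no‑op requests (`IORING_OP_NOP`), submits them all with one `io_uring_enter` call that also waits for `n` completions, and then counts the CQEs. If io_uring is unavailable (old kernel, or blocked in a container), it simply returns a negative errno.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/io_uring.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#define ENTRIES 8u   /* ring size chosen for this example */

/* Maps one region of the ring fd; returns NULL on failure. */
static void *map_ring(int fd, size_t len, off_t off)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, off);
    return p == MAP_FAILED ? NULL : p;
}

/* Queues n no-op requests, submits them with ONE io_uring_enter call and
 * returns how many successful completions were reaped, or -errno. */
long nop_batch(unsigned n)
{
    assert(n >= 1 && n <= ENTRIES);

    struct io_uring_params p;
    memset(&p, 0, sizeof p);
    int fd = (int)syscall(__NR_io_uring_setup, ENTRIES, &p);
    if (fd < 0)
        return -errno;

    size_t sq_len  = p.sq_off.array + p.sq_entries * sizeof(unsigned);
    size_t cq_len  = p.cq_off.cqes + p.cq_entries * sizeof(struct io_uring_cqe);
    size_t sqe_len = p.sq_entries * sizeof(struct io_uring_sqe);
    char *sq = map_ring(fd, sq_len, IORING_OFF_SQ_RING);
    char *cq = map_ring(fd, cq_len, IORING_OFF_CQ_RING);
    struct io_uring_sqe *sqes = map_ring(fd, sqe_len, IORING_OFF_SQES);
    long result = -ENOMEM;
    if (!sq || !cq || !sqes)
        goto out;

    unsigned *sq_tail  = (unsigned *)(sq + p.sq_off.tail);
    unsigned  sq_mask  = *(unsigned *)(sq + p.sq_off.ring_mask);
    unsigned *sq_array = (unsigned *)(sq + p.sq_off.array);
    unsigned *cq_head  = (unsigned *)(cq + p.cq_off.head);
    unsigned *cq_tail  = (unsigned *)(cq + p.cq_off.tail);
    unsigned  cq_mask  = *(unsigned *)(cq + p.cq_off.ring_mask);
    struct io_uring_cqe *cqes = (struct io_uring_cqe *)(cq + p.cq_off.cqes);

    /* Producer side: fill n SQEs, then publish them by advancing the tail. */
    unsigned tail = *sq_tail;
    for (unsigned i = 0; i < n; i++, tail++) {
        unsigned idx = tail & sq_mask;
        memset(&sqes[idx], 0, sizeof sqes[idx]);
        sqes[idx].opcode = IORING_OP_NOP;
        sqes[idx].user_data = i;          /* echoed back in the matching CQE */
        sq_array[idx] = idx;
    }
    __atomic_store_n(sq_tail, tail, __ATOMIC_RELEASE);

    /* One syscall: submit n SQEs and wait for at least n completions. */
    if (syscall(__NR_io_uring_enter, fd, n, n, IORING_ENTER_GETEVENTS, NULL, 0) < 0) {
        result = -errno;
        goto out;
    }

    /* Consumer side: reap every CQE between head and tail. */
    unsigned head = *cq_head;
    result = 0;
    while (head != __atomic_load_n(cq_tail, __ATOMIC_ACQUIRE)) {
        if (cqes[head & cq_mask].res >= 0)
            result++;
        head++;
    }
    __atomic_store_n(cq_head, head, __ATOMIC_RELEASE);

out:
    if (sq)   munmap(sq, sq_len);
    if (cq)   munmap(cq, cq_len);
    if (sqes) munmap(sqes, sqe_len);
    close(fd);
    return result;
}
```

In real code, liburing hides most of this bookkeeping; the point here is only that the number of system calls stays at one regardless of how many requests are in the batch.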

`io_uring_register`

The “VIP pass” for your data. By pre‑registering buffers or files, the kernel can use them directly without extra look‑ups or mappings, eliminating a lot of overhead.
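As a rough sketch of what pre‑registration looks like at the syscall level, the hypothetical helper `register_one_buffer` below creates a small ring and registers a single buffer with `IORING_REGISTER_BUFFERS`. The function name and the ring size of 4 are assumptions for this example, and it returns a negative errno where io_uring is not available. Once a buffer is registered, requests such as `IORING_OP_READ_FIXED` and `IORING_OP_WRITE_FIXED` can refer to it by index.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/io_uring.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/uio.h>
#include <unistd.h>

/* Registers one buffer with a freshly created ring.
 * Returns 0 on success or -errno on failure. */
int register_one_buffer(void *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    struct io_uring_params p;
    memset(&p, 0, sizeof p);
    int fd = (int)syscall(__NR_io_uring_setup, 4, &p);
    if (fd < 0)
        return -errno;

    /* The kernel pins and maps the buffer once, here, instead of on
     * every request that uses it. */
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    int ret = 0;
    if (syscall(__NR_io_uring_register, fd, IORING_REGISTER_BUFFERS, &iov, 1) < 0)
        ret = -errno;

    close(fd);   /* closing the ring also drops the registration */
    return ret;
}
```

A long‑lived application would keep the ring open and reuse the registered buffers across many requests, which is where the savings come from.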

Ecosystem and Language Bindings

C developers can use the official liburing library, which wraps the three syscalls and provides helper functions.

Rust also has strong support for io_uring. Its ownership model helps with the classic “danger zone” where both the kernel and the application might access the same buffer simultaneously: crates such as tokio-uring take ownership of a buffer when an operation is submitted and hand it back only when the operation completes, so safe user code cannot touch the buffer while the kernel may still be using it.

Popular Rust crates include:

tokio-uring – integrates io_uring with the Tokio async runtime.
glommio – a thread‑per‑core framework built on top of io_uring.
Others: io-uring, uring-sys, etc.

There’s More

Completion‑based I/O isn’t unique to Linux:

| OS | Mechanism | Characteristics |
| --- | --- | --- |
| Windows | I/O Completion Ports (IOCP) | Asynchronous but still requires a system call per request, leading to higher syscall overhead than `io_uring`. |
| macOS | `kqueue` | Readiness‑based; you must call `kevent` to discover readiness and then issue separate syscalls for the actual I/O, incurring the same syscall tax `io_uring` was designed to eliminate. |

io_uring therefore represents a true asynchronous programming model for Linux, minimizing the number of system calls and context switches required for high‑performance I/O.

“Be the best cog, but keep in mind you’re not the only one.” – a reminder that solving every problem isn’t necessary; sometimes the right tool (like io_uring) is enough to make a big difference.

The Real Problem: A Cross‑Platform, Completion‑Based Asynchronous Runtime

In my view, the real problem lies in creating an asynchronous runtime that is both cross‑platform and completion‑based. Such a technology already exists: compio, a Rust framework for asynchronous I/O operations.

What’s Missing?

Zero‑cost abstraction – Compio arguably does not provide a true zero‑cost abstraction.
Fixed buffers – It uses fixed buffers (a design choice of io_uring) which are immutable references.

Most of the Rust ecosystem is built on top of the std::io::Read and std::io::Write traits, which work on borrowed buffers (`&mut [u8]` for reads, `&[u8]` for writes). Compio, on the other hand, emphasizes ownership of buffers rather than borrowing them. This aligns well with the io_uring completion‑based model, but it creates a real integration problem with the rest of the ecosystem.

“But again, like I said, we just have to be here, implementing one solution at a time. By believing in ourselves even when it seems impossible. Until next time, peace, focus, desire.”
