Leveraging io_uring for Performant Asynchronous Linux Applications

Published: (February 8, 2026 at 06:52 PM EST)
Source: Dev.to


How the ring buffers for submission queue entries (SQEs) and completion queue entries (CQEs) are laid out in shared memory by the io_uring Linux I/O interface.

author: Sospeter Kinyanjui

Intro

For the longest time, the main scalable I/O facility on Linux was epoll, an I/O event notification facility that tells an application when file descriptors are ready, so that it can then issue read/write system calls without blocking.
epoll first appeared in Linux 2.5.44 (2002) and became mainstream with 2.6 (2003). It uses the readiness model via the three system calls epoll_create, epoll_ctl, and epoll_wait. The kernel notifies the application when a resource is ready, and the application then performs the I/O itself.

Because the kernel reports only the descriptors that are ready, the cost of waiting scales with the number of ready events rather than the number of watched descriptors: it is effectively O(1) per event whether you watch 10 connections or 10 000. However, each wait and each subsequent read or write is still a system call, which means a costly syscall tax: a transition from user mode to kernel mode and back for every operation.
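To make the readiness model concrete, here is a minimal sketch in C. The helper name wait_then_read is hypothetical, and it watches a single descriptor for simplicity. Notice that finding out the descriptor is ready and actually reading from it are separate system calls.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: block until `fd` is readable, then read from it.
 * Returns the number of bytes read, or -1 on error. */
ssize_t wait_then_read(int fd, void *buf, size_t len)
{
    assert(buf != NULL);

    int ep = epoll_create1(0);              /* create the epoll instance */
    if (ep < 0)
        return -1;

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) < 0) {   /* register interest */
        close(ep);
        return -1;
    }

    struct epoll_event ready;
    ssize_t got = -1;
    /* First syscall per event: wait until the kernel says "ready". */
    if (epoll_wait(ep, &ready, 1, -1) == 1 && (ready.events & EPOLLIN)) {
        /* Second syscall per event: the actual I/O. */
        got = read(ready.data.fd, buf, len);
    }

    close(ep);
    return got;
}
```

A real server would keep the epoll instance alive and loop over many descriptors, but the two-step pattern (wait, then read or write) stays the same.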

It wasn’t until 2019, with Linux 5.1, that io_uring arrived, providing a Linux kernel interface for truly asynchronous I/O with far fewer system calls.

What is asynchronous execution?

Asynchronous execution is the ability of an application to start a long‑running task and continue executing other work without waiting for that task to finish.

Asynchronous execution makes better use of CPU and I/O resources. epoll is event‑driven: it tells you when an operation can be started, but the read or write itself is still a synchronous call, so the asynchrony is partly an illusion. With io_uring, the application queues multiple I/O requests and submits them with a single system call, and the kernel completes them independently.

Definition and Implementation

io_uring exposes three system calls:

| Call | Purpose |
| --- | --- |
| io_uring_setup(2) | Creates the submission queue (SQ) and completion queue (CQ) and returns a file descriptor. It configures the ring buffers (head, tail, ring_mask, ring_entries). |
| io_uring_enter(2) | Tells the kernel “I have placed SQEs in the ring; go process them.” |
| io_uring_register(2) | Pre‑registers resources (e.g., buffers, files) with the kernel to avoid per‑request look‑ups. |

io_uring_setup

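  • Allocates the memory that holds the SQ and CQ structures; the application maps it with mmap(2), so both sides share it.
  • User space is the producer for the SQ: it writes entries and advances the tail, and the kernel consumes them.
  • The kernel is the producer for the CQ: it writes completions and advances the tail, and user space consumes them.
  • Each ring therefore follows a single‑producer / single‑consumer model, which avoids locking and keeps performance high.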
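The head/tail/mask bookkeeping behind a single‑producer / single‑consumer ring can be sketched in plain user‑space C. This is a toy model, not the kernel's actual memory layout: the struct toy_ring and its functions are hypothetical names, and only the field names (head, tail, ring_mask, ring_entries) mirror the ones io_uring exposes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_ENTRIES 4u   /* power of two, so `index & mask` wraps correctly */
static_assert((RING_ENTRIES & (RING_ENTRIES - 1)) == 0, "must be a power of two");

/* Toy SPSC ring (hypothetical, for illustration only). */
struct toy_ring {
    _Atomic uint32_t head;        /* advanced only by the consumer */
    _Atomic uint32_t tail;        /* advanced only by the producer */
    uint32_t ring_mask;           /* ring_entries - 1 */
    uint32_t ring_entries;
    uint64_t slots[RING_ENTRIES];
};

void toy_ring_init(struct toy_ring *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
    r->ring_entries = RING_ENTRIES;
    r->ring_mask = RING_ENTRIES - 1;
}

/* Producer: write the slot first, then publish it by bumping the tail.
 * Returns false when the ring is full. */
bool toy_ring_push(struct toy_ring *r, uint64_t value)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == r->ring_entries)
        return false;
    r->slots[tail & r->ring_mask] = value;
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer: read the slot, then release it by bumping the head.
 * Returns false when the ring is empty. */
bool toy_ring_pop(struct toy_ring *r, uint64_t *out)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail)
        return false;
    *out = r->slots[head & r->ring_mask];
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}
```

Because each index has exactly one writer, a release store paired with an acquire load is enough; no lock is needed. The indices are free‑running counters, and the mask maps them onto slots.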

io_uring_enter

io_uring_enter is the “engine starter” for the whole operation. Its prototype:

#include <liburing.h>

int io_uring_enter(unsigned int fd,
                   unsigned int to_submit,
                   unsigned int min_complete,
                   unsigned int flags,
                   sigset_t *sig);

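Calling io_uring_enter notifies the kernel that to_submit SQEs are ready for processing. If flags includes IORING_ENTER_GETEVENTS, the same call also waits until at least min_complete completions are available, so submitting and reaping can share one system call.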
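Putting setup and enter together, the following is a rough sketch of submitting a single no‑op request without liburing. It assumes kernel headers that ship <linux/io_uring.h> and goes through syscall(2) because glibc provides no wrappers. The names ring_setup, ring_enter and submit_one_nop are hypothetical, and the user_data value 42 is arbitrary.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/io_uring.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

static int ring_setup(unsigned entries, struct io_uring_params *p)
{
    return (int)syscall(__NR_io_uring_setup, entries, p);
}

static int ring_enter(int fd, unsigned to_submit, unsigned min_complete,
                      unsigned flags)
{
    return (int)syscall(__NR_io_uring_enter, fd, to_submit, min_complete,
                        flags, (void *)NULL, (size_t)0);
}

/* Hypothetical demo: submit one NOP and reap its completion.
 * Returns the CQE's res field (0 for a successful NOP), or -1 if
 * io_uring is unavailable or any step fails. */
int submit_one_nop(void)
{
    struct io_uring_params p;
    memset(&p, 0, sizeof p);
    int fd = ring_setup(4, &p);
    if (fd < 0)
        return -1;   /* e.g. old kernel, or blocked by a sandbox */

    size_t sq_len  = p.sq_off.array + p.sq_entries * sizeof(unsigned);
    size_t cq_len  = p.cq_off.cqes + p.cq_entries * sizeof(struct io_uring_cqe);
    size_t sqe_len = p.sq_entries * sizeof(struct io_uring_sqe);
    int prot = PROT_READ | PROT_WRITE, how = MAP_SHARED | MAP_POPULATE;

    /* Map the three regions the kernel exposes through the ring fd. */
    char *sq = mmap(NULL, sq_len, prot, how, fd, IORING_OFF_SQ_RING);
    char *cq = mmap(NULL, cq_len, prot, how, fd, IORING_OFF_CQ_RING);
    struct io_uring_sqe *sqes = mmap(NULL, sqe_len, prot, how, fd, IORING_OFF_SQES);
    int res = -1;

    if (sq != MAP_FAILED && cq != MAP_FAILED && sqes != MAP_FAILED) {
        unsigned *sq_tail  = (unsigned *)(sq + p.sq_off.tail);
        unsigned  sq_mask  = *(unsigned *)(sq + p.sq_off.ring_mask);
        unsigned *sq_array = (unsigned *)(sq + p.sq_off.array);
        unsigned *cq_head  = (unsigned *)(cq + p.cq_off.head);
        unsigned *cq_tail  = (unsigned *)(cq + p.cq_off.tail);
        unsigned  cq_mask  = *(unsigned *)(cq + p.cq_off.ring_mask);
        struct io_uring_cqe *cqes = (struct io_uring_cqe *)(cq + p.cq_off.cqes);

        /* Producer side: fill an SQE, then publish it via the SQ tail. */
        unsigned tail = *sq_tail;
        unsigned idx = tail & sq_mask;
        assert(idx < p.sq_entries);
        memset(&sqes[idx], 0, sizeof sqes[idx]);
        sqes[idx].opcode = IORING_OP_NOP;
        sqes[idx].user_data = 42;
        sq_array[idx] = idx;
        __atomic_store_n(sq_tail, tail + 1, __ATOMIC_RELEASE);

        /* One syscall: submit 1 SQE and wait for at least 1 CQE. */
        if (ring_enter(fd, 1, 1, IORING_ENTER_GETEVENTS) == 1) {
            unsigned head = *cq_head;
            if (head != __atomic_load_n(cq_tail, __ATOMIC_ACQUIRE)) {
                struct io_uring_cqe *cqe = &cqes[head & cq_mask];
                if (cqe->user_data == 42)
                    res = cqe->res;
                /* Consumer side: hand the CQ slot back to the kernel. */
                __atomic_store_n(cq_head, head + 1, __ATOMIC_RELEASE);
            }
        }
    }

    if (sq != MAP_FAILED) munmap(sq, sq_len);
    if (cq != MAP_FAILED) munmap(cq, cq_len);
    if (sqes != MAP_FAILED) munmap(sqes, sqe_len);
    close(fd);
    return res;
}
```

In practice you would let liburing do this bookkeeping, but spelling it out shows where the single io_uring_enter call sits between writing to the SQ and reading from the CQ. Queueing several SQEs before the call is what turns it into a batch.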

io_uring_register

io_uring_register is the “VIP pass” for your data. By pre‑registering buffers or files, the application lets the kernel map and reference them once up front instead of on every request, which eliminates a lot of per‑request overhead.
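As a rough illustration, registering a single buffer with the raw syscall could look like the sketch below. The function name register_one_buffer and the 4 KiB size are assumptions made for the example, and it again relies on <linux/io_uring.h> being available.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/io_uring.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical demo: create a ring and register one 4 KiB buffer with it.
 * Returns 0 on success, -1 if io_uring is unavailable or registration fails. */
int register_one_buffer(void)
{
    struct io_uring_params p;
    memset(&p, 0, sizeof p);
    int fd = (int)syscall(__NR_io_uring_setup, 4u, &p);
    if (fd < 0)
        return -1;

    const size_t len = 4096;
    assert(len % 4096 == 0);          /* aligned_alloc needs a multiple of the alignment */
    void *buf = aligned_alloc(4096, len);
    int rc = -1;

    if (buf != NULL) {
        struct iovec iov = { .iov_base = buf, .iov_len = len };
        /* The kernel pins and maps the buffer once, here, not per request. */
        if (syscall(__NR_io_uring_register, fd, IORING_REGISTER_BUFFERS,
                    &iov, 1u) == 0) {
            rc = 0;
            /* Release the registration before freeing the memory. */
            syscall(__NR_io_uring_register, fd, IORING_UNREGISTER_BUFFERS,
                    (void *)NULL, 0u);
        }
    }

    close(fd);
    free(buf);
    return rc;
}
```

Once registered, a buffer is referred to by its index in the registered array, using the fixed‑buffer opcodes such as IORING_OP_READ_FIXED and IORING_OP_WRITE_FIXED.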

Ecosystem and Language Bindings

C developers can use the official liburing library, which wraps the three syscalls and provides helper functions.

Rust also has strong support for io_uring. Its ownership model helps with the classic “danger zone” where both the kernel and the application might access the same buffer simultaneously: crates such as tokio-uring take ownership of the buffer when an operation is submitted and hand it back when the operation completes, so user code cannot touch the buffer while the kernel still owns it.

Popular Rust crates include:

  • tokio-uring – integrates io_uring with the Tokio async runtime.
  • glommio – a thread‑per‑core framework built on top of io_uring.
  • Others: io-uring, uring-sys, etc.

There’s More

Completion‑based I/O isn’t unique to Linux, although other platforms make different trade‑offs:

| OS | Mechanism | Characteristics |
| --- | --- | --- |
| Windows | I/O Completion Ports (IOCP) | Asynchronous but still requires a system call per request, leading to higher syscall overhead than io_uring. |
| macOS | kqueue | Readiness‑based; you must call kevent to discover readiness and then issue separate syscalls for the actual I/O, incurring the same syscall tax io_uring was designed to eliminate. |

io_uring therefore represents a true asynchronous programming model for Linux, minimizing the number of system calls and context switches required for high‑performance I/O.

“Be the best cog, but keep in mind you’re not the only one.” – a reminder that solving every problem isn’t necessary; sometimes the right tool (like io_uring) is enough to make a big difference.

The Real Problem: A Cross‑Platform, Completion‑Based Asynchronous Runtime

In my view, the real problem lies in creating an asynchronous runtime that is both cross‑platform and completion‑based. Such a technology already exists: compio, a Rust framework for asynchronous I/O operations.

What’s Missing?

  1. Zero‑cost abstraction – Compio arguably does not provide a true zero‑cost abstraction.
  2. Fixed buffers – It uses fixed buffers (a design choice of io_uring) which are immutable references.

Most of the Rust ecosystem is built on top of the std::io::Read and std::io::Write traits, which work on borrowed buffers (&mut [u8] for reads, &[u8] for writes). Compio, on the other hand, emphasizes ownership of buffers rather than borrowing them. This aligns well with the io_uring completion‑based model, but it creates a real integration problem with the rest of the ecosystem.

“But again, like I said, we just have to be here, implementing one solution at a time. By believing in ourselves even when it seems impossible. Until next time, peace, focus, desire.”

Stay Connected

You can check out other posts on my blog.

  • GitHub:
  • LinkedIn: