Writing a tiny PID 1 for containers in pure assembly (x86-64 + ARM64)

Published: December 6, 2025 at 01:43 PM EST
5 min read
Source: Dev.to

Most of us don’t think much about PID 1 when building Docker images. We just slap a CMD on the Dockerfile, run the container, and move on—until one day:

  • docker stop hangs until its timeout expires (10 seconds by default) and Docker falls back to SIGKILL
  • Ctrl+C doesn’t terminate your container
  • you discover a pile of zombie processes inside

All of these symptoms point to the same root cause: your application is running as PID 1 and doesn’t behave like an init process. In Linux, PID 1 has special semantics around signal handling and zombie reaping, and normal apps rarely implement those correctly.

Tools like Tini solve this brilliantly: a tiny process that runs as PID 1, forwards signals to your app, and reaps zombies. Docker even ships with Tini built‑in via --init.

In this post, I’ll walk through an alternative implementation: mini-init-asm, a small PID 1 designed for containers, written entirely in x86‑64 NASM and ARM64 GAS. It’s not meant to replace Tini everywhere. Instead, it is:

  • a PGID‑first init for containers (the child always runs in its own session and process group)
  • a pure‑assembly implementation of the same core ideas
  • a home for a few extra tricks, such as restart‑on‑crash

Design goals

Before writing a single line of assembly, I set a few constraints.

Behave like a responsible PID 1

  • Forward termination signals to the whole process group
  • Reap zombies, including grandchildren if needed (sub‑reaper mode)
  • Exit with a meaningful status (child exit code or 128+signal style)

Be small and auditable

  • No libc, no runtime, no hidden magic
  • A single statically‑linked binary per architecture
  • Clear, reviewable control flow

Be container‑friendly

  • Easy to drop into FROM scratch images
  • Explicit support for graceful shutdown (grace period + SIGKILL escalation)
  • Optional restart logic, but not a full‑blown process manager

Support amd64 and arm64 from day one

  • x86‑64 NASM for the “normal” Docker host
  • ARM64 GAS for modern ARM servers and SBCs

The container PID 1 problem in one picture

When your app runs directly as PID 1, everything inside the container hangs off it:

[Figure: Container PID 1 problem]

If your‑app:

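  • installs no handlers for SIGTERM, SIGINT, etc., the kernel simply drops those signals (PID 1 gets no default signal actions), so docker stop can’t shut it down gracefully; Docker and k8s wait out their grace period and then send SIGKILL
  • never calls wait() / waitpid(), exited children (and orphans re‑parented to it) stay zombies, because PID 1 is the only process that can reap them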

An init like Tini or mini-init-asm inserts itself as PID 1 and makes your app “just another process” with a normal parent:

[Figure: Init inserts itself as PID 1]

PID 1 now:

  • forwards signals to a process group
  • reaps zombies
  • decides when to exit and with what status

High‑level architecture of mini-init-asm

mini-init-asm follows a PGID‑centric design:

  1. Block signals in PID 1.
  2. Spawn a child under a new session + process group (PGID = child PID).
  3. Create:
    • a signalfd listening to HUP, INT, QUIT, TERM, CHLD plus optional extra signals
    • a timerfd for the graceful shutdown window
    • an epoll instance watching both fds

  4. Run an event loop on epoll_wait:
    • Soft signals (TERM/INT/HUP/QUIT): forward to the whole process group and start the grace timer.
    • SIGCHLD: reap children with waitpid(-1, &status, WNOHANG) and track the main child.
    • Timer expiry: if the child is still alive, send SIGKILL to the process group.

On exit, mini-init-asm returns:

  • the child’s exit status (normal exit), or
  • BASE + signal_number if the child died by a signal.

The base is customizable via EP_EXIT_CODE_BASE, defaulting to 128 (POSIX shell convention).
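As a rough C sketch of that exit-status rule (the real project does this in assembly): the function name exit_code_from_status is made up for illustration, and the base parameter stands in for whatever value EP_EXIT_CODE_BASE resolves to.

```c
#include <sys/wait.h>

/* Map a wait status, as returned by waitpid(), to the init's own exit code.
 * `base` is the offset added to the signal number (128 by default). */
int exit_code_from_status(int status, int base)
{
    if (WIFEXITED(status))
        return WEXITSTATUS(status);      /* normal exit: pass the code through */
    if (WIFSIGNALED(status))
        return base + WTERMSIG(status);  /* killed by a signal: base + signo */
    return 1;                            /* stopped/continued: not expected here */
}
```

With the default base, a child killed by SIGTERM (signal 15) yields 143, which is the same value a shell would report for it.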

Sequence: from docker run to graceful shutdown

Running the init looks like:

mini-init-amd64 -- ./your-app --flag

The flow from docker run to graceful shutdown is illustrated below:

[Figure: Docker run → graceful shutdown]

If the child ignores SIGTERM and is still alive when the timer expires, mini-init-asm escalates:

[Figure: Escalation to SIGKILL]

Pure‑assembly implementation: structure

The repository is organized to keep the assembly readable and reviewable:

src/amd64/   # NASM sources (SysV ABI, x86‑64)
src/arm64/   # GAS sources (AArch64)
include/syscalls_*.inc   # syscall numbers per arch
include/macros*.inc      # helpers for syscalls / logging

Example syscall wrapper (NASM)

; rax = syscall number
; rdi, rsi, rdx, r10, r8, r9 = args
; The syscall instruction clobbers rcx and r11.

%macro SYSCALL 0
    syscall
    cmp rax, 0
    jge %%ok
    ; handle -errno in rax if needed...
%%ok:                   ; macro-local label, unique per expansion
%endmacro

Forking and execing the child (NASM)

; 1) Fork/clone a child
mov     eax, SYS_clone
mov     rdi, SIGCHLD          ; flags
xor     rsi, rsi              ; child_stack (unused for simple clone)
xor     rdx, rdx
xor     r10, r10
xor     r8,  r8
xor     r9,  r9
syscall

cmp     rax, 0
je      .in_child
jl      .fork_error

; ----- Parent (PID 1) -----
; rax = child_pid
mov     [child_pid], rax
; continue with signalfd/epoll setup...
jmp     .parent_after_fork

.in_child:
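    ; 2) Create new session and PGID.
    ;    setsid() makes the child the leader of a new session and of a
    ;    new process group, so PGID == child PID. A follow-up
    ;    setpgid(0, 0) is unnecessary: the kernel rejects it for a
    ;    session leader with EPERM.
    mov     eax, SYS_setsid
    syscall

    ; Signals blocked in PID 1 stay blocked across fork and execve, so
    ; the original mask must be restored (rt_sigprocmask) before the exec.

    ; 3) execve() target program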
    mov     eax, SYS_execve
    mov     rdi, [target_path]
    mov     rsi, [target_argv]
    mov     rdx, [target_envp]
    syscall

    ; If execve returns, it's an error → exit(127)
    mov     edi, 127
    mov     eax, SYS_exit
    syscall

On the ARM64 side the logic is analogous: the syscall number goes in x8, the arguments in x0‑x5, and svc #0 traps into the kernel.

The epoll + signalfd + timerfd loop

The main event loop is where most of the logic lives. In pseudo‑C:

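for (;;) {
    int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
    if (n < 0) {
        if (errno == EINTR) continue;
        fatal("epoll_wait");            /* hypothetical helper: log and exit */
    }

    for (int i = 0; i < n; i++) {
        if (events[i].data.fd == signalfd_fd) {
            struct signalfd_siginfo si;
            if (read(signalfd_fd, &si, sizeof(si)) != sizeof(si)) continue;
            int sig = si.ssi_signo;

            if (is_soft_shutdown(sig)) {
                forward_to_pgid(sig);
                if (!grace_timer_armed) {
                    arm_timerfd(grace_seconds);
                    grace_timer_armed = true;
                }
            } else if (sig == SIGCHLD) {
                /* reap_children() reports whether the main child is gone */
                if (reap_children())
                    exit_with_child_status();   /* hypothetical helper */
            }
        } else if (events[i].data.fd == timerfd_fd) {
            /* Grace period expired: drain the timer, then force kill */
            uint64_t expirations;
            read(timerfd_fd, &expirations, sizeof(expirations));
            killpg(pgid, SIGKILL);
        }
    }
}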

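The actual assembly implements the same state machine with raw epoll_wait, read, kill, waitpid-style reaping, and exit syscalls, all without any external libraries. (At the syscall level neither x86‑64 nor ARM64 has a waitpid entry point, so reaping goes through wait4 or waitid, and ARM64 exposes epoll_pwait rather than epoll_wait.)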
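The reaping step deserves a closer look, because one SIGCHLD can stand for several exited children: standard signals are not queued, so the handler has to loop until nothing is left. Below is one possible shape for reap_children in C. The signature, with main_pid and main_status parameters, is an assumption made for this sketch and differs from the zero-argument call in the pseudo-code.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reap every child that has already exited, without blocking.
 * Returns 1 and stores the wait status in *main_status if main_pid
 * was among them, otherwise returns 0. */
int reap_children(pid_t main_pid, int *main_status)
{
    int main_exited = 0;

    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, WNOHANG);

        if (pid > 0) {
            if (pid == main_pid) {
                *main_status = status;
                main_exited = 1;
            }
            continue;            /* keep draining: more may be waiting */
        }
        if (pid < 0 && errno == EINTR)
            continue;

        /* pid == 0: children exist but none has exited yet.
         * pid < 0 with ECHILD: no children left at all. */
        break;
    }
    return main_exited;
}
```

Orphaned grandchildren that were re-parented to PID 1 are collected by the same loop, since waitpid(-1, ...) matches any child of the caller.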
