When Containers Kill Nodes: Understanding Zombie Processes and PID 1
Source: Dev.to
The Hook
Early in my career, I witnessed something that changed how I think about containers forever. We were running MySQL on Kubernetes with Rocky Linux nodes. Everything seemed fine until nodes started dying one by one. The culprit? Zombie processes—hundreds of them, silently accumulating until the node couldn’t take it anymore.
This incident taught me a fundamental truth: containers are not lightweight VMs. They’re just processes.
When a process finishes execution in Linux, it doesn’t just disappear. It enters a zombie state: the process has completed, but its entry still exists in the process table. The parent must read the child’s exit status using wait(). Until the parent calls wait(), the child remains a zombie.
Parent Process
    |
    |--- fork() ---> Child Process
    |                     |
    |                     | (does work)
    |                     |
    |                     v
    |              Exits (becomes zombie)
    |                     |
    |<--- wait() ---------+
    |
    v
Zombie cleaned up
On a normal Linux system this is rarely a problem. A zombie uses almost no resources, and if a parent dies without calling wait(), its children (zombies included) are re-parented to the init process (PID 1), which calls wait() on them as they exit.
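The lifecycle above can be reproduced in a few lines. This is a minimal sketch, assuming a Linux host with /proc mounted; the helper names proc_state and demo_zombie are illustrative and not from the original incident.

```python
import os
import time


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat (e.g. 'S', 'Z'), or None if the process is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The state field comes right after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def demo_zombie():
    """Fork a child that exits at once; return its state before and after wait()."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately

    # Parent: poll until the kernel reports the exited child as a zombie.
    deadline = time.monotonic() + 5
    while proc_state(pid) != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
    before = proc_state(pid)

    os.waitpid(pid, 0)       # read the exit status; this reaps the zombie
    after = proc_state(pid)  # the process table entry is gone now
    return before, after
```

Calling demo_zombie() should show the child sitting in state Z until the parent calls waitpid(), after which its /proc entry no longer exists.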
Why Containers Break This Model
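When you run a container without an init process, your application becomes PID 1 inside the container's PID namespace. Orphaned processes in that namespace are re-parented to it, and there is no traditional init behind it, so your app is now responsible for reaping zombie processes.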
FROM mysql:8.0
# MySQL process becomes PID 1
# It was never designed to be an init system
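To see which program actually holds PID 1, one option is to read /proc/1/comm from inside the container. The sketch below assumes a Linux /proc filesystem; the helper names and the KNOWN_INITS set are illustrative assumptions, and the set is not an exhaustive list of init programs.

```python
# Command names commonly seen when an init is PID 1 (an assumed, incomplete list).
KNOWN_INITS = {"tini", "dumb-init", "docker-init", "systemd", "init"}


def pid1_command(proc_root="/proc"):
    """Return the command name of PID 1 as reported by the kernel."""
    with open(f"{proc_root}/1/comm") as f:
        return f.read().strip()


def pid1_is_known_init(name):
    """Return True if the given command name looks like a dedicated init."""
    return name in KNOWN_INITS
```

Run inside the container, pid1_is_known_init(pid1_command()) returning False (for example because PID 1 is mysqld) is a hint that nothing is reaping orphaned children.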
Most applications—including MySQL—are not designed to be init processes. They don’t call wait() on orphaned children, so when child processes die they become zombies with no one to clean them up. On the node, ps aux | grep Z showed hundreds of zombie MySQL helper processes, each dead but still holding an entry in the process table.
Each zombie holds:
- An entry in the process table
- A PID (and PIDs are finite)
Eventually you run out of PIDs or fill the process table, preventing new processes from spawning. The node becomes unstable and services crash.
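A rough way to watch for this before it becomes an outage is to count zombies and compare against the kernel's PID ceiling. This sketch assumes a Linux host with /proc mounted; count_zombies and pid_max are hypothetical helper names, and any alerting threshold is left to the reader.

```python
import os


def count_zombies(proc_root="/proc"):
    """Count processes whose state in /proc/<pid>/stat is 'Z'."""
    zombies = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                # The state field comes right after the parenthesised command name.
                state = f.read().rsplit(")", 1)[1].split()[0]
        except OSError:
            continue  # process disappeared while we were scanning
        if state == "Z":
            zombies += 1
    return zombies


def pid_max(path="/proc/sys/kernel/pid_max"):
    """Return the system-wide upper bound on PID values."""
    with open(path) as f:
        return int(f.read())
```

Comparing count_zombies() with pid_max() over time gives an early signal that a node is drifting toward PID exhaustion.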
The Fix: Tini
The solution is surprisingly simple: use a proper init process designed for containers. Tini is a minimal init system built specifically for containers. It:
- Runs as PID 1
- Spawns your application as a child process
- Forwards signals properly
- Reaps zombie processes by calling wait()
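The four responsibilities in that list can be sketched in a few lines. This is not Tini's actual implementation, only an illustration of the same idea under simplifying assumptions: run_as_init and FORWARDED are hypothetical names, only three signals are forwarded, and os.waitstatus_to_exitcode requires Python 3.9 or later.

```python
import os
import signal

# Signals passed on to the application (an assumed, minimal set).
FORWARDED = (signal.SIGTERM, signal.SIGINT, signal.SIGHUP)


def run_as_init(argv):
    """Run argv as a child, forward signals to it, reap children, return its exit code."""
    main_pid = os.fork()
    if main_pid == 0:
        try:
            os.execvp(argv[0], argv)   # child: become the real application
        finally:
            os._exit(127)              # only reached if exec failed

    def forward(signum, _frame):
        try:
            os.kill(main_pid, signum)  # pass the signal on to the application
        except ProcessLookupError:
            pass                       # application already exited

    previous = {sig: signal.signal(sig, forward) for sig in FORWARDED}
    try:
        while True:
            # Blocks until any child exits. When this process is PID 1, that
            # includes orphans the kernel has re-parented to it.
            pid, status = os.wait()
            if pid == main_pid:
                code = os.waitstatus_to_exitcode(status)
                # Report death-by-signal the way shells do: 128 + signal number.
                return code if code >= 0 else 128 - code
    finally:
        # Restore whatever handlers were installed before.
        for sig, handler in previous.items():
            signal.signal(sig, signal.SIG_DFL if handler is None else handler)
```

For example, run_as_init(["mysqld"]) would start mysqld as a child and keep reaping until it exits. In practice a small static binary like Tini is a better fit for this job than a Python interpreter running as PID 1.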
Implementation
Option 1: Install in Dockerfile
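FROM mysql:8.0
# Install tini. This assumes a Debian-based image; if your mysql:8.0 tag uses
# a different base distribution, use that distribution's package manager instead.
RUN apt-get update \
 && apt-get install -y --no-install-recommends tini \
 && rm -rf /var/lib/apt/lists/*
# Run tini as PID 1 and keep the image's own entrypoint script as its child
ENTRYPOINT ["/usr/bin/tini", "--", "docker-entrypoint.sh"]
# Your actual command
CMD ["mysqld"]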
Option 2: Use Docker’s built‑in init
docker run --init mysql:8.0
Option 3: Kubernetes
apiVersion: v1
kind: Pod
metadata:
  name: mysql  # example name
spec:
  containers:
    - name: mysql
      image: mysql:8.0
      # For Kubernetes, bake tini into the image or use a base image that includes it
In Kubernetes, the safest pattern is to bake an init like Tini into the image: the Pod spec has no direct equivalent of docker run --init, and relying on runtime flags is not portable across environments.
The Bigger Lesson
This incident challenged my mental model of containers. I used to think of them as “lightweight VMs”—isolated boxes running their own little world. The reality is different: a container is just a process with fancy isolation (namespaces, cgroups). It shares the kernel with the host. When that process misbehaves by spawning zombies, consuming memory, or exhausting PIDs, the host suffers.
Understanding this changes how you:
- Debug container issues
- Design container images
- Think about resource limits and isolation
Quick Reference
| Scenario | What Happens | Fix |
|---|---|---|
| App as PID 1, spawns children | Zombies accumulate | Use Tini or --init |
| App crashes without signal handling | Orphaned children become zombies | Proper init + signal forwarding |
| Too many zombies | PID exhaustion, node instability | Prevention via init system |
- Zombies are normal; they become a problem only when not reaped.
- Containers don’t have a traditional init by default.
- Your app shouldn’t be PID 1 unless it’s designed for it.
- Tini (or dumb-init) is a simple fix that should be standard practice.
- Containers are processes, not VMs. Never forget this.