Linux Kernel Basics: User Space vs. Kernel Space, System Calls, strace (debugging processes).
Source: Dev.to
What Is Linux?
Before we debug it, we must define it.
Most people use the term “Linux” loosely.
- Strictly speaking: Linux is a kernel – a low‑level piece of software that acts as a hardware resource manager.
- Practically speaking: Linux is an operating system (OS) – the kernel plus the userland (GNU tools, shells such as bash, libraries like glibc, and applications that make the computer usable).
Think of the kernel as the dictator of the computer:
| Concern | Kernel’s Role |
|---|---|
| Memory Management | Who gets RAM? (If Chrome asks for 100 GB, the kernel says no.) |
| Process Scheduling | Who gets the CPU? (The kernel pauses your MP3 player 1000 times / s to let your mouse move.) |
| Hardware Abstraction | Developers write “save file”; the kernel translates that into electrical signals for a specific NVMe SSD model. |
Key concept: Modern CPUs (e.g., x86‑64) provide hardware‑level security features called Protection Rings.
Protection Rings Overview
| Ring | Who lives here? | Powers | Stakes |
|---|---|---|---|
| Ring 0 (Kernel) | Linux kernel, device drivers, kernel modules | Unlimited – can execute any CPU instruction and access any memory address | A crash causes a kernel panic (the Linux “Blue Screen of Death”) → whole machine reboots |
| Ring 3 (User space) | Web browsers, Python scripts, Docker containers, root shell | Restricted – runs in a virtual‑memory sandbox; cannot access hardware or other processes’ memory directly | If a program crashes (e.g., division by zero, illegal kernel‑memory access), the kernel sends a signal (e.g., SIGFPE for the division, SIGSEGV for the bad memory access) and kills only that process; the server stays up |
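The Ring 3 row can be checked from user space. The sketch below is a minimal illustration, assuming a Linux or other POSIX host: a child process sends itself SIGSEGV (a stand-in for a real invalid memory access), and the parent observes that only the child died. The names CHILD_CODE and run_crashing_child are made up for this example.

```python
import signal
import subprocess
import sys

# Child program: deliver SIGSEGV to itself, imitating a crash in user space.
CHILD_CODE = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"


def run_crashing_child() -> int:
    """Run the child and return the number of the signal that killed it (0 if none)."""
    result = subprocess.run([sys.executable, "-c", CHILD_CODE])
    # subprocess reports "killed by signal N" as a return code of -N.
    return -result.returncode if result.returncode < 0 else 0


killed_by = run_crashing_child()
if killed_by:
    print("child killed by", signal.Signals(killed_by).name)
print("parent is still running")
```

The parent keeps executing after the child is gone, which is the per-process isolation the table describes.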
System Calls – The Bridge Between Rings
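Why not use Rings 1 and 2? x86 defines them, but Linux uses only Rings 0 and 3, partly because many other architectures it runs on offer just two privilege levels.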
User space (Ring 3) cannot touch the hardware directly, so it must ask the kernel to do it. This request is a system call – the API of the Linux kernel.
The System‑Call Path
- Wrapper (glibc) – You write printf("hello") in C or print("hello") in Python. You are calling a library function, not the kernel yet.
- Register Setup – The library places the specific syscall ID (e.g., 1 for write) into a CPU register (usually RAX).
- Context Switch (Transition)
  - Legacy: CPU executes interrupt int 0x80.
  - Modern (fast): CPU executes the syscall instruction.
  This forces the CPU to switch from Ring 3 to Ring 0 and jump to a predefined location in kernel code.
- Execution – The kernel checks permissions (e.g., “Does UID 1000 have permission to write to /etc/hosts?”). If allowed, it performs the hardware task.
- Return – The kernel writes the result (or error code) to a register and issues sysret, dropping the CPU back to Ring 3.
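As a rough illustration of the Execution and Return steps, the sketch below calls write through Python's os.write, which is a thin wrapper over the write system call. The helper name write_result and the use of a pipe are assumptions chosen for the example; the register-level details stay hidden inside the C library.

```python
import errno
import os


def write_result(fd: int, data: bytes):
    """Issue write(2) via os.write and return (bytes_written, errno_name)."""
    try:
        return os.write(fd, data), None
    except OSError as exc:
        # The kernel's error code reaches Python as exc.errno.
        return -1, errno.errorcode.get(exc.errno, str(exc.errno))


read_end, write_end = os.pipe()

# Valid descriptor: the kernel accepts the request and returns the byte count.
ok_result = write_result(write_end, b"hello")

os.close(write_end)
os.close(read_end)

# Closed descriptor: the kernel rejects the request with EBADF.
bad_result = write_result(write_end, b"hello")

print(ok_result)
print(bad_result)
```

The first call returns the number of bytes the kernel accepted; the second shows the error path, where the kernel refuses the request and hands back an error code instead of a result.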
Advanced Concept: vDSO (Virtual Dynamic Shared Object)
Problem: Switching rings is “expensive.” Calls like gettimeofday happen thousands of times per second; a full context switch each time would degrade performance.
Solution: The kernel maps a read‑only page of its own memory directly into user space. Applications can read the current time from this page without triggering a real system call or entering kernel mode. This mechanism is the vDSO.
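On Linux the mapping is visible in a process's memory map as an entry labelled [vdso]. The sketch below is one way to look for it, assuming /proc is mounted; find_mappings and the SAMPLE line (with illustrative addresses) are hypothetical names and data used only for this example.

```python
import os


def find_mappings(maps_text: str, label: str = "[vdso]"):
    """Return (start, end, perms) for each /proc/<pid>/maps line whose last field is label."""
    found = []
    for line in maps_text.splitlines():
        fields = line.split()
        # Named mappings have six fields: range, perms, offset, dev, inode, name.
        if len(fields) >= 6 and fields[-1] == label:
            start, end = (int(part, 16) for part in fields[0].split("-"))
            found.append((start, end, fields[1]))
    return found


# Illustrative line only; real addresses differ per process and per run.
SAMPLE = "7ffd1a5f2000-7ffd1a5f4000 r-xp 00000000 00:00 0                          [vdso]\n"

if os.path.exists("/proc/self/maps"):
    with open("/proc/self/maps") as maps_file:
        print(find_mappings(maps_file.read()))
else:
    print(find_mappings(SAMPLE))
```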
strace – System Trace (Debugging Processes)
strace is the ultimate debugging tool for DevOps. It attaches to a process and prints every system call it makes, allowing you to debug “black‑box” binaries where you don’t have source code.
Basic Usage
# Run a command and trace it
strace ls /tmp
# Attach to a running process (e.g., a frozen web server)
strace -p 1234
Typical output snippet:
openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3
- openat – the function name.
- "/etc/passwd" – the argument (what file?).
- = 3 – the return value. For openat, a non‑negative number is a file descriptor (handle).
If you see = -1, the call failed. strace will also print the error code, e.g., -1 ENOENT (No such file or directory).
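The same breakdown can be done mechanically. The sketch below is a simplified parser for single, complete strace lines of the form shown above; it is not a full strace grammar (it ignores hex return values, unfinished calls, and signal lines), and the name parse_strace_line and the path /no/such/file are made up for the example.

```python
import re

# name(args) = ret [ERRNO (message)]
LINE_RE = re.compile(
    r"^(?P<name>\w+)\((?P<args>.*)\)\s+=\s+(?P<ret>-?\d+)"
    r"(?:\s+(?P<errno>E[A-Z0-9]+)\s+\((?P<message>.*)\))?$"
)


def parse_strace_line(line: str):
    """Split one simple strace line into its parts, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return {
        "name": match["name"],
        "args": match["args"],
        "ret": int(match["ret"]),
        "errno": match["errno"],      # None when the call succeeded
        "message": match["message"],  # None when the call succeeded
    }


success = parse_strace_line('openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3')
failure = parse_strace_line(
    'openat(AT_FDCWD, "/no/such/file", O_RDONLY) = -1 ENOENT (No such file or directory)'
)
print(success)
print(failure)
```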
Advanced strace Techniques
| Option | Purpose | Example |
|---|---|---|
| -c | Performance profiling – aggregates time spent in each syscall. | strace -c -p 1234 |
| -f | Follow child processes and threads (essential for multi‑threaded apps like Nginx, Chrome, Java). | strace -f -p 1234 |
| -s <size> | Increase the string size limit (default truncates long strings). | strace -s 2000 -p 1234 |
| -e inject=<syscall>:error=<errno> | Force a syscall to fail – useful for testing error handling. | strace -e inject=open:error=ENOSPC ./my_application |
Example: Performance Profiling Output
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 95.00    0.005000         500        10           futex
  2.00    0.000100          10        10         1 open
High futex time → the app spends most of its time waiting for thread locks → a concurrency issue, not a disk issue.
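For longer summaries it can help to rank the rows programmatically. The sketch below parses a strace -c style table and picks the syscall with the largest share of time; it assumes the column layout shown above, and parse_summary and SUMMARY are names invented for this example.

```python
SUMMARY = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 95.00    0.005000         500        10           futex
  2.00    0.000100          10        10         1 open
"""


def parse_summary(text: str):
    """Turn the data rows of a strace -c table into a list of dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 5 fields (no errors) or 6 (with an error count).
        if len(fields) not in (5, 6) or not fields[0][0].isdigit():
            continue
        if fields[-1] == "total":
            continue  # skip the summary line strace prints at the bottom
        rows.append({
            "percent": float(fields[0]),
            "seconds": float(fields[1]),
            "usecs_per_call": int(fields[2]),
            "calls": int(fields[3]),
            "errors": int(fields[4]) if len(fields) == 6 else 0,
            "syscall": fields[-1],
        })
    return rows


rows = parse_summary(SUMMARY)
hottest = max(rows, key=lambda row: row["percent"])
print(hottest["syscall"], hottest["percent"])
```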
Why SREs Must Know This
- Containers ≠ VMs: Docker containers share the host’s kernel. If one container triggers a kernel panic (Ring 0 crash), the host and all other containers die. Isolation is logical (namespaces), not physical.
- “Permission denied” is a kernel logic check: the kernel compares the file’s inode permissions against your UID during the open system call.
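The permission check can be observed directly as the error code the open call returns. The sketch below is a small experiment under a few assumptions: a POSIX system, a temporary directory, and hypothetical file names locked.txt and missing.txt. A root user normally bypasses file mode bits, so the locked file may open successfully when the script runs as root.

```python
import errno
import os
import tempfile


def open_errno(path: str, flags: int = os.O_RDONLY):
    """Try open(2) on path; return None on success or the errno name on failure."""
    try:
        fd = os.open(path, flags)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return None


with tempfile.TemporaryDirectory() as workdir:
    locked = os.path.join(workdir, "locked.txt")
    with open(locked, "w") as handle:
        handle.write("secret")
    os.chmod(locked, 0o000)  # strip every permission bit from the inode

    locked_result = open_errno(locked)
    missing_result = open_errno(os.path.join(workdir, "missing.txt"))

print("locked file:", locked_result)
print("missing file:", missing_result)
```

For an unprivileged user the locked file should report EACCES, which is the raw error behind the “Permission denied” message, while the missing file reports ENOENT.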
TL;DR
- Linux = Kernel + Userland
- Kernel runs in Ring 0, user programs in Ring 3.
- System calls are the only way user space talks to the kernel.
- vDSO reduces the cost of frequent, read‑only kernel data.
- strace lets you see every syscall, profile performance, and inject failures.
Armed with this knowledge, you can move from treating Linux as a black box to mastering its inner workings—exactly what senior SREs and kernel developers do every day.
System Calls and Privilege Levels
Latency – Every system call has a cost. High‑performance code tries to minimize system calls (e.g., by buffering data before writing).
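The effect of buffering can be simulated without touching a real file. In the sketch below, CountingRaw is a made-up stand-in for a file descriptor in which each write() call represents one write system call; the chunk size, chunk count, and 8192-byte buffer are arbitrary choices for the illustration.

```python
import io


class CountingRaw(io.RawIOBase):
    """Stand-in for a file descriptor: each write() represents one write(2) call."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.bytes_written = 0

    def writable(self):
        return True

    def write(self, data):
        self.calls += 1
        self.bytes_written += len(data)
        return len(data)


CHUNK = b"0123456789"
COUNT = 1000

# Unbuffered: every small chunk goes straight to the "kernel".
unbuffered = CountingRaw()
for _ in range(COUNT):
    unbuffered.write(CHUNK)

# Buffered: chunks accumulate in user space and are flushed in large batches.
raw = CountingRaw()
buffered = io.BufferedWriter(raw, buffer_size=8192)
for _ in range(COUNT):
    buffered.write(CHUNK)
buffered.flush()

print("unbuffered calls:", unbuffered.calls)
print("buffered calls:", raw.calls)
```

Both paths deliver the same bytes, but the buffered one reaches the simulated kernel far less often, which is the saving the latency note refers to.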
Component Overview
| Component | Responsibility | Privilege | Crash Consequence |
|---|---|---|---|
| User Space | Applications, shells, Docker containers | Ring 3 (Restricted) | Single‑process death (SIGSEGV) |
| System Call | Interface between user & kernel | Ring 3 → Ring 0 (transition) | n/a (just a transition) |
| Kernel Space | Drivers, memory management, scheduling | Ring 0 (God mode) | Total system crash (kernel panic) |
strace
- Type: Debugging tool
- Runs in: User space
- Purpose: Reveals the truth of what an application is doing (system calls, signals, etc.).