Chapter 2: Linux System Calls
Source: Dev.to
Linux System Calls – The “Front Door” to the Kernel
This post is part of the Ultimate Container Security Series, a structured, multi-part guide covering container security from foundational concepts to runtime protection. For an overview of the series structure, scope, and update schedule, see the series introduction post.
1. Linux execution “worlds”
| World | Description |
|---|---|
| Userspace | Where user‑facing applications run (web servers, browsers, editors, CLI tools, background services, etc.). It is a restricted zone – applications cannot directly access hardware or manage critical system resources. This restriction improves stability: if an app crashes, the whole OS usually stays up. |
| Kernel space | Where the Linux kernel lives. It controls memory, processes, scheduling, hardware, drivers, filesystems, networking, security, and more. It interacts directly with the CPU, RAM, disk, and other hardware with full privileges. |
2. Where do system calls fit in?
Applications run in userspace with lower privileges.
If an application needs to do something that requires kernel privileges, for example:
- opening a file
- reading/writing data
- creating a process
- allocating memory
- sending network traffic
- getting the current time

then it must ask the kernel to do it on its behalf.
That request is made through the system‑call interface (also called the syscall interface).
Definition (plain terms) – A system call is a programmatic way for a user‑space application to request a service from the Linux kernel, safely and in a controlled way.
Why the distinction?
- Security & stability – User programs can’t touch hardware or kernel memory directly; that would be dangerous.
- Controlled entry points – System calls provide a limited, vetted set of entry points into the kernel.
Not everything needs the kernel. For example, tokenising a string happens entirely in userspace. Anything involving files, devices, networking, or process management, however, requires syscalls.
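A minimal sketch of that distinction, using a hypothetical count_tokens() helper: counting tokens is plain computation on memory the process already owns, so it never enters the kernel. Printing the result is different, because it touches a file descriptor and therefore ends in the write system call.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Counts whitespace-separated tokens in s. Pure userspace work:
   no system call is made anywhere in this function. */
size_t count_tokens(const char *s) {
    size_t count = 0;
    int in_token = 0;
    for (; *s != '\0'; s++) {
        if (isspace((unsigned char)*s)) {
            in_token = 0;          /* a separator ends the current token */
        } else if (!in_token) {
            in_token = 1;          /* first character of a new token */
            count++;
        }
    }
    return count;
}

/* Printing the count writes to a file descriptor, so it needs the kernel:
   the write() call below is the only system call in this sketch. */
int print_count(size_t n) {
    char line[32];
    int len = snprintf(line, sizeof line, "%zu tokens\n", n);  /* userspace */
    return write(STDOUT_FILENO, line, (size_t)len) == len ? 0 : -1;
}
```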
Linux ships with 300+ system calls (the exact number varies by kernel version and CPU architecture).
3. Common system calls (examples)
| What the program wants | System call |
|---|---|
| Read a file | read() |
| Write a file | write() |
| Open a file | open() |
| Start a new program | execve() |
| Create a process | fork() |
| Allocate memory | mmap() |
| Send network data | send() |
| Get current time | clock_gettime() |
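You can browse the full list via the man page: man 2 syscalls. Note that a C library function and the system call behind it do not always share a name: on current glibc, for example, open() is implemented with the openat syscall and fork() with clone.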
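As a sketch of how several entries from the table combine in practice, the hypothetical count_bytes() helper below opens a file, reads it to the end, and closes it. Each of open(), read() and close() is a thin C library wrapper that ends in a system call; the loop arithmetic in between stays in userspace.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns the number of bytes in the file at path, or -1 on error. */
long long count_bytes(const char *path) {
    char buf[4096];
    long long total = 0;
    ssize_t n;

    int fd = open(path, O_RDONLY);            /* syscall: get a descriptor */
    if (fd < 0)
        return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0)  /* syscall per chunk */
        total += n;                           /* userspace bookkeeping */
    close(fd);                                /* syscall: release descriptor */
    return n < 0 ? -1 : total;                /* n == 0 means end of file */
}
```

For example, count_bytes("/dev/null") returns 0, and a path that does not exist returns -1.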
4. High‑level view of a syscall
From the programmer’s perspective a syscall looks like a normal function call, but under the hood it performs a controlled transition into kernel mode.
Typical flow
- The user application calls a standard library function (e.g., read()).
- That function triggers a system call, identifying the requested operation by its system-call number.
- The CPU switches from user mode to kernel mode.
- The Linux kernel executes the requested operation.
- Control returns to the application with a result (or an error).
Example: read(fd, buffer, size) runs the kernel's read implementation for that file descriptor and returns the number of bytes read, 0 at end of file, or -1 on error (with the error code stored in errno).
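The three possible outcomes of read() can be observed with a pipe. The sketch below (read_return_demo() is a hypothetical name) records a positive byte count while data is available, 0 once the write end is closed, and -1 with errno set after the descriptor itself has been closed.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Fills results[] with three read() return values:
   results[0]: bytes read while data is available (5),
   results[1]: 0 after the write end is closed (end of file),
   results[2]: -1 after the read end is closed, with *err = errno.
   Returns 0 on success, -1 if the pipe could not be set up. */
int read_return_demo(ssize_t results[3], int *err) {
    int fds[2];
    char buf[16];

    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], "hello", 5) != 5)
        return -1;

    results[0] = read(fds[0], buf, sizeof buf);  /* data available */
    close(fds[1]);                               /* no writers remain */
    results[1] = read(fds[0], buf, sizeof buf);  /* end of file */
    close(fds[0]);
    errno = 0;
    results[2] = read(fds[0], buf, sizeof buf);  /* descriptor no longer valid */
    *err = errno;                                /* expected: EBADF */
    return 0;
}
```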
5. Using syscalls from a higher‑level language
As an application developer you rarely invoke syscalls “raw”. You usually use wrapper functions:
| Language | Wrapper source |
|---|---|
| C / C++ | glibc (e.g., read(), write(), open()) |
| Go | os package for everyday use; golang.org/x/sys/unix (or the older, locked-down syscall package) for direct syscall access |
These wrappers:
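- Place the arguments where the kernel's calling convention expects them (mostly CPU registers)
- Execute the instruction that switches to kernel mode (for example, syscall on x86-64)
- Translate the kernel's raw result into the language's usual form (-1 plus errno in C, a non-nil error value in Go)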
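To make the wrapper layer visible, the sketch below asks for the process ID twice: once through glibc's getpid() wrapper and once through glibc's generic syscall() function, which takes the raw system-call number (SYS_getpid from <sys/syscall.h>). The two helper names are hypothetical; both routes reach the same kernel entry point, so they should return the same value.

```c
#define _GNU_SOURCE          /* declares syscall() in <unistd.h> */
#include <sys/syscall.h>     /* SYS_* system-call numbers */
#include <sys/types.h>       /* pid_t */
#include <unistd.h>          /* getpid(), syscall() */

/* The usual route: glibc's getpid() wrapper issues the syscall for us. */
pid_t pid_via_wrapper(void) {
    return getpid();
}

/* The "raw" route: hand the system-call number to glibc's generic
   syscall() function, which loads it and the arguments into registers,
   traps into the kernel, and converts a failure into -1 plus errno. */
pid_t pid_via_raw_syscall(void) {
    return (pid_t)syscall(SYS_getpid);
}
```

In everyday code the named wrapper is the better choice; syscall() is mainly useful for system calls that the C library does not wrap.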
6. Minimal C example – printing to stdout
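```c
#include <unistd.h>  /* write() */

int main(void) {
    const char msg[] = "Hello, World!\n";
    /* write() is a glibc wrapper around the write syscall;
       sizeof(msg) - 1 leaves out the terminating '\0' */
    if (write(1, msg, sizeof(msg) - 1) < 0)  /* fd 1 = stdout */
        return 1;                            /* the kernel reported an error */
    return 0;
}
```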
What happens step‑by‑step?
- write(1, msg, sizeof(msg) - 1) is called from userspace.
- write() (from glibc) prepares the syscall: it places the syscall number and the arguments in the registers the kernel's calling convention expects.
- The CPU switches to kernel mode through the system-call entry instruction (syscall on x86-64).
- The kernel validates:
  - that file descriptor 1 is open,
  - that it was opened for writing,
  - that msg points to memory the process can read.
- The kernel writes the bytes to stdout (usually your terminal).
- The kernel returns the number of bytes written, and execution resumes in userspace.
Even though the code looks trivial, the important takeaway is that any interaction with files, processes, networking, memory mapping, etc., goes through system calls.
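The kernel's checks can be observed from userspace, because each rejected request comes back as -1 with a specific errno. In this sketch (the helper names are hypothetical, and a pipe stands in for stdout so the result does not depend on where stdout points), writing to the read end of a pipe fails with EBADF because that descriptor is not open for writing, and passing a NULL buffer fails with EFAULT because the pointer does not refer to readable memory.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Returns 0 if write(fd, buf, len) succeeds, otherwise the errno value
   the kernel reported for the rejected request. */
int write_errno(int fd, const void *buf, size_t len) {
    errno = 0;
    return write(fd, buf, len) < 0 ? errno : 0;
}

/* Exercises three of the kernel's checks on a fresh pipe:
   out[0]: writable end, valid buffer      -> 0 (accepted)
   out[1]: read end, not open for writing  -> EBADF
   out[2]: NULL buffer, unreadable memory  -> EFAULT
   Returns 0 on success, -1 if the pipe could not be created. */
int write_validation_demo(int out[3]) {
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    out[0] = write_errno(fds[1], "ok", 2);
    out[1] = write_errno(fds[0], "ok", 2);
    out[2] = write_errno(fds[1], NULL, 2);
    close(fds[0]);
    close(fds[1]);
    return 0;
}
```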
7. Containers and system calls
Containers are just processes running on the host Linux kernel.
- Containers do not have a separate kernel; they share the host kernel.
- System calls are the only way container processes interact with that kernel.
Therefore, everything a container does—reading files, opening sockets, creating processes—flows through syscalls. The application code uses syscalls the same way whether it runs on the host or inside a container.
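One way to observe the shared kernel, sketched below with a hypothetical kernel_release() helper: the uname system call reports the release of the kernel that is servicing the process's system calls. Under a typical runc-based runtime such as Docker, running the same program on the host and inside a container should print the same release, because UTS namespaces isolate the hostname and domain name but not the kernel version.

```c
#include <string.h>
#include <sys/utsname.h>

/* Copies the release string of the running kernel into buf (at most
   len - 1 characters plus '\0'). Returns 0 on success, -1 on failure. */
int kernel_release(char *buf, size_t len) {
    struct utsname u;
    if (len == 0 || uname(&u) != 0)   /* uname() is a single syscall */
        return -1;
    strncpy(buf, u.release, len - 1);
    buf[len - 1] = '\0';              /* guarantee termination */
    return 0;
}
```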
8. Security implications
Because containers depend on the host kernel, syscalls become a powerful security control point:
- If a process can invoke powerful syscalls, it may be able to do powerful (and potentially dangerous) things.
- Least‑privilege matters: not all applications need all syscalls.
- By restricting which syscalls a containerized application can use, you reduce the attack surface.
Bottom line: If an attacker compromises a containerized app, the damage they can do depends heavily on which syscalls (and privileges) that process is allowed to use.
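Container hardening therefore often focuses on reducing kernel exposure: seccomp profiles filter which syscalls a container may invoke, while Linux Security Modules such as AppArmor and SELinux restrict which resources (files, network access, and more) those syscalls are allowed to act on.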
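As a minimal sketch of the seccomp idea, assuming Linux on x86-64 or AArch64 with the kernel UAPI headers installed: the filter below is a small BPF program that the kernel evaluates on every system call the thread makes. It allows everything except unshare (the call that creates new namespaces), which it answers with EPERM instead of executing. Blocking unshare is only an illustrative policy choice and deny_unshare() is a hypothetical name; container runtimes typically build much larger filters from a profile file rather than hand-writing BPF.

```c
#define _GNU_SOURCE             /* syscall() */
#include <errno.h>
#include <stddef.h>             /* offsetof */
#include <sys/prctl.h>          /* prctl, PR_SET_NO_NEW_PRIVS, PR_SET_SECCOMP */
#include <sys/syscall.h>        /* SYS_unshare */
#include <unistd.h>
#include <linux/audit.h>        /* AUDIT_ARCH_* */
#include <linux/filter.h>       /* struct sock_filter, BPF_STMT, BPF_JUMP */
#include <linux/seccomp.h>      /* struct seccomp_data, SECCOMP_* */

#if defined(__x86_64__)
#define EXPECTED_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define EXPECTED_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only knows x86-64 and AArch64"
#endif

/* Installs a filter on the calling thread: unshare(2) fails with EPERM,
   every other syscall is allowed. Returns 0 on success, -1 on error. */
int deny_unshare(void) {
    struct sock_filter filter[] = {
        /* Syscall numbers are per-architecture, so kill the thread if the
           call arrives through an unexpected ABI. (A production filter on
           x86-64 would also reject x32 syscall numbers.) */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, EXPECTED_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),

        /* Load the syscall number; if it is unshare, fail it with EPERM. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_unshare, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),

        /* Anything else runs normally. */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof filter / sizeof filter[0]),
        .filter = filter,
    };

    /* Unprivileged processes must promise not to gain privileges (e.g. via
       setuid binaries) before they may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0)
        return -1;
    return 0;
}
```

A seccomp filter cannot be removed once installed and is inherited by child processes, which is why runtimes typically install it just before starting the container's entrypoint.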
9. What’s next?
The next chapters will build on this foundation to explain:
- How containers provide isolation, resource management, and security boundaries
- Runtime protection techniques (seccomp, capabilities, namespaces, etc.)
- Practical hardening steps for real‑world workloads
Stay tuned!
Container Security Controls
- seccomp – restricts system calls.
- Capabilities – drop unnecessary privileges.
- Namespaces & cgroups – provide isolation and resource limits.
In later chapters, we’ll build directly on this idea to show how containers create boundaries, and how to tighten them.
Further Resources
This article is one piece of the Ultimate Container Security Series, an ongoing effort to organize and explain container security concepts in a practical way. If you want to explore related topics or see what’s coming next, the Series Introduction provides the complete roadmap.