How I Troubleshot a KVM Memory Issue That Led to Swap & High CPU (Runbook + Real Scenario)
Source: Dev.to
Recently, I noticed something strange on one of my KVM hypervisors.
The server wasn’t heavily loaded, but earlier I saw:
- qemu-system-x86 consuming 800 %+ CPU
- kswapd running hot
- Swap usage near 100 %
Later:
- CPU was low
- RAM had plenty free
- Swap was still full
Below is the exact troubleshooting flow I followed — and how you can do the same.
🧠 Environment Context
- Hypervisor: KVM + libvirt
- Host RAM: 314 GB
- Swap: 976 MB
- Multiple VMs running
- Problem VM: testnet-node3
🔍 Step 1 – Identify High‑CPU Process
ps -eo pid,comm,%cpu,%mem --sort=-%cpu | head -n 10
Output (excerpt)
qemu-system-x86 818%
Note: In Linux, 100 % = 1 CPU core, so 800 % ≈ 8 cores fully used. One VM was heavily consuming CPU.
🔎 Step 2 – Map the PID to a VM
Each VM runs as a qemu-system-x86 process.
ps -fp <PID> # Show the command line for the PID
virsh list --all # List all VMs
virsh dominfo <VM> # Show details for a specific VM
Using the commands above I identified the offending VM as testnet-node3.
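If you prefer a one-liner, the guest name can be pulled straight out of the qemu process's command line, since libvirt passes it via the `-name guest=...` option. The command line below is a hypothetical example of what `ps -fp <PID>` might show; the `sed` extraction is a sketch, not part of the original workflow.

```shell
# Hypothetical qemu command line, as "ps -fp <PID>" might show it
cmdline='qemu-system-x86_64 -name guest=testnet-node3,debug-threads=on -m 98304000'

# Extract the guest name from the -name option (everything up to the first comma)
echo "$cmdline" | sed -n 's/.*guest=\([^,]*\).*/\1/p'
```

This prints `testnet-node3`, which you can then feed to `virsh dominfo`.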
📊 Step 3 – Check Host Memory & Swap
free -h
Mem: 314Gi total
217Gi used
94Gi free
Swap: 976Mi total
963Mi used
Swap was 98 % used while RAM still had 94 GB free – a situation that often causes panic.
🧪 Step 4 – Verify Active Memory Pressure
vmstat 1 5
Focus on the columns:
| Column | Meaning |
|---|---|
| si | pages swapped in |
| so | pages swapped out |
If both are 0, the system is not under active memory pressure.
My output
si = 0
so = 0
Interpretation:
The swap usage was historical; the kernel had swapped out pages earlier, but no current pressure exists.
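This check can be scripted so you don't have to eyeball the columns. The sketch below sums `si` and `so` over five samples; the field positions (7 and 8) assume the default `vmstat` layout from procps, so verify against your own header line first.

```shell
# Sum the si (field 7) and so (field 8) columns over 5 one-second samples;
# NR > 2 skips vmstat's two header lines
vmstat 1 5 | awk 'NR > 2 { si += $7; so += $8 }
  END { if (si + so == 0) print "no active swapping"; else print "active swapping" }'
```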
🔥 Why Swap Can Stay Full Even With Free RAM
Linux does not automatically move swapped pages back into RAM unless they are needed. A previous memory‑pressure event can leave swap full long after RAM becomes available – this is normal behavior.
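If the lingering swap bothers you, one common way to reclaim it is to cycle swap off and on, which forces the swapped pages back into RAM. Only do this when free RAM comfortably exceeds swap in use; the `awk` pre-check below is a sketch of that sanity test, not part of the original run.

```shell
# Before draining swap, confirm free RAM (MiB) exceeds swap in use
free -m | awk '/^Mem:/  { freemem = $4 }
               /^Swap:/ { used = $3 }
               END { if (freemem > used) print "safe to drain"; else print "not enough free RAM" }'

# If safe, cycle swap (pages are pulled back into RAM):
# sudo swapoff -a && sudo swapon -a
```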
🧠 Step 5 – Inspect VM Memory Allocation
virsh dominfo testnet-node3
Max memory: 98304000 KiB
98304000 KiB ≈ 94 GB → the VM had ~94 GB allocated.
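The conversion is easy to reproduce with shell arithmetic. Note that 98304000 KiB is exactly 93.75 GiB, which the rounded "≈ 94 GB" above reflects; integer division truncates the fraction.

```shell
# Convert a virsh KiB figure to GiB with shell arithmetic
kib=98304000
echo "$(( kib / 1024 / 1024 )) GiB"   # integer division truncates 93.75 GiB to 93
```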
❓ Was the VM Actually Memory‑Starved?
Check inside the guest:
free -h
vmstat 1 5
If the guest shows:
- Swap used
- OOM‑killer messages
- Memory > 90 % used
then increasing RAM makes sense. If not, the high CPU may be workload‑related rather than a memory shortage.
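A quick way to check the OOM-killer point is to grep the guest's kernel log. This is a generic check, not a command from the original session; on systemd guests `journalctl -k` works in place of `dmesg`.

```shell
# Inside the guest: scan the kernel log for OOM-killer activity
dmesg -T 2>/dev/null | grep -iE 'out of memory|oom-killer' \
  || echo "no OOM events found"
```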
🚀 Step 6 – Increase VM RAM Safely
The VM was stopped first (virsh shutdown testnet-node3); target RAM = 128 GB.
# 128 GB in KiB (for reference — the virsh commands below accept the G suffix directly)
128 * 1024 * 1024 = 134217728 KiB
virsh setmaxmem testnet-node3 128G --config
virsh setmem testnet-node3 128G --config
Verify:
virsh dominfo testnet-node3
Start the VM:
virsh start testnet-node3
📊 Step 7 – Verify Host Stability After Resize
free -h
Mem: 314Gi total
221Gi used
89Gi free
Swap: 0B used
Swap cleared.
vmstat 1 5
si = 0, so = 0, CPU idle high → system healthy.
🧩 Root‑Cause Pattern
- VM workload spikes → guest consumes heavy memory
- Host experiences memory pressure → swap fills, kswapd CPU spikes
- qemu process shows high CPU (driven by the guest)
- After workload stabilises, swap remains full (no active pressure)
Without checking vmstat, it's easy to misdiagnose the lingering swap as an active memory problem.
🛑 Common Mistakes
- ❌ Increasing host RAM without checking guest usage
- ❌ Assuming 100 % swap = a dying system
- ❌ Ignoring vmstat output
- ❌ Allocating 100 % of host RAM to VMs
📐 Capacity‑Planning Rule for KVM Hosts
- On large‑memory hosts (e.g., 314 GB), leave 16–32 GB for the host OS.
- Never allocate 100 % of RAM to guests.
- Monitor swap regularly; a small swap (1–4 GB) is sufficient for large‑RAM systems.
🧠 Pro Tips
# List memory allocation for all running VMs (with the VM name shown)
virsh list --name | while read -r vm; do
  [ -n "$vm" ] && echo "== $vm ==" && virsh dominfo "$vm" | grep -i memory
done
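To apply the capacity-planning rule above, it helps to total the allocations and compare against host RAM. The sketch below sums "Max memory" across all defined VMs; the `awk` field split assumes `virsh dominfo`'s usual `Max memory: N KiB` output format.

```shell
# Sum "Max memory" (KiB) across all defined VMs
virsh list --all --name | while read -r vm; do
  [ -n "$vm" ] && virsh dominfo "$vm" | awk '/Max memory/ { print $3 }'
done | awk '{ kib += $1 } END { printf "Total allocated: %.1f GiB\n", kib / 1024 / 1024 }'
```

Compare the total against host RAM minus the 16–32 GB you reserve for the host OS.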
# See if swapping is active
vmstat 1
# Find the process that consumes most memory
ps -eo pid,comm,%mem,%cpu --sort=-%mem | head
🎯 Final Takeaway
Swap usage alone does NOT equal a memory problem.
Key indicators of a real issue are:
- Active swap in/out (vmstat)
- OOM events in the host or guest logs
- Sustained high CPU from kswapd
- Guest-level memory pressure
In my case, the VM’s memory allocation was already sufficient; the high CPU was workload‑related, and the lingering swap was simply a remnant of a past pressure event.
Memory Upgrade Summary
Scaled from 94 GB → 128 GB
- Host remained healthy
- No swap pressure
- System stable
If you’re running KVM in production, understanding this memory + swap + CPU interaction is critical.
Blindly adding RAM is easy.
Diagnosing correctly is what makes you a good systems engineer.