How I Troubleshot a KVM Memory Issue That Led to Swap & High CPU (Runbook + Real Scenario)

Published: February 24, 2026 at 02:36 AM EST
5 min read
Source: Dev.to

Recently, I noticed something strange on one of my KVM hypervisors.

The server wasn’t heavily loaded, but earlier I saw:

  • qemu-system-x86 consuming 800 %+ CPU
  • kswapd running hot
  • Swap usage near 100 %

Later:

  • CPU was low
  • RAM had plenty free
  • Swap was still full

Below is the exact troubleshooting flow I followed — and how you can do the same.

🧠 Environment Context

  • Hypervisor: KVM + libvirt
  • Host RAM: 314 GB
  • Swap: 976 MB
  • Multiple VMs running
  • Problem VM: testnet-node3

🔍 Step 1 – Identify High‑CPU Process

ps -eo pid,comm,%cpu,%mem --sort=-%cpu | head -n 10

Output (excerpt)

qemu-system-x86  818%

Note: In Linux, 100 % = 1 CPU core, so 800 % ≈ 8 cores fully used. One VM was heavily consuming CPU.
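A quick sanity check of that arithmetic (the 818 % figure comes from the `ps` excerpt above):

```shell
# ps reports CPU as a percentage of a single core,
# so dividing by 100 gives the number of cores kept busy
cpu_pct=818
cores=$((cpu_pct / 100))
echo "~${cores} cores busy"   # ~8 cores busy
```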

🔎 Step 2 – Map the PID to a VM

Each VM runs as a qemu-system-x86 process.

ps -fp <PID>          # Show the command line for the PID
virsh list --all      # List all VMs
virsh dominfo <VM>    # Show details for a specific VM

Using the commands above I identified the offending VM as testnet-node3.
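On modern libvirt, the qemu command line embeds the domain name as `-name guest=<vm>,...`, so you can extract it directly from the `ps -fp <PID>` output. The command line below is an illustrative sample; on a live host, pipe `ps -o args= -p <PID>` into the same `sed` expression:

```shell
# Sample qemu command line (illustrative); real ones are much longer
cmdline='qemu-system-x86_64 -name guest=testnet-node3,debug-threads=on -m 96000'

# Pull out the value of guest=... up to the next comma
vm=$(printf '%s\n' "$cmdline" | sed -n 's/.*guest=\([^,]*\).*/\1/p')
echo "$vm"   # testnet-node3
```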

📊 Step 3 – Check Host Memory & Swap

free -h
Mem:   314Gi total
       217Gi used
        94Gi free
Swap:   976Mi total
        963Mi used

Swap was 98 % used while RAM still had 94 GB free – a situation that often causes panic.
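The 98 % figure follows directly from the `free -h` numbers above:

```shell
# Swap utilisation from the free(1) output: 963 MiB used of 976 MiB total
swap_total_mib=976
swap_used_mib=963
swap_pct=$((swap_used_mib * 100 / swap_total_mib))
echo "swap ${swap_pct}% used"   # swap 98% used
```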

🧪 Step 4 – Verify Active Memory Pressure

vmstat 1 5

Focus on the columns:

  • si – pages swapped in from disk per second
  • so – pages swapped out to disk per second

If both are 0, the system is not under active memory pressure.

My output

si = 0
so = 0

Interpretation:
The swap usage was historical; the kernel had swapped out pages earlier, but no current pressure exists.
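This check can be automated by summing the si/so columns with awk. The captured output below is a stand-in for real `vmstat 1 5` output; on a live host, pipe `vmstat 1 5` straight into the same awk program:

```shell
# Sample vmstat output (illustrative numbers, same column layout as vmstat 1 5)
sample='procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0 986112 98566144 204800 117964800    0    0     5    12  210  340  3  1 96  0  0
 0  0 986112 98566144 204800 117964800    0    0     0     0  180  300  2  1 97  0  0'

# si is column 7, so is column 8; skip the two header lines
traffic=$(printf '%s\n' "$sample" | awk 'NR > 2 { sum += $7 + $8 } END { print sum }')
echo "active swap traffic: ${traffic} pages/s"   # 0 -> no current memory pressure
```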

🔥 Why Swap Can Stay Full Even With Free RAM

Linux does not automatically move swapped pages back into RAM unless they are needed. A previous memory‑pressure event can leave swap full long after RAM becomes available – this is normal behavior.
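If the lingering swap bothers you, cycling swap (`swapoff -a && swapon -a`, as root) forces those pages back into RAM, but it is only safe when available memory comfortably exceeds the swap in use. A guard check, reading directly from /proc/meminfo:

```shell
# MemAvailable and swap-in-use, both in KiB
avail_kb=$(awk '/^MemAvailable:/ { print $2 }' /proc/meminfo)
swap_used_kb=$(awk '/^SwapTotal:/ { t = $2 } /^SwapFree:/ { f = $2 } END { print t - f }' /proc/meminfo)

# Require at least 2x headroom before cycling swap, otherwise leave it alone
if [ "$avail_kb" -gt $((swap_used_kb * 2)) ]; then
    echo "safe to clear lingering swap: run 'swapoff -a && swapon -a' as root"
else
    echo "not enough free RAM to absorb swap; leave it alone"
fi
```

The 2x headroom factor is my own conservative margin, not a kernel rule.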

🧠 Step 5 – Inspect VM Memory Allocation

virsh dominfo testnet-node3
Max memory: 98304000 KiB

98304000 KiB ≈ 94 GiB (93.75 GiB exactly) → the VM had ~94 GB allocated.
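The KiB-to-GiB conversion is just two divisions by 1024:

```shell
# Convert the Max memory figure from virsh dominfo (KiB) to GiB
kib=98304000
gib=$(awk -v k="$kib" 'BEGIN { printf "%.2f", k / 1024 / 1024 }')
echo "${gib} GiB"   # 93.75 GiB
```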

❓ Was the VM Actually Memory‑Starved?

Check inside the guest:

free -h
vmstat 1 5

If the guest shows:

  • Swap used
  • OOM‑killer messages
  • Memory > 90 % used

then increasing RAM makes sense. If not, the high CPU may be workload‑related rather than a memory shortage.

🚀 Step 6 – Increase VM RAM Safely

The VM was stopped, target RAM = 128 GB.

# 128 GB in KiB
128 * 1024 * 1024 = 134217728 KiB
virsh setmaxmem testnet-node3 128G --config
virsh setmem    testnet-node3 128G --config

Verify:

virsh dominfo testnet-node3

Start the VM:

virsh start testnet-node3

📊 Step 7 – Verify Host Stability After Resize

free -h
Mem:   314Gi total
       221Gi used
        89Gi free
Swap:   0B used

Swap cleared.

vmstat 1 5

si = 0, so = 0, CPU idle high → system healthy.

🧩 Root‑Cause Pattern

  1. VM workload spikes → guest consumes heavy memory
  2. Host experiences memory pressure → swap fills, kswapd CPU spikes
  3. qemu process shows high CPU (driven by the guest)
  4. After workload stabilises, swap remains full (no active pressure)

Without checking vmstat for active swap traffic, it is easy to misdiagnose a full-but-idle swap as an ongoing memory problem.

🛑 Common Mistakes

  • ❌ Increasing host RAM without checking guest usage
  • ❌ Assuming 100 % swap = a dying system
  • ❌ Ignoring vmstat output
  • ❌ Allocating 100 % of host RAM to VMs

📐 Capacity‑Planning Rule for KVM Hosts

  • On large‑memory hosts (e.g., 314 GB), leave 16–32 GB for the host OS.
  • Never allocate 100 % of RAM to guests.
  • Monitor swap regularly; a small swap (1–4 GB) is sufficient for large‑RAM systems.
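As a rough check against this rule, sum the per-VM allocations and compare them to host RAM. The three per-VM figures below are hypothetical stand-ins for the Max memory values `virsh dominfo` reports, not the article's real VMs:

```shell
# Compare total VM memory allocation against host RAM (all values in KiB)
host_kib=$((314 * 1024 * 1024))            # 314 GiB host
alloc_kib=0
for kib in 98304000 50331648 33554432; do  # hypothetical per-VM Max memory values
    alloc_kib=$((alloc_kib + kib))
done
echo "VMs allocated $((alloc_kib / 1024 / 1024)) GiB of $((host_kib / 1024 / 1024)) GiB host RAM"
# -> VMs allocated 173 GiB of 314 GiB host RAM
```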

🧠 Pro Tips

# List memory allocation for every defined VM
# (skip the blank line virsh appends to --name output)
virsh list --all --name | while read -r vm; do
  [ -n "$vm" ] || continue
  echo "== $vm =="
  virsh dominfo "$vm" | grep -i memory
done

# See if swapping is currently active (watch the si/so columns)
vmstat 1

# Find the processes consuming the most memory
ps -eo pid,comm,%mem,%cpu --sort=-%mem | head

🎯 Final Takeaway

Swap usage alone does NOT equal a memory problem.

Key indicators of a real issue are:

  • Active swap in/out (vmstat)
  • OOM events in the host or guest logs
  • Sustained high CPU from kswapd
  • Guest‑level memory pressure

In my case, the VM’s memory allocation was already sufficient; the high CPU was workload‑related, and the lingering swap was simply a remnant of a past pressure event.

Memory Upgrade Summary

Scaled from 94 GB → 128 GB

  • Host remained healthy
  • No swap pressure
  • System stable

If you’re running KVM in production, understanding this memory + swap + CPU interaction is critical.

Blindly adding RAM is easy.

Diagnosing correctly is what makes you a good systems engineer.
