Debugging Kubernetes Nodes in NotReady State

Published: February 21, 2026 at 11:20 AM EST
6 min read
Source: Dev.to

Matheus

A node stuck in NotReady is one of the most common, and most disruptive, Kubernetes issues. When a node goes NotReady, the control plane stops scheduling new pods to it and begins evicting existing workloads after a timeout.

Here’s how to diagnose the root cause and fix it.

What Does NotReady Mean?

Every Kubernetes node runs a kubelet process that periodically reports its status to the API server. When the API server stops receiving these heartbeats (default: every 10 seconds, timeout after 40 seconds), it marks the node as NotReady.

The NotReady status means: the control plane cannot confirm this node is healthy and available for work.

Check node status with:

kubectl get nodes

Output showing a problem:

NAME          STATUS     ROLES    AGE   VERSION
worker-01     Ready      <none>   45d   v1.34.2
worker-02     NotReady   <none>   45d   v1.34.2
worker-03     Ready      <none>   45d   v1.34.2
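In a large cluster it helps to surface only the problem nodes. A simple sketch: filter the STATUS column with awk so anything that is not exactly Ready gets printed:

```shell
# Print only nodes whose STATUS column is not exactly "Ready"
kubectl get nodes --no-headers | awk '$2 != "Ready" {print $1, $2}'
```

This also catches compound statuses such as NotReady,SchedulingDisabled.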

Step 1: Check Node Conditions

Start with kubectl describe node to see what conditions are reported:

kubectl describe node worker-02

Look at the Conditions section:

Conditions:
Type                 Status  Reason
----                 ------  ------
MemoryPressure       False   KubeletHasSufficientMemory
DiskPressure         True    KubeletHasDiskPressure
PIDPressure          False   KubeletHasSufficientPID
Ready                False   KubeletNotReady

Common condition flags

  • DiskPressure: True — the node filesystem is running out of space. The kubelet begins evicting pods once usage crosses the eviction threshold (by default, when available disk drops below 10%).
  • MemoryPressure: True — RAM is exhausted. The kubelet starts killing pods based on their QoS class.
  • PIDPressure: True — the node is running out of process IDs, usually caused by a pod fork‑bomb or a leak in container processes.
  • Ready: False — generic “kubelet is unhealthy”; dig deeper into kubelet logs.
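The same conditions can be pulled as structured data instead of scanning describe output. A sketch assuming jq is available on the admin machine:

```shell
# Emit each node condition as "Type Status Reason" for quick scanning
kubectl get node worker-02 -o json \
  | jq -r '.status.conditions[] | "\(.type) \(.status) \(.reason)"'
```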

Step 2: Check Kubelet Logs

The kubelet is the agent that maintains node health. If it’s crashing or misconfigured, the node goes NotReady.

# SSH into the node
ssh worker-02

# Check kubelet status
systemctl status kubelet

# View recent logs
journalctl -u kubelet --since "10 minutes ago" --no-pager

Common kubelet issues

  • kubelet service stopped (process crash or OOM kill) → systemctl restart kubelet
  • Certificate expired (TLS cert rotation failed) → kubeadm certs renew all
  • “Failed to connect to apiserver” (network issue or API server down) → check network, firewall rules, and API server health
  • “PLEG is not healthy” (container runtime issue) → systemctl restart containerd
  • “node not found” (node was deleted from the cluster) → re‑join with kubeadm join …
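The journalctl output from this step can be pre‑filtered for the known failure signatures listed above (the pattern list here is illustrative, not exhaustive):

```shell
# Grep recent kubelet logs for common failure signatures
journalctl -u kubelet --since "30 minutes ago" --no-pager \
  | grep -E "PLEG is not healthy|Failed to connect|certificate|node not found"
```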

Step 3: Check Container Runtime

Kubernetes relies on a container runtime (containerd or CRI‑O). If the runtime is unhealthy, the kubelet can’t manage pods.

# Check containerd status
systemctl status containerd

# Check runtime endpoint
crictl info

# List containers (should show running system containers)
crictl ps

If containerd is unresponsive:

systemctl restart containerd
systemctl restart kubelet

Step 4: Check Resource Exhaustion

The most common cause of NotReady nodes is resource exhaustion.

Disk Space

df -h /
df -h /var/lib/kubelet
df -h /var/lib/containerd
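Those checks can be wrapped in a small helper that flags any mount over a chosen threshold. The 80% default below is an arbitrary example, not a kubelet setting, and paths that do not exist are skipped:

```shell
# Warn when a mount's usage crosses a threshold; skip paths that aren't present
check_disk() {
  mount=$1
  threshold=${2:-80}
  [ -e "$mount" ] || { echo "skip: $mount not present"; return 0; }
  # df -P guarantees one record per filesystem; column 5 is the use% figure
  used=$(df -P "$mount" | awk 'NR==2 {gsub(/%/,"",$5); print $5}')
  if [ "$used" -ge "$threshold" ]; then
    echo "WARN: $mount at ${used}% (threshold ${threshold}%)"
  fi
}

check_disk /var/lib/kubelet
check_disk /var/lib/containerd
```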

Fix: Clean up unused container images and stopped containers.

# containerd
crictl rmi --prune

# Docker (if used)
docker system prune -af

Memory

free -h
# Check top memory consumers
ps aux --sort=-%mem | head -20

Fix: Identify pods consuming excessive memory and ensure resource limits are set.

kubectl top pods --all-namespaces --sort-by=memory | head -20

Process IDs

# Check current PID count vs limit
cat /proc/sys/kernel/pid_max
ls /proc | grep -c '^[0-9]'
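Putting the two numbers together gives a rough utilisation figure; anything in the high 90s explains PIDPressure (shell arithmetic is integer, so the percentage is truncated):

```shell
# Rough PID utilisation: live process directories vs. the kernel pid_max limit
pid_max=$(cat /proc/sys/kernel/pid_max)
pids_used=$(ls /proc | grep -c '^[0-9]')
echo "PIDs in use: ${pids_used}/${pid_max} ($(( pids_used * 100 / pid_max ))%)"
```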

Step 5: Check Networking

If the node can’t reach the API server, it goes NotReady even if it’s otherwise healthy.

# Test API server connectivity
curl -k https://<api-server-host>:6443/healthz

# Check DNS resolution
nslookup kubernetes.default.svc.cluster.local

# Verify network plugin (CNI) is running
crictl ps | grep -E "calico|flannel|cilium|weave"

CNI plugin crashed? This is surprisingly common. If your network‑plugin pod (Calico, Flannel, Cilium, etc.) isn’t running, all networking fails:

kubectl get pods -n kube-system | grep -E "calico|flannel|cilium"
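With --no-headers, the same check can flag CNI pods that are not Running (the plugin names matched here are just the common ones):

```shell
# Print any CNI agent pod that is not in the Running state
kubectl get pods -n kube-system --no-headers \
  | grep -E "calico|flannel|cilium|weave" \
  | awk '$3 != "Running" {print $1, $3}'
```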

Step 6: Check Cloud Provider Issues

On managed Kubernetes (EKS, GKE, AKS), NotReady nodes can also mean:

  • Instance was terminated by the autoscaler or spot‑instance reclamation.
  • Instance health check failed at the cloud‑provider level.

  • A network ACL or security group is blocking kubelet‑to‑API‑server traffic.

Check your cloud provider’s instance status:

# AWS EKS
aws ec2 describe-instance-status --instance-ids <instance-id>

# GKE – check node pool status
gcloud container node-pools describe <node-pool> --cluster <cluster-name>

# AKS
az aks nodepool show --name <nodepool-name> --cluster-name <cluster-name> -g <resource-group>

Prevention: Stop NotReady Before It Happens

  • Set resource requests and limits on all pods. Without limits, a single pod can consume all memory and crash the node.
  • Enable node auto‑repair on managed services (GKE and AKS support this natively; EKS via node health checks).
  • Monitor disk usage and set up alerts at 70% capacity, well before the kubelet’s default eviction thresholds kick in.
  • Use Pod Disruption Budgets (PDBs) to control how many pods can be evicted simultaneously.
  • Keep Kubernetes versions current. Older versions have known kubelet bugs that can cause spurious NotReady events. Check your version’s health status on ReleaseRun.

Quick Troubleshooting Flowchart

kubectl describe node    # Check conditions
  • Is DiskPressure or MemoryPressure True? → Clean up resources.
  • Is kubelet running? systemctl status kubelet → restart if needed.
  • Is containerd running? systemctl status containerd → restart if needed.
  • Can the node reach the API server? → Check network/firewall rules.
  • Is the CNI plugin running? → Verify kube‑system pods.
  • Still stuck? → Inspect journalctl -u kubelet for specific errors.
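The flowchart can be condensed into a tiny triage helper that maps a failing condition to its first remediation step (a toy sketch; the advice strings simply mirror the steps above):

```shell
# Map a node condition to the first remediation step from the flowchart
triage() {
  case "$1" in
    DiskPressure)   echo "Clean up images: crictl rmi --prune" ;;
    MemoryPressure) echo "Find heavy pods: kubectl top pods -A --sort-by=memory" ;;
    PIDPressure)    echo "Count processes: ls /proc | grep -c '^[0-9]'" ;;
    *)              echo "Inspect kubelet logs: journalctl -u kubelet" ;;
  esac
}

triage DiskPressure
```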

Track Your Kubernetes Version Health

Running an older Kubernetes version increases the risk of kubelet bugs that cause NotReady events. Kubernetes 1.32 reaches end‑of‑life on February 28, 2026. If you’re still on it, consult our migration playbook.

Monitor every version’s health, CVE status, and EOL dates at ReleaseRun’s Kubernetes hub.
