Kubernetes HPA Not Scaling: Debugging Guide
Source: Dev.to
Introduction
In production, the ability to scale on demand is essential for performance and reliability. HPA is a core component for achieving that, but when it doesn’t work, diagnosing the problem can be challenging. By the end of this article you will:
- Understand why HPA may not scale.
- Follow a clear, step‑by‑step troubleshooting workflow.
- Apply best practices to prevent future scaling issues.
Understanding the Problem
Typical symptoms of a non‑scaling HPA include:
- Pods not scaling up or down as expected.
- HPA not reacting to changes in CPU, memory, or custom metrics.
- Errors in the HPA controller logs.
Real‑world example: A marketing campaign drives a traffic spike, but the pods stay at the original replica count, causing latency spikes and possible downtime. To resolve this, you need to understand the HPA internals and the components that influence its decisions.
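Conceptually, the HPA controller compares the observed metric against the target and scales by the ratio. A minimal sketch of that rule (illustrative only, not the actual controller code; the 10% tolerance band is the controller's default and is simplified here):

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas, tolerance=0.1):
    """Sketch of the HPA rule: desired = ceil(current * current/target),
    clamped to [minReplicas, maxReplicas]. Ratios within the tolerance
    band produce no change, which avoids flapping."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no scaling
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(desired, max_replicas))

# Traffic spike: 90% average CPU against a 50% target with 3 replicas
print(desired_replicas(3, 90, 50, 3, 10))  # → 6
```

This also makes the failure modes below concrete: if the metric can't be fetched, `ratio` is unknown and no decision is made; if `maxReplicas` is too low, the clamp wins.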
Prerequisites
To follow this guide you need:
- Basic knowledge of Kubernetes and HPA.
- A Kubernetes cluster with the Metrics Server installed (the HPA relies on it for CPU and memory metrics).
- `kubectl` installed and configured to talk to the cluster.
- A text editor or IDE for editing YAML manifests.
- Access to a terminal/command prompt.
Step‑by‑Step Solution
Step 1: Diagnosis
1. Check HPA objects:

   ```bash
   kubectl get hpa -A
   ```

2. Inspect pod health:

   ```bash
   kubectl get pods -A
   ```

3. Find non‑Running pods:

   ```bash
   kubectl get pods -A | grep -v Running
   ```

4. Examine HPA events and controller logs:

   ```bash
   kubectl describe hpa -n <namespace> <hpa-name>
   # The HPA controller runs inside the controller manager
   kubectl logs -n kube-system -l component=kube-controller-manager
   ```

   Look for warnings such as "failed to get metrics" or "unable to scale".
Step 2: Implementation (Creating a Correct HPA)
Below is a minimal example of a Deployment together with a matching HPA. Adjust the resource requests/limits and metric targets to suit your workload.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment
  labels:
    app: example
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
        - name: example
          image: example/image:latest
          resources:
            requests:
              cpu: 100m
            limits:
              cpu: 200m
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50  # Adjust as needed
```
Key points
- Use `scaleTargetRef` (not `selector`) to point the HPA at the Deployment.
- Ensure the Deployment's pod template contains the same labels (`app: example`).
- Set realistic `requests`/`limits` so the HPA can calculate utilization correctly.
- Choose an appropriate `averageUtilization` (or use custom metrics if needed).
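To see why the requests matter: resource utilization is computed per pod as usage divided by the pod's request, then averaged across pods. A rough sketch of that calculation (illustrative only; values in millicores):

```python
def average_cpu_utilization(usages_m, request_m):
    """Average CPU utilization across pods: each pod's usage (millicores)
    divided by the per-pod request. Without a request the ratio is
    undefined, which is why the HPA reports <unknown> for such targets."""
    if not request_m:
        raise ValueError("resources.requests.cpu must be set for the HPA")
    per_pod = [100 * usage / request_m for usage in usages_m]
    return round(sum(per_pod) / len(per_pod))

# Three pods using 120m, 80m, and 160m against a 100m request
print(average_cpu_utilization([120, 80, 160], 100))  # → 120
```

With a 50% target, an average of 120% would drive a scale‑up; with no request set, no decision can be made at all.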
Step 3: Verify the HPA Works
1. Apply the manifests:

   ```bash
   kubectl apply -f deployment-and-hpa.yaml
   ```

2. Generate load (e.g., with `hey` or `wrk`) to push CPU usage above the target.

3. Watch the HPA status:

   ```bash
   kubectl get hpa example-hpa -w
   ```

   You should see the REPLICAS count increase as the metric crosses the threshold.
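If `hey` or `wrk` isn't at hand, a throwaway load generator along these lines also works (a stdlib‑only sketch; `generate_load` is a hypothetical helper — point it at your Service's address):

```python
import concurrent.futures
import urllib.request

def generate_load(url, total_requests=500, workers=20):
    """Fire concurrent GET requests at the service to drive CPU usage up.
    Returns the number of HTTP 200 responses received."""
    def hit(_):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                return resp.status == 200
        except OSError:
            return False

    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(hit, range(total_requests)))

# Example: generate_load("http://<service-ip>/", total_requests=1000)
```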
Best Practices & Common Pitfalls
| Pitfall | Why it Happens | Fix |
|---|---|---|
| Missing resource requests | HPA can’t compute utilization without a request value. | Define resources.requests.cpu (and memory if needed). |
| Incorrect scaleTargetRef | HPA points to the wrong object, so no scaling occurs. | Verify apiVersion, kind, and name match the target workload. |
| Metrics Server not installed | HPA can’t fetch CPU/memory metrics. | Deploy the Metrics Server (kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml). |
| Too low maxReplicas | HPA hits the ceiling before meeting demand. | Set maxReplicas high enough for expected spikes. |
| Pod Disruption Budgets blocking scale‑down | PDB prevents pods from terminating, causing HPA to think it can’t scale down. | Adjust PDB minAvailable or maxUnavailable as appropriate. |
| Custom metrics not exposed | HPA using custom metrics fails silently. | Ensure the custom metrics API (e.g., Prometheus Adapter) is correctly configured and metrics are exposed. |
Verification
To verify that the HPA setup is working correctly, run:

```bash
kubectl get hpa example-hpa -o yaml
```

This displays the current HPA configuration, including the number of replicas.
You can also check the pod status:

```bash
kubectl get pods -A
```
Complete Manifest Example
A full Kubernetes manifest with HPA enabled:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
        - name: example
          image: example/image
          resources:
            requests:
              cpu: 100m
            limits:
              cpu: 200m
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
---
apiVersion: v1
kind: Service
metadata:
  name: example-service
spec:
  selector:
    app: example
  ports:
    - name: http
      port: 80
      targetPort: 8080
  type: LoadBalancer
```
This manifest creates a Deployment with three replicas, an HPA that targets 50% average CPU utilization, and a LoadBalancer Service exposing the pods.
Common Pitfalls and How to Avoid Them
- Insufficient resources – Ensure the cluster has enough capacity to scale.
- Incorrect metrics – Verify the HPA uses the correct metric (CPU, memory, or custom).
- Inadequate monitoring – Set up alerts to detect HPA issues early.
- Inconsistent labels – Keep deployment and HPA labels in sync so the controller can match them.
- Inadequate testing – Simulate load to confirm the HPA behaves as expected.
Best Practices Summary
- Combine resource‑based and custom metrics for responsive scaling.
- Monitor HPA status and pod performance continuously.
- Use consistent labels and annotations across resources.
- Test scaling behavior under realistic workloads.
- Deploy a load balancer or ingress controller to distribute traffic evenly.
Conclusion
We examined common reasons why an HPA might not scale and provided a step‑by‑step troubleshooting guide. By following the best‑practice recommendations, you can ensure your Kubernetes cluster scales efficiently, delivering a reliable experience for users.
Further Reading
- Kubernetes Deployment Strategies – Rolling updates, blue‑green, canary, etc.
- Kubernetes Networking – Pods, Services, Ingress controllers.
- Kubernetes Security – Network policies, secrets, RBAC.
🚀 Level Up Your DevOps Skills
Want to master Kubernetes troubleshooting? Check out these resources:
📚 Recommended Tools
- Lens – The Kubernetes IDE that makes debugging 10× faster.
- k9s – Terminal‑based Kubernetes dashboard.
- Stern – Multi‑pod log tailing.
📖 Courses & Books
- Kubernetes Troubleshooting in 7 Days – Step‑by‑step email course ($7).
- Kubernetes in Action – Definitive guide (Amazon).
- Cloud Native DevOps with Kubernetes – Production best practices.
📬 Stay Updated
Subscribe to DevOps Daily Newsletter for:
- 3 curated articles per week
- Production incident case studies
- Exclusive troubleshooting tips
Found this helpful? Share it with your team!
Originally published at aicontentlab.xyz.
