Kubernetes HPA Not Scaling: Debugging Guide

Published: February 16, 2026 at 03:01 AM EST
6 min read
Source: Dev.to

Sergei

Introduction

In production, the ability to scale on demand is essential for performance and reliability. The Horizontal Pod Autoscaler (HPA) is a core component for achieving that, but when it doesn’t work, diagnosing the problem can be challenging. By the end of this article you will:

  • Understand why HPA may not scale.
  • Follow a clear, step‑by‑step troubleshooting workflow.
  • Apply best practices to prevent future scaling issues.

Understanding the Problem

Typical symptoms of a non‑scaling HPA include:

  • Pods not scaling up or down as expected.
  • HPA not reacting to changes in CPU, memory, or custom metrics.
  • Errors in the HPA controller logs.

Real‑world example: A marketing campaign drives a traffic spike, but the pods stay at the original replica count, causing latency spikes and possible downtime. To resolve this, you need to understand the HPA internals and the components that influence its decisions.
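
At the core of those decisions is a simple formula: the controller computes desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), then clamps the result between minReplicas and maxReplicas. A quick shell sketch of that arithmetic (the numbers are made up for illustration):

```shell
# desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric),
# clamped to maxReplicas. Integer ceil via (a + b - 1) / b.
current=3        # current replica count
usage=180        # observed average CPU utilization, percent of request
target=50        # target averageUtilization from the HPA spec
max=10           # maxReplicas

desired=$(( (current * usage + target - 1) / target ))
[ "$desired" -gt "$max" ] && desired=$max
echo "desired replicas: $desired"   # prints "desired replicas: 10"
```

Note how the raw result (11) is capped at maxReplicas — this is exactly the "too low maxReplicas" pitfall discussed later.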

Prerequisites

To follow this guide you need:

  • Basic knowledge of Kubernetes and HPA.
  • A Kubernetes cluster with the Metrics Server installed (the HPA relies on it for CPU and memory metrics).
  • kubectl installed and configured to talk to the cluster.
  • A text editor or IDE for editing YAML manifests.
  • Access to a terminal/command prompt.

Step‑by‑Step Solution

Step 1: Diagnosis

  1. Check HPA objects

    kubectl get hpa -A
  2. Inspect pod health

    kubectl get pods -A
  3. Find non‑Running pods

    kubectl get pods -A --field-selector=status.phase!=Running
  4. Examine HPA events and controller logs

    kubectl describe hpa -n <namespace> <hpa-name>
    # The HPA controller runs inside kube-controller-manager;
    # on kubeadm-style clusters its logs are available with:
    kubectl logs -n kube-system -l component=kube-controller-manager

    Look for warnings such as “failed to get metrics” or “unable to scale”.
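
The describe output also reports status conditions (AbleToScale, ScalingActive, ScalingLimited) whose reasons often pinpoint the failure. A sketch of filtering a captured conditions table for a metrics failure — the sample below is hard-coded for illustration; on a live cluster you would pipe `kubectl describe hpa <name>` instead:

```shell
# Illustrative HPA conditions table, as it appears in `kubectl describe hpa`
# output (columns: Type, Status, Reason). Hard-coded sample, not live data.
conditions='AbleToScale     True    ReadyForNewScale
ScalingActive   False   FailedGetResourceMetric
ScalingLimited  False   DesiredWithinRange'

# ScalingActive=False means the controller cannot fetch or use metrics.
failing=$(printf '%s\n' "$conditions" | awk '$1 == "ScalingActive" && $2 == "False" {print $3}')
echo "ScalingActive failure reason: $failing"
```

A reason like FailedGetResourceMetric usually points at a missing Metrics Server or absent resource requests, both covered below.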

Step 2: Implementation (Creating a Correct HPA)

Below is a minimal example of a Deployment together with a matching HPA. Adjust the resource requests/limits and metric targets to suit your workload.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment
  labels:
    app: example
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
        - name: example
          image: example/image:latest
          resources:
            requests:
              cpu: 100m
            limits:
              cpu: 200m
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # Adjust as needed

Key points

  • Use scaleTargetRef (not selector) to point the HPA at the Deployment.
  • Ensure the Deployment’s pod template contains the same labels (app: example).
  • Set realistic requests/limits so the HPA can calculate utilization correctly.
  • Choose an appropriate averageUtilization (or use custom metrics if needed).
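
The autoscaling/v2 API also exposes an optional behavior field for tuning how fast the HPA reacts. A sketch of a conservative scale-down policy (values are illustrative, adjust to your workload):

```yaml
# Goes under spec: in the HorizontalPodAutoscaler above.
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0     # react to spikes immediately
  scaleDown:
    stabilizationWindowSeconds: 300   # wait 5 min before shrinking
    policies:
    - type: Pods
      value: 1                        # remove at most 1 pod...
      periodSeconds: 60               # ...per minute
```

This helps avoid flapping when traffic oscillates around the utilization target.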

Step 3: Verify the HPA Works

  1. Apply the manifests

    kubectl apply -f deployment-and-hpa.yaml
  2. Generate load (e.g., with hey or wrk) to push CPU usage above the target.

  3. Watch the HPA status

    kubectl get hpa example-hpa -w

    You should see the REPLICAS column increase as the metric crosses the target.
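
If you prefer to generate the load from inside the cluster rather than with an external tool, a throwaway pod works; a minimal sketch, assuming the example-service defined in the complete manifest later in this article:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: load-generator
spec:
  restartPolicy: Never
  containers:
  - name: load
    image: busybox
    # Hammer the service in a tight loop; delete the pod to stop the load.
    command: ["/bin/sh", "-c", "while true; do wget -q -O- http://example-service; done"]
```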

Best Practices & Common Pitfalls

| Pitfall | Why it happens | Fix |
| --- | --- | --- |
| Missing resource requests | HPA can’t compute utilization without a request value. | Define resources.requests.cpu (and memory if needed). |
| Incorrect scaleTargetRef | HPA points to the wrong object, so no scaling occurs. | Verify apiVersion, kind, and name match the target workload. |
| Metrics Server not installed | HPA can’t fetch CPU/memory metrics. | Deploy the Metrics Server (kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml). |
| maxReplicas too low | HPA hits the ceiling before meeting demand. | Set maxReplicas high enough for expected spikes. |
| Pod Disruption Budgets blocking scale‑down | PDB prevents pods from terminating, causing HPA to think it can’t scale down. | Adjust PDB minAvailable or maxUnavailable as appropriate. |
| Custom metrics not exposed | HPA using custom metrics fails silently. | Ensure the custom metrics API (e.g., Prometheus Adapter) is correctly configured and metrics are exposed. |
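
For the PDB pitfall, compare any existing budget against the HPA’s minReplicas; a minimal PDB for reference (values illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-pdb
spec:
  minAvailable: 2        # keep this below the HPA's minReplicas
  selector:
    matchLabels:
      app: example       # must match the Deployment's pod labels
```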

Verification

To verify that the HPA setup is working correctly, run:

kubectl get hpa example-hpa -o yaml

This displays the current HPA configuration, including the number of replicas.

You can also check the pod status:

kubectl get pods -A

Complete Manifest Example

A full Kubernetes manifest with HPA enabled:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: example
        image: example/image
        resources:
          requests:
            cpu: 100m
          limits:
            cpu: 200m
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
---
apiVersion: v1
kind: Service
metadata:
  name: example-service
spec:
  selector:
    app: example
  ports:
  - name: http
    port: 80
    targetPort: 8080
  type: LoadBalancer

This manifest creates a deployment with three replicas, an HPA that targets 50 % CPU utilization, and a LoadBalancer service exposing the pods.

Common Pitfalls and How to Avoid Them

  • Insufficient resources – Ensure the cluster has enough capacity to scale.
  • Incorrect metrics – Verify the HPA uses the correct metric (CPU, memory, or custom).
  • Inadequate monitoring – Set up alerts to detect HPA issues early.
  • Inconsistent labels – Keep deployment and HPA labels in sync so the controller can match them.
  • Inadequate testing – Simulate load to confirm the HPA behaves as expected.
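
When CPU alone is a poor signal, an autoscaling/v2 HPA can target a per-pod custom metric instead; a sketch, assuming a custom metrics adapter (e.g., Prometheus Adapter) already exposes a hypothetical http_requests_per_second metric:

```yaml
# Replaces the metrics: section of the HPA spec above.
metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second   # hypothetical metric name
    target:
      type: AverageValue
      averageValue: "100"              # scale out above 100 rps per pod
```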

Best Practices Summary

  • Combine resource‑based and custom metrics for responsive scaling.
  • Monitor HPA status and pod performance continuously.
  • Use consistent labels and annotations across resources.
  • Test scaling behavior under realistic workloads.
  • Deploy a load balancer or ingress controller to distribute traffic evenly.

Conclusion

We examined common reasons why an HPA might not scale and provided a step‑by‑step troubleshooting guide. By following the best‑practice recommendations, you can ensure your Kubernetes cluster scales efficiently, delivering a reliable experience for users.

Further Reading

  • Kubernetes Deployment Strategies – Rolling updates, blue‑green, canary, etc.
  • Kubernetes Networking – Pods, Services, Ingress controllers.
  • Kubernetes Security – Network policies, secrets, RBAC.

🚀 Level Up Your DevOps Skills

Want to master Kubernetes troubleshooting? Check out these resources:

  • Lens – The Kubernetes IDE that makes debugging 10× faster.
  • k9s – Terminal‑based Kubernetes dashboard.
  • Stern – Multi‑pod log tailing.

📖 Courses & Books

  • Kubernetes Troubleshooting in 7 Days – Step‑by‑step email course ($7).
  • Kubernetes in Action – Definitive guide (Amazon).
  • Cloud Native DevOps with Kubernetes – Production best practices.

📬 Stay Updated

Subscribe to DevOps Daily Newsletter for:

  • 3 curated articles per week
  • Production incident case studies
  • Exclusive troubleshooting tips

Found this helpful? Share it with your team!

Originally published at aicontentlab.xyz.
