Kubernetes Autoscaling Showdown: HPA vs. VPA vs. Karpenter vs. KEDA
Source: Dev.to
The Two Layers of Scaling
Before analyzing the specific tools, it is crucial to understand that Kubernetes scaling happens on two distinct layers:
- Pod Scaling (Application Layer): Adjusting the number of pod replicas or the size of individual pods. This is about application capacity.
- Node Scaling (Infrastructure Layer): Adjusting the number of virtual machines (nodes) in the cluster to support the pods. This is about compute capacity.
If you scale your pods but have no nodes to place them on, your scaling fails (Pending Pods). If you scale your nodes but your pods don’t utilize them, you are burning money. The art of scaling lies in synchronizing these two layers.
Layer 1: Pod‑Level Scaling
1. Horizontal Pod Autoscaler (HPA)
How It Works
HPA queries the metrics server (or custom metrics APIs) at regular intervals (default is 15 seconds). It compares the current metric value (e.g., CPU utilization) against a target value defined in the HorizontalPodAutoscaler resource. If the current usage exceeds the target, HPA calculates the required number of replicas and updates the scale subresource of the deployment or StatefulSet.
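As a minimal sketch of such a HorizontalPodAutoscaler resource, assuming a Deployment named web-api and illustrative thresholds (neither comes from the article):

```yaml
# Minimal HPA sketch: keep average CPU around 70% across 2-10 replicas.
# The Deployment name "web-api" and all numbers are illustrative placeholders.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```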
The Math
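According to the upstream Kubernetes documentation, HPA computes the target replica count as:

desiredReplicas = ceil( currentReplicas × currentMetricValue / desiredMetricValue )

For example, 4 replicas averaging 90% CPU against a 60% target yield ceil(4 × 90 / 60) = 6 replicas. Ratios within the controller's tolerance of the target (10% by default) are ignored, which prevents constant micro-adjustments.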
Strengths
- Native & Simple: No external CRDs required for basic CPU/Memory scaling.
- Resiliency: Perfect for handling traffic spikes by distributing load across more instances.
- Zero Downtime: Scaling out does not require restarting existing pods.
Weaknesses
- Cold Starts: HPA is reactive. If your application takes ~60 seconds to boot (e.g., JVM apps), HPA might scale out too late during a sudden spike.
- Thrashing: Without a proper stabilizationWindowSeconds configuration, HPA can scale up and down rapidly, causing instability (see the behavior sketch after this list).
- Limited by Node Capacity: HPA scales pods, not nodes. If your cluster is full, HPA creates Pending pods and stops there.
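One common way to dampen thrashing is the behavior field of the autoscaling/v2 spec, sketched below as a fragment that slots into the HPA manifest above; the window lengths and policy values are illustrative, not recommendations from the article.

```yaml
# Fragment of an HPA spec: delay scale-down decisions by 5 minutes and
# remove at most one pod per minute, while leaving scale-up immediate.
# Values are illustrative placeholders.
spec:
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Pods
          value: 1
          periodSeconds: 60
```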
Best For
Stateless microservices, web servers, and applications where load is distributed.
2. Vertical Pod Autoscaler (VPA)
How It Works
VPA automatically adjusts the CPU and memory requests/limits of your pods to match their actual usage. It consists of three components:
- Recommender: Monitors historical resource usage.
- Updater: Evicts pods that need new resource limits.
- Admission Controller: Intercepts pod creation to inject the correct resource requests.
VPA Modes
| Mode | Description |
|---|---|
| Off | Calculates recommendations but does not apply them (useful for “dry‑run”). |
| Initial | Applies resources only when a pod is first created. |
| Recreate | Evicts pods immediately if their requests differ significantly from the recommendation. |
| Auto | Currently functions similarly to Recreate. |
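For reference, a VerticalPodAutoscaler running in Off mode (recommendation-only) might look like the sketch below; the target name billing-monolith and the resource bounds are placeholders, and the VPA CRDs must be installed separately since they are not part of core Kubernetes.

```yaml
# VPA sketch in "Off" mode: compute recommendations without evicting pods.
# Target name and allowed resource ranges are illustrative placeholders.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: billing-monolith
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: billing-monolith
  updatePolicy:
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: "2"
          memory: 4Gi
```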
Strengths
- Right‑Sizing: Ideal for correcting human error. Developers often guess resource requests (e.g., “Give it 2 GB RAM”). VPA fixes this based on reality.
- Legacy Apps: Perfect for monolithic applications that cannot be easily replicated (cannot scale horizontally).
Weaknesses
- Disruption: Changing a pod’s resources requires a restart, causing downtime unless you have strict Pod Disruption Budgets (PDB) and high availability.
- HPA Conflict: You generally cannot use HPA and VPA on the same metric (CPU/Memory) simultaneously. They will fight each other—HPA adds pods while VPA tries to increase pod limits.
Best For
Stateful workloads, monoliths, and “Goldilocks” analysis (using VPA in Off mode to generate reports on ideal resource sizing).
3. KEDA (Kubernetes Event‑Driven Autoscaling)
How It Works
KEDA installs an operator and a metrics adapter that together act as a metrics server for HPA. You define a ScaledObject that references a trigger (e.g., Kafka topic lag, SQS queue depth, a Prometheus query). KEDA monitors the event source:
- 0 → 1 Scaling: If there are no events, KEDA scales the deployment to 0 (saving money). When an event arrives, it scales to 1.
- 1 → N Scaling: Once the pod is running, KEDA feeds the event metrics to the native HPA to scale from 1 to N.
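A minimal ScaledObject sketch for the Kafka-lag case mentioned above might look like this; the deployment name, broker address, consumer group, topic, and lag threshold are all placeholder values.

```yaml
# KEDA ScaledObject sketch: scale "order-worker" between 0 and 20 replicas
# based on Kafka consumer lag. All names and thresholds are placeholders.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-worker
spec:
  scaleTargetRef:
    name: order-worker        # Deployment to scale
  minReplicaCount: 0          # enables scale-to-zero
  maxReplicaCount: 20
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.example.svc:9092
        consumerGroup: order-worker
        topic: orders
        lagThreshold: "50"    # target lag per replica
```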
Strengths
- Scale‑to‑Zero: Massive cost saver for dev environments or sporadic batch processing.
- Proactive Scaling: Scales based on queue length before CPU spikes.
- Rich Ecosystem: Supports 50+ scalers (Azure Service Bus, Redis, Postgres, AWS SQS, etc.).
Weaknesses
- Complexity: Adds another CRD and controller to manage.
- Latency: Scaling from 0 to 1 incurs a cold‑start penalty while the pod is scheduled and booted.
Best For
Event‑driven architectures, queue‑based workers, and serverless‑style workloads on Kubernetes.
Layer 2: Node‑Level Scaling
4. Cluster Autoscaler (CA)
How It Works
Cluster Autoscaler is a control loop that interfaces with your cloud provider’s auto‑scaling groups (ASG in AWS, VMSS in Azure, etc.). It checks two conditions:
- Scale Up: Are there pods in a Pending state because of insufficient resources? If yes, request the cloud provider to add a node.
- Scale Down: Are there nodes with low utilization that can be consolidated? If yes, evict pods to other nodes and terminate the empty node.
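To make the control loop concrete, Cluster Autoscaler is typically tuned through flags on its own Deployment; the fragment below is a sketch with illustrative values (the cloud provider, image tag, and thresholds are placeholders).

```yaml
# Fragment of a Cluster Autoscaler Deployment spec. The flags shown are real CA
# options; the image tag, cloud provider, and threshold values are placeholders.
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      - --expander=least-waste                  # prefer the node group that wastes the least capacity
      - --scale-down-utilization-threshold=0.5  # nodes below 50% utilization are scale-down candidates
      - --scale-down-unneeded-time=10m          # how long a node must stay underutilized before removal
      - --balance-similar-node-groups=true
```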
Strengths
- Mature & Stable: Battle‑tested in production for years.
- Cloud‑Agnostic: Works on AWS, GCP, Azure, and others with minimal changes.
Weaknesses
- Slow: Tied to the cloud provider’s node groups. Booting a node often means calling the cloud API, spinning up an EC2 instance, registering it with the cluster, and pulling images, a process that can take 2–5 minutes.
- Rigid: Scales based on predefined “Node Groups.” If you need a GPU node but only have a general‑purpose group, CA cannot provision it unless you pre‑configure a GPU node group.
- Cost Inefficiency: It doesn’t inherently hunt for the most cost‑effective instance types; it only adds/removes nodes within the existing groups.
(The article’s discussion of Karpenter was truncated, so it is omitted here.)
