Scaling Kubernetes Load with KEDA

Published: December 14, 2025 at 01:12 AM EST

Source: Dev.to

Why the HPA May Not Scale Your Pods

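Out of the box, the Horizontal Pod Autoscaler (HPA) is usually configured to scale on CPU or memory utilization, which does not always reflect real‑world workloads. The HPA can also act on custom and external metrics, but only when another component serves them through the Kubernetes metrics APIs. When you need to scale based on metrics such as requests per second, queue depth, or the result of a database query, the HPA alone is insufficient.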
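
To make the limitation concrete, the sketch below applies the replica rule described in the Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), to a hypothetical I/O‑bound queue worker. The workload, its numbers, and the function name are illustrative assumptions, and the real controller also applies a tolerance band and stabilization windows that are left out here.

```python
import math


def hpa_desired_replicas(current_replicas, current_value, target_value,
                         min_replicas=1, max_replicas=10):
    """Simplified HPA rule: ceil(current_replicas * current / target), clamped.

    Omits the controller's tolerance band and stabilization windows.
    """
    raw = math.ceil(current_replicas * current_value / target_value)
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical I/O-bound worker: 2 pods, CPU at 30% against a 60% target,
# while 5,000 messages wait in the queue. CPU alone suggests scaling in.
by_cpu = hpa_desired_replicas(2, current_value=30, target_value=60)

# Same pods judged by backlog: 2,500 messages per pod against a target of 500.
by_queue = hpa_desired_replicas(2, current_value=2500, target_value=500)

print(f"by CPU: {by_cpu} replica(s), by queue depth: {by_queue} replica(s)")
```

With these made‑up numbers the CPU signal asks for 1 replica while the queue signal asks for 10, which is the gap an event‑based metric is meant to close.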

Introducing KEDA

KEDA (Kubernetes Event‑Driven Autoscaling) complements the Kubernetes ecosystem by monitoring external events—such as message queues, Prometheus/Grafana metrics, SQL queries, and HTTP traffic—and exposing them as metrics that the HPA can consume. This allows the HPA to make scaling decisions based on signals that more accurately represent the actual application load.

Key capabilities

Event‑driven scaling: Scale workloads down to zero when there is no traffic and automatically scale back up as events arrive.
No changes to the application: The workload remains unchanged; KEDA acts as a clean extension to Kubernetes autoscaling.
Native integration: KEDA uses a ScaledObject custom resource, registered through the Kubernetes API server, to define which workload should scale and which external trigger should drive that scaling.
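
As a rough illustration of that custom resource, the sketch below builds a ScaledObject manifest as a Python dictionary and prints it as JSON, which kubectl accepts alongside YAML. The resource name, the Deployment name `orders-worker`, the Prometheus address, the query, and the threshold are all hypothetical. The field names follow the `keda.sh/v1alpha1` ScaledObject spec and the Prometheus scaler, and should be checked against the documentation for the KEDA version you run.

```python
import json


def build_scaled_object(name, target_deployment, query, threshold,
                        server_address, min_replicas=0, max_replicas=10):
    """Return a KEDA ScaledObject manifest as a plain dict."""
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": name},
        "spec": {
            # Workload to scale; a Deployment in the same namespace by default.
            "scaleTargetRef": {"name": target_deployment},
            "minReplicaCount": min_replicas,  # 0 enables scale-to-zero
            "maxReplicaCount": max_replicas,
            "triggers": [
                {
                    "type": "prometheus",
                    # Scaler metadata values are passed as strings.
                    "metadata": {
                        "serverAddress": server_address,
                        "query": query,
                        "threshold": str(threshold),
                    },
                }
            ],
        },
    }


manifest = build_scaled_object(
    name="orders-worker-scaler",                  # hypothetical name
    target_deployment="orders-worker",            # hypothetical Deployment
    query="sum(rate(http_requests_total[1m]))",   # hypothetical metric
    threshold=50,
    server_address="http://prometheus.monitoring.svc:9090",  # hypothetical
)
print(json.dumps(manifest, indent=2))
```

If the script is saved as, say, `scaledobject.py`, its output could be piped into `kubectl apply -f -` on a cluster where KEDA is installed.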

How KEDA Works

ScaledObject definition – A custom resource that specifies the target workload and the external event source.
Scaler – KEDA continuously observes the external trigger (e.g., a message queue, HTTP traffic, or a monitoring system) using specialized scalers.
Metrics Adapter – When events are detected, KEDA’s controller evaluates the demand and exposes the corresponding metric through its metrics adapter.
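HPA consumption – The HPA consumes these metrics and performs the scaling between one replica and the configured maximum; KEDA itself only activates or deactivates the workload (the step between zero and one replica).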
Scale‑to‑zero – If no events are present, KEDA allows the workload to scale down to zero; when events reappear, pods are scaled back up.
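
The flow above can be approximated in a few lines. This is a simplified model rather than KEDA's actual code: it assumes a single trigger with a per‑replica average target, ignores the polling interval and the cooldown period that delays scale‑to‑zero, and uses hypothetical names and numbers throughout.

```python
import math


def plan_replicas(metric_value, target_per_replica, activation_threshold=0,
                  min_replicas=0, max_replicas=10):
    """Model one evaluation of a KEDA-managed workload.

    KEDA decides whether the workload is active (0 <-> 1 replicas);
    the HPA then sizes it between 1 and max_replicas.
    """
    # Activation phase: at or below the threshold the trigger counts as
    # inactive, so the workload may drop to min_replicas (0 by default).
    if metric_value <= activation_threshold:
        return min_replicas

    # Scaling phase: the HPA divides the total metric by the per-replica target.
    wanted = math.ceil(metric_value / target_per_replica)
    return max(1, min_replicas, min(max_replicas, wanted))


# Hypothetical queue with a target of 20 pending messages per replica.
for pending in (0, 5, 45, 1000):
    print(f"{pending:>4} pending -> {plan_replicas(pending, 20)} replica(s)")
```

In this model an empty queue yields zero replicas, a small backlog wakes a single pod, and larger backlogs grow the workload until it hits the configured maximum.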

Proof of Concept at KCD Guatemala 2025

During KCD Guatemala 2025, a PoC demonstrated automatic scaling of a Kubernetes workload based on real HTTP traffic using KEDA. The demo included:

A lightweight sample application.
KEDA HTTP add‑on configuration.
ScaledObject definitions.
Kubernetes manifests showing event‑driven autoscaling from zero to multiple replicas and back down when traffic stops.
Scripts and instructions to generate load and observe scaling behavior in real time.

The goal was to provide a clear, hands‑on example of event‑based autoscaling rather than a production‑ready system.

Resources

Complete setup, source code, and documentation: GitHub – keda-demo-kcd
