Scaling Kubernetes Load with KEDA
Source: Dev.to
Why the HPA May Not Scale Your Pods
Out of the box, the Horizontal Pod Autoscaler (HPA) scales on CPU or memory utilization, which does not always reflect real‑world workloads. It can act on other signals, but only if some component exposes them through the Kubernetes custom or external metrics APIs. When you need to scale on requests per second, queue depth, or the result of a database query, the HPA alone is insufficient.
Introducing KEDA
KEDA (Kubernetes Event‑Driven Autoscaling) complements the Kubernetes ecosystem by monitoring external events—such as message queues, Prometheus/Grafana metrics, SQL queries, and HTTP traffic—and exposing them as metrics that the HPA can consume. This allows the HPA to make scaling decisions based on signals that more accurately represent the actual application load.
Key capabilities
- Event‑driven scaling: Scale workloads down to zero when there is no traffic and automatically scale back up as events arrive.
- No changes to the application: The workload remains unchanged; KEDA acts as a clean extension to Kubernetes autoscaling.
- Native integration: KEDA uses a ScaledObject custom resource, registered through the Kubernetes API server, to define which workload should scale and which external trigger should drive that scaling.
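To make the ScaledObject idea concrete, here is a minimal sketch that builds such a manifest as a Python dictionary and prints it as JSON, a format `kubectl apply -f` accepts alongside YAML. The resource name, the Deployment name `orders-worker`, the Prometheus address, and the query are hypothetical placeholders and do not come from the article. The field names follow the `keda.sh/v1alpha1` API and the Prometheus scaler as commonly documented, so verify them against the KEDA version you run.

```python
import json


def build_scaled_object(name, deployment, query, threshold,
                        server_address="http://prometheus.monitoring.svc:9090",
                        min_replicas=0, max_replicas=10):
    """Return a ScaledObject manifest (as a dict) with one Prometheus trigger.

    All argument values are illustrative; server_address is a hypothetical
    in-cluster Prometheus URL.
    """
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": name},
        "spec": {
            # The workload KEDA should scale (a Deployment by default).
            "scaleTargetRef": {"name": deployment},
            "minReplicaCount": min_replicas,  # 0 enables scale-to-zero
            "maxReplicaCount": max_replicas,
            "triggers": [
                {
                    "type": "prometheus",
                    # Trigger metadata values are passed as strings.
                    "metadata": {
                        "serverAddress": server_address,
                        "query": query,
                        "threshold": str(threshold),
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = build_scaled_object(
        name="orders-scaler",
        deployment="orders-worker",
        query="sum(rate(http_requests_total[1m]))",
        threshold=50,
    )
    print(json.dumps(manifest, indent=2))
```

Saving the printed output to a file and applying it with `kubectl apply -f` would create the resource, assuming KEDA is already installed in the cluster.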
How KEDA Works
- ScaledObject definition – A custom resource that specifies the target workload and the external event source.
- Scaler – KEDA continuously observes the external trigger (e.g., a message queue, HTTP traffic, or a monitoring system) using specialized scalers.
- Metrics Adapter – When events are detected, KEDA’s controller evaluates the demand and exposes the corresponding metric through its metrics adapter.
- HPA consumption – The HPA consumes these metrics and sizes the workload between one replica and the configured maximum.
- Scale‑to‑zero – If no events are present, KEDA itself scales the workload down to zero; when events reappear, KEDA activates the workload again and the HPA resumes control.
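The flow above can be approximated with a small model. This is a simplified sketch, not KEDA's or the HPA's actual code: it assumes the usual split in which KEDA decides whether the workload is active at all, and the HPA then applies its ratio rule (desired replicas = ceil(metric / target per replica)) for an average-value target. The function name and parameters are hypothetical, and real clusters also apply tolerances, stabilization windows, and cooldown periods that are left out here.

```python
import math


def desired_replicas(metric_value, target_per_replica, max_replicas,
                     min_replicas=0, activation_threshold=0.0):
    """Simplified model of the combined KEDA + HPA scaling decision.

    - At or below the activation threshold the workload is considered idle
      and falls back to min_replicas (zero by default).
    - Above it, the HPA-style ratio ceil(metric / target) is used, clamped
      to the range [max(1, min_replicas), max_replicas].
    """
    if metric_value <= activation_threshold:
        return min_replicas
    wanted = math.ceil(metric_value / target_per_replica)
    lower_bound = max(1, min_replicas)
    return max(lower_bound, min(wanted, max_replicas))


if __name__ == "__main__":
    # Target: 50 requests per second per replica, at most 10 replicas.
    for rps in (0, 5, 120, 5000):
        print(rps, "->", desired_replicas(rps, 50, 10))
    # 0 -> 0, 5 -> 1, 120 -> 3, 5000 -> 10
```

With a target of 50 requests per second per replica, 120 requests per second yields three replicas, while no traffic at all yields zero.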
Proof of Concept at KCD Guatemala 2025
During KCD Guatemala 2025, a PoC demonstrated automatic scaling of a Kubernetes workload based on real HTTP traffic using KEDA. The demo included:
- A lightweight sample application.
- KEDA HTTP add‑on configuration.
- ScaledObject definitions.
- Kubernetes manifests showing event‑driven autoscaling from zero to multiple replicas and back down when traffic stops.
- Scripts and instructions to generate load and observe scaling behavior in real time.
The goal was to provide a clear, hands‑on example of event‑based autoscaling rather than a production‑ready system.
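For readers who want to try something similar, the following is a minimal load-generator sketch using only the Python standard library. It is not the script from the demo repository; the function names, the `localhost` URL, and the request counts are hypothetical and should be replaced with whatever address routes to your workload.

```python
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def http_get(url, timeout=5.0):
    """Issue one GET request and return the HTTP status code (0 if unreachable)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses still carry a status code
    except OSError:
        return 0  # connection refused, DNS failure, timeout, etc.


def generate_load(url, total_requests, concurrency, fetch=http_get):
    """Send total_requests GETs using `concurrency` worker threads.

    Returns a dict mapping status code -> number of responses.
    The `fetch` parameter exists so the function can be tried without a network.
    """
    counts = {}
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for status in pool.map(fetch, [url] * total_requests):
            counts[status] = counts.get(status, 0) + 1
    return counts


if __name__ == "__main__":
    # Hypothetical endpoint; point this at the service exposed by your cluster.
    print(generate_load("http://localhost:8080/", total_requests=200, concurrency=20))
```

Running `kubectl get pods -w` in a second terminal while the script runs is one way to watch replicas appear and, once the traffic stops, disappear again.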
Resources
- Complete setup, source code, and documentation: GitHub – keda-demo-kcd