Red Hat Performance and Scale Engineering
Source: Red Hat Blog
Introduction
In my previous blog, How to set up KServe autoscaling for vLLM with KEDA, we explored the foundational setup of vLLM autoscaling in Open Data Hub (ODH) using KEDA and the custom metrics autoscaler operator. We established the architecture for a scaling strategy that goes beyond traditional CPU and memory metrics, using AI inference‑specific service‑level indicators (SLIs). Now it’s time to put this system to the test and validate its performance under realistic workloads.
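As a reminder of the setup being tested, a KEDA `ScaledObject` driving scaling from an inference-specific SLI might look like the following sketch. The resource names, namespace, Prometheus address, and threshold here are illustrative assumptions, not the exact configuration from the previous post; the metric `vllm:num_requests_waiting` is the queue-depth gauge vLLM exposes via Prometheus.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: vllm-scaledobject          # illustrative name
  namespace: vllm-demo             # illustrative namespace
spec:
  scaleTargetRef:
    name: vllm-predictor           # illustrative target Deployment
  minReplicaCount: 1
  maxReplicaCount: 4
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090  # illustrative address
        query: vllm:num_requests_waiting                  # vLLM queue-depth SLI
        threshold: "5"             # scale out when >5 requests are waiting
```

KEDA evaluates the Prometheus query on an interval and adjusts replicas so the per-replica value stays near the threshold, which is what the load tests in this post exercise.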