Kubernetes WG Serving concludes following successful advancement of AI inference support
Source: CNCF Blog
Overview
The Kubernetes Working Group (WG) Serving was created to support the development of the AI inference stack on Kubernetes, with the goal of making Kubernetes the orchestration platform of choice for inference workloads. This goal has been achieved, and the working group is now being disbanded.
Key Outcomes
- Workstreams & Requirements – Collected requirements from model server projects, hardware providers, and inference vendors, establishing a common understanding of the specifics and trends of inference workloads.
- Load Balancing & Workloads – Oversaw the adoption of the inference gateway as a request scheduler and helped standardize AI gateway functionality. Early participants seeded agent‑networking work in SIG Network.
- Projects Initiated
  - AI Conformance – Contributed to the Kubernetes AI Conformance profile; llm‑d leverages components such as Kueue, the inference gateway, LWS (LeaderWorkerSet), and DRA, keeping the profile's recommendations aligned with the direction of Kubernetes.
Future Directions
All ongoing efforts can be migrated to existing SIGs or other working groups:
- Autoscaling & Fast Bootstrap – Discussion moves to SIG Node or SIG Scheduling.
- Multi‑host / Multi‑node Work – Continued under SIG Apps (e.g., the LWS project).
- Dynamic Resource Allocation (DRA) – Handled by WG Device Management.
- Orchestration Topics – Covered by SIG Scheduling and SIG Node.
Specific Projects and Sponsorship
- Gateway API Inference Extension – Sponsored by SIG Network and will remain there.
- Serving Catalog – Work can move to the Inference Perf project.
- Inference Perf – Sponsored by SIG Scalability; ownership unchanged.
Acknowledgements
CNCF thanks all contributors who participated in WG Serving and helped advance Kubernetes as a platform for AI inference workloads.