Kubernetes WG Serving concludes following successful advancement of AI inference support
Source: CNCF Blog
Overview
The Kubernetes Working Group (WG) Serving was created to support the development of the AI inference stack on Kubernetes, with the goal of making Kubernetes the orchestration platform of choice for inference workloads. This goal has been achieved, and the working group is now being disbanded.
Key Outcomes
- Workstreams & Requirements – Collected requirements from model server projects, hardware providers, and inference vendors, establishing a common understanding of the specifics and trends of inference workloads.
- Load Balancing & Workloads – Oversaw the adoption of the inference gateway as a request scheduler and helped standardize AI gateway functionality. Early participants seeded agent‑networking work in SIG Network.
- Projects Initiated
  - AI Conformance – Contributed to the Kubernetes AI Conformance profile; llm‑d leverages components such as Kueue, the inference gateway, LWS (LeaderWorkerSet), and DRA, keeping the profile's recommendations aligned with the direction of Kubernetes.
Future Directions
All ongoing efforts can be migrated to existing SIGs or other working groups:
- Autoscaling & Fast Bootstrap – Discussion moves to SIG Node or SIG Scheduling.
- Multi‑host / Multi‑node Work – Continued under SIG Apps (e.g., the LWS project).
- Dynamic Resource Allocation (DRA) – Handled by WG Device Management.
- Orchestration Topics – Covered by SIG Scheduling and SIG Node.
Specific Projects and Sponsorship
- Gateway API Inference Extension – Sponsored by SIG Network and will remain there.
- Serving Catalog – Work can move to the Inference Perf project.
- Inference Perf – Sponsored by SIG Scalability; ownership unchanged.
Acknowledgements
CNCF thanks all contributors who participated in WG Serving and helped advance Kubernetes as a platform for AI inference workloads.