Kubernetes WG Serving concludes following successful advancement of AI inference support

Published: February 26, 2026, 08:30 AM EST
Source: CNCF Blog

Overview

The Kubernetes Working Group (WG) Serving was created to support the development of the AI inference stack on Kubernetes, with the goal of making Kubernetes the orchestration platform of choice for inference workloads. This goal has been achieved, and the working group is now being disbanded.

Key Outcomes

  • Workstreams & Requirements – Collected requirements from model servers, hardware providers, and inference vendors, establishing a common understanding of inference workload specifics and trends.
  • Load Balancing & Workloads – Oversaw the adoption of the inference gateway as a request scheduler and helped standardize AI gateway functionality. Early participants seeded agent‑networking work in SIG Network.
  • Projects Initiated
    • AIBrix – Now a CNCF‑hosted project; its design was informed by the WG’s use cases and problem statements.
    • llm‑d – Addresses unresolved distributed‑inference challenges (benchmarking, best practices) and drives requirements to Kubernetes SIGs.
  • AI Conformance – Contributed to the Kubernetes AI Conformance profile; llm‑d is leveraging components such as Kueue, the inference gateway, LWS, and DRA to align its recommendations with the direction of the Kubernetes project.

Future Directions

All ongoing efforts can be migrated to existing SIGs or other working groups:

  • Autoscaling & Fast Bootstrap – Discussed in SIG Node or SIG Scheduling.
  • Multi‑host / Multi‑node Work – Continued under SIG Apps (e.g., the LWS project).
  • Dynamic Resource Allocation (DRA) – Handled by WG Device Management.
  • Orchestration Topics – Covered by SIG Scheduling and SIG Node.

Specific Projects and Sponsorship

  • Gateway API Inference Extension – Sponsored by SIG Network and will remain there.
  • Serving Catalog – Work can move to the Inference Perf project.
  • Inference Perf – Sponsored by SIG Scalability; ownership unchanged.

Acknowledgements

CNCF thanks all contributors who participated in WG Serving and helped advance Kubernetes as a platform for AI inference workloads.
