What I Learned at the CNCF Montreal KubeCon NA 2025 Recap
Source: Dev.to
Introduction
On December 10th, the Cloud Native Montreal community hosted a recap of KubeCon NA 2025 in Atlanta. Rather than a traditional conference, the event was community‑driven, featuring lightning talks and reflections on where the cloud‑native ecosystem is heading. The focus was on emerging patterns and lessons across the ecosystem—from AI agents and observability to GitOps and energy‑aware infrastructure.
AI Workloads as First‑Class Citizens
A recurring theme was that AI workloads are now first‑class citizens in cloud‑native environments. Traditional observability answers questions such as:
- Is the service up?
- Is latency within SLOs?
AI systems introduce new operational questions:
- What prompt triggered this behavior?
- Which model call was expensive?
- Why did this agent take a specific action?
Tools such as OpenLLMetry extend OpenTelemetry with instrumentation for LLM and agent workflows, while OpenCost provides visibility into Kubernetes and cloud spend across workloads, teams, and environments.
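To make the questions above concrete, here is a minimal sketch of the kind of per-call record an LLM-aware trace could carry. It is plain Python, not the OpenLLMetry or OpenCost API: the `LLMCallRecord` and `Trace` classes, the model names, and the prices are all hypothetical, chosen only to show how prompt, model, and token counts combine to answer "which model call was expensive?".

```python
from dataclasses import dataclass, field

# Hypothetical per-1K-token prices in USD; real prices vary by provider and model.
PRICE_PER_1K_TOKENS = {
    "small-model": {"input": 0.0005, "output": 0.0015},
    "large-model": {"input": 0.0100, "output": 0.0300},
}


@dataclass
class LLMCallRecord:
    """One model call, with the attributes an LLM-aware trace might carry."""
    agent_step: str
    model: str
    prompt: str
    input_tokens: int
    output_tokens: int

    @property
    def cost_usd(self) -> float:
        price = PRICE_PER_1K_TOKENS[self.model]
        return (self.input_tokens * price["input"]
                + self.output_tokens * price["output"]) / 1000


@dataclass
class Trace:
    """Collects the calls made while handling one request."""
    calls: list = field(default_factory=list)

    def record(self, **kwargs) -> LLMCallRecord:
        call = LLMCallRecord(**kwargs)
        self.calls.append(call)
        return call

    def most_expensive(self) -> LLMCallRecord:
        return max(self.calls, key=lambda c: c.cost_usd)

    def total_cost_usd(self) -> float:
        return sum(c.cost_usd for c in self.calls)


trace = Trace()
trace.record(agent_step="classify_ticket", model="small-model",
             prompt="Classify this support ticket",
             input_tokens=400, output_tokens=20)
trace.record(agent_step="draft_reply", model="large-model",
             prompt="Draft a reply to the customer",
             input_tokens=1200, output_tokens=600)

worst = trace.most_expensive()
print(f"{worst.agent_step} on {worst.model}: ${worst.cost_usd:.4f}")
```

In a real setup these attributes would be attached to spans by the instrumentation library rather than recorded by hand; the point is only that prompt, model, and token usage need to travel together with the trace.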
Takeaway: You can’t scale AI systems you can’t observe or financially understand.
Evolving Observability
Observability is evolving beyond dashboards and alerts toward agent‑assisted operations. Instead of engineers manually correlating metrics, logs, and recent deployments, emerging tools aim to:
- Perform root‑cause analysis
- Triage alerts
- Recommend remediation steps
Projects like k8sgpt, Seraph, and newer agentic SRE tools suggest a future where observability systems don’t just surface data—they actively reason over it.
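As a rough illustration of the simplest correlation these tools automate, the sketch below links an alert to deployments of the same service that landed shortly before it fired. This is not how k8sgpt or any of the named tools work internally; the `Alert` and `Deployment` types, the 30-minute window, and the sample data are assumptions made for the example.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Deployment(NamedTuple):
    service: str
    version: str
    deployed_at: datetime


class Alert(NamedTuple):
    service: str
    name: str
    fired_at: datetime


def suspect_deployments(alert, deployments, window=timedelta(minutes=30)):
    """Return deployments of the alerting service that landed within
    `window` before the alert fired, most recent first."""
    candidates = [
        d for d in deployments
        if d.service == alert.service
        and timedelta(0) <= alert.fired_at - d.deployed_at <= window
    ]
    return sorted(candidates, key=lambda d: d.deployed_at, reverse=True)


# Hypothetical sample data.
deployments = [
    Deployment("checkout", "v41", datetime(2025, 12, 10, 9, 0)),
    Deployment("checkout", "v42", datetime(2025, 12, 10, 13, 50)),
    Deployment("search", "v7", datetime(2025, 12, 10, 13, 55)),
]
alert = Alert("checkout", "HighErrorRate", datetime(2025, 12, 10, 14, 5))

suspects = suspect_deployments(alert, deployments)
for d in suspects:
    print(f"{alert.name}: check {d.service} {d.version} deployed at {d.deployed_at}")
```

Agentic tools go further by reasoning over logs and metrics as well, but a time-window join like this is the manual step they are meant to take off the on-call engineer's plate.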
Highlighted Tools
- k8sgpt – AI‑native Kubernetes troubleshooting
- HolmesGPT / Seraph – Automated root‑cause analysis and alert mitigation
Emerging Agent‑Based Platforms
- AWS DevOps Agent (Preview)
- Azure SRE Agent (Preview)
- Cleric
- Kube Whisperer
These agents correlate logs, metrics, deployments, and incidents to assist on‑call engineers and reduce alert fatigue. They don’t replace engineers but shift the workflow: less time searching for signals, more time making informed decisions.
Cyclops: Structured Abstractions for Kubernetes
Cyclops is an open‑source platform that simplifies Kubernetes by replacing raw YAML with structured, form‑based abstractions.
Core Concepts
- Modules – Logical groupings of all Kubernetes resources an application needs
- Templates – Mappings that translate module inputs into valid Kubernetes manifests
Integration with Helm
- Helm charts define resources (Deployments, Services, Ingress, etc.) using templated YAML.
- Cyclops wraps those charts and exposes their values as validated forms instead of free‑text YAML edits.
- Users fill in forms, and Cyclops renders the underlying Helm templates into valid manifests.
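The form-to-manifest flow can be sketched in a few lines of Python. This is not Cyclops code and does not use Helm; `FORM_SCHEMA`, `validate`, and `render_deployment` are hypothetical names, and the limits (for example, 1 to 10 replicas) are made up. It only shows the idea: inputs are checked against a schema before any manifest is produced.

```python
import re

# Hypothetical form schema: field -> (expected type, check, error message).
FORM_SCHEMA = {
    "name": (str,
             lambda v: re.fullmatch(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?", v) is not None,
             "must be a lowercase DNS-style name"),
    "image": (str, lambda v: len(v) > 0, "must not be empty"),
    "replicas": (int, lambda v: 1 <= v <= 10, "must be between 1 and 10"),
    "port": (int, lambda v: 1 <= v <= 65535, "must be a valid TCP port"),
}


def validate(values):
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    for name, (expected_type, check, message) in FORM_SCHEMA.items():
        if name not in values:
            errors.append(f"{name}: required")
        elif not isinstance(values[name], expected_type) or not check(values[name]):
            errors.append(f"{name}: {message}")
    return errors


def render_deployment(values):
    """Render validated form values into a Kubernetes Deployment manifest (as a dict)."""
    errors = validate(values)
    if errors:
        raise ValueError("; ".join(errors))
    labels = {"app": values["name"]}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": values["name"], "labels": labels},
        "spec": {
            "replicas": values["replicas"],
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": values["name"],
                        "image": values["image"],
                        "ports": [{"containerPort": values["port"]}],
                    }],
                },
            },
        },
    }


form_input = {"name": "checkout", "image": "registry.example.com/checkout:1.4.2",
              "replicas": 3, "port": 8080}
manifest = render_deployment(form_input)
print(manifest["kind"], manifest["metadata"]["name"], manifest["spec"]["replicas"])
```

The design choice worth noting is that invalid input never reaches the rendering step, which is the safety property a form-based layer adds over free-text YAML.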
Cyclops also supports AI‑driven operations through a Model Context Protocol (MCP) server, allowing agents to manage applications using natural language rather than direct cluster access.
Caution: Code generated by AI should be treated as untrusted. Security risks still apply, and as abstraction increases, guardrails, validation, and testing become even more critical.
GitOps Case Study
A practical GitOps case study highlighted that repository structure matters as much as tooling. Key principles discussed:
- Align configuration structure with team ownership
- Centralize configuration while keeping environments explicit
- Keep related files close together (“proximity matters”)
- Optimize for developer experience, not just correctness
Using Argo CD, deployments become automated, auditable, and consistent, provided GitOps is treated as both a technical and an organizational design concern.
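One way to picture these principles is a layout where each team owns a directory, shared defaults sit in a base file, and every environment has its own explicit overlay next to it. The sketch below is an assumption for illustration, not the layout from the case study: the `teams/<team>/<app>/...` convention, the file names, and the sample values are all hypothetical.

```python
from pathlib import PurePosixPath


def config_paths(team, app, env):
    """Base and environment overlay live side by side under the owning team."""
    app_dir = PurePosixPath("teams") / team / app
    return [app_dir / "base" / "values.yaml",
            app_dir / "envs" / env / "values.yaml"]


def merge(base, overlay):
    """Recursively merge overlay onto base; overlay wins on conflicts."""
    merged = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical contents of the two files for one app.
base_values = {"replicas": 2, "resources": {"cpu": "250m", "memory": "256Mi"}}
prod_overlay = {"replicas": 6, "resources": {"cpu": "1"}}

paths = [str(p) for p in config_paths("payments", "checkout", "prod")]
prod_values = merge(base_values, prod_overlay)
print(paths)
print(prod_values)
```

Because the overlay only states what differs, a reviewer can see what is special about production by reading a single small file that sits next to the defaults.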
Kepler: Energy‑Aware Observability
Kepler (Kubernetes-based Efficient Power Level Exporter), a CNCF project, exposes energy consumption at the container and process level.
Features
- Fine‑grained container and process power metrics
- Support for CPUs, GPUs, and heterogeneous hardware
- Low overhead using eBPF
- Integration with existing observability stacks
As GPU‑heavy and AI workloads grow, energy usage and cooling costs become operational concerns.
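To show how such metrics turn into operational numbers, here is a small sketch that converts cumulative energy readings into average power and kilowatt-hours. It assumes the exporter reports energy as an ever-increasing counter in joules; the workload names and sample values are invented, and counter resets are not handled.

```python
def average_watts(samples):
    """Average power over a series of (unix_seconds, cumulative_joules)
    samples, oldest first. Power in watts is joules per second."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, j0), (t1, j1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("timestamps must be increasing")
    return (j1 - j0) / (t1 - t0)


def joules_to_kwh(joules):
    """1 kWh = 3,600,000 J."""
    return joules / 3_600_000


# Hypothetical one-minute readings per workload.
readings = {
    "inference-gpu": [(0, 1000.0), (60, 19000.0)],
    "web-frontend": [(0, 500.0), (60, 1100.0)],
}

power = {name: average_watts(s) for name, s in readings.items()}
for name, watts in sorted(power.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {watts:.1f} W")
```

In a Prometheus-based stack the same calculation is usually done with a rate query over the counter rather than in application code; the arithmetic is identical.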
Key Message: Sustainability is now part of platform engineering, not just hardware planning.
Summary
This KubeCon recap wasn’t about memorizing tools—it was about understanding direction. Across talks, a consistent shift emerged:
- From reactive monitoring to AI‑assisted operations
- From raw YAML to safe, opinionated abstractions
- From cost surprises to cost‑aware platforms
- From performance‑only metrics to energy‑aware infrastructure
Community‑driven events like this help connect individual technologies into a cohesive mental model of where cloud‑native systems are heading next.