What I Learned at the CNCF Montreal KubeCon NA 2025 Recap

Published: December 20, 2025 at 09:59 AM EST

Source: Dev.to

Introduction

On December 10th, the Cloud Native Montreal community hosted a recap of KubeCon NA 2025, which was held in Atlanta. Rather than a traditional conference, the recap was a community‑driven evening of lightning talks and reflections on where the cloud‑native ecosystem is heading. The talks covered emerging patterns and lessons from across that ecosystem: AI agents, observability, GitOps, and energy‑aware infrastructure.

AI Workloads as First‑Class Citizens

A recurring theme was that AI workloads are now first‑class citizens in cloud‑native environments. Traditional observability answers questions such as:

Is the service up?
Is latency within SLOs?

AI systems introduce new operational questions:

What prompt triggered this behavior?
Which model call was expensive?
Why did this agent take a specific action?

Tools such as OpenLLMetry extend OpenTelemetry with instrumentation for LLM and agent workflows, while OpenCost provides visibility into Kubernetes and cloud spend across workloads, teams, and environments.

Takeaway:
You can’t scale AI systems you can’t observe or financially understand. Observability is evolving beyond dashboards and alerts toward agent‑assisted operations.
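The operational questions above (which prompt, which model call, what cost) can be made concrete with a small sketch. This is not the OpenLLMetry or OpenTelemetry API; it is a standard-library stand-in that records one span per model call. The names `llm_span`, `LLMSpan`, and the prices in `PRICE_PER_1K_TOKENS` are assumptions made up for illustration.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field

# Hypothetical per-1K-token prices in USD; real prices vary by provider and model.
PRICE_PER_1K_TOKENS = {"small-model": 0.0005, "large-model": 0.01}


@dataclass
class LLMSpan:
    """One recorded model call: which step ran, with what prompt, at what cost."""
    name: str
    model: str
    prompt: str
    attributes: dict = field(default_factory=dict)
    duration_s: float = 0.0

    @property
    def cost_usd(self) -> float:
        tokens = self.attributes.get("total_tokens", 0)
        return tokens / 1000 * PRICE_PER_1K_TOKENS.get(self.model, 0.0)


SPANS = []  # in-memory sink; a real setup would export spans to a backend


@contextmanager
def llm_span(name, model, prompt):
    """Record the prompt, model, and wall-clock duration of one model call."""
    span = LLMSpan(name=name, model=model, prompt=prompt)
    start = time.perf_counter()
    try:
        yield span
    finally:
        span.duration_s = time.perf_counter() - start
        SPANS.append(span)


def most_expensive(spans):
    """Answer 'which model call was expensive?' from the recorded spans."""
    return max(spans, key=lambda s: s.cost_usd)


# Two simulated agent steps; token counts would come from the model response.
with llm_span("plan", "large-model", "Summarize the failing pods") as span:
    span.attributes["total_tokens"] = 1200
with llm_span("classify", "small-model", "Is this alert actionable?") as span:
    span.attributes["total_tokens"] = 300

worst = most_expensive(SPANS)
print(worst.name, worst.model, worst.prompt)
```

Keeping the prompt and token count on the same record as the timing is the point of the sketch: the cost question and the behavior question can then be answered from one trace.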

Evolving Observability

Instead of engineers manually correlating metrics, logs, and recent deployments, emerging tools aim to:

Perform root‑cause analysis
Triage alerts
Recommend remediation steps

Projects like k8sgpt, HolmesGPT, Seraph, and newer agentic SRE tools suggest a future where observability systems don’t just surface data but actively reason over it.
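As a rough illustration of the manual correlation these tools aim to automate, the sketch below matches an alert against recent deployments of the same service. It does not reflect how k8sgpt or any other named project works internally; the `Alert` and `Deployment` types, the 30-minute window, and the suggested remediation text are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Deployment:
    service: str
    version: str
    deployed_at: datetime


@dataclass(frozen=True)
class Alert:
    service: str
    summary: str
    fired_at: datetime


def suspect_deployments(alert, deployments, window=timedelta(minutes=30)):
    """Deployments of the alerting service that landed within `window`
    before the alert fired, most recent first."""
    candidates = [
        d for d in deployments
        if d.service == alert.service
        and timedelta(0) <= alert.fired_at - d.deployed_at <= window
    ]
    return sorted(candidates, key=lambda d: d.deployed_at, reverse=True)


def triage(alert, deployments):
    """Return a remediation hint based only on deployment timing."""
    suspects = suspect_deployments(alert, deployments)
    if suspects:
        return f"Review or roll back {suspects[0].service} {suspects[0].version}"
    return "No recent deployment found; inspect metrics and logs"


now = datetime(2025, 12, 10, 18, 0)
history = [
    Deployment("checkout", "v41", now - timedelta(hours=5)),
    Deployment("checkout", "v42", now - timedelta(minutes=10)),
    Deployment("search", "v7", now - timedelta(minutes=5)),
]
alert = Alert("checkout", "p99 latency above SLO", now)
print(triage(alert, history))
```

A timing heuristic like this only narrows the search. The agentic tools discussed in the talks reportedly go further by reasoning over logs and metrics as well.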

Highlighted Tools

k8sgpt – AI‑native Kubernetes troubleshooting
HolmesGPT / Seraph – Automated root‑cause analysis and alert mitigation

Emerging Agent‑Based Platforms

AWS DevOps Agent (Preview)
Azure SRE Agent (Preview)
Cleric
Kube Whisperer

These agents correlate logs, metrics, deployments, and incidents to assist on‑call engineers and reduce alert fatigue. They don’t replace engineers but shift the workflow: less time searching for signals, more time making informed decisions.

Cyclops: Structured Abstractions for Kubernetes

Cyclops is an open‑source platform that simplifies Kubernetes by replacing raw YAML with structured, form‑based abstractions.

Core Concepts

Modules – Logical groupings of all Kubernetes resources an application needs
Templates – Mappings that translate module inputs into valid Kubernetes manifests

Integration with Helm

Helm charts define resources (Deployments, Services, Ingress, etc.) using templated YAML.
Cyclops wraps those charts and exposes their values as validated forms instead of free‑text YAML edits.
Users fill in forms, and Cyclops renders the underlying Helm templates into valid manifests.
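The form-to-manifest flow described above can be sketched in a few lines. This is not Cyclops code and does not use Helm's Go templates; Python's `string.Template` stands in for the templating step, and `FORM_SCHEMA`, its limits, and the manifest shape are assumptions chosen for illustration.

```python
import re
from string import Template

# Kubernetes object names are commonly restricted to DNS labels.
DNS_LABEL = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def _valid_name(v):
    return isinstance(v, str) and len(v) <= 63 and bool(DNS_LABEL.match(v))


def _valid_image(v):
    return isinstance(v, str) and v != "" and not any(c.isspace() for c in v)


def _valid_replicas(v):
    # bool is a subclass of int, so it is excluded explicitly.
    return isinstance(v, int) and not isinstance(v, bool) and 1 <= v <= 10


# Hypothetical form schema: field name -> validator.
FORM_SCHEMA = {"name": _valid_name, "image": _valid_image, "replicas": _valid_replicas}

MANIFEST_TEMPLATE = Template("""\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: $name
spec:
  replicas: $replicas
  selector:
    matchLabels:
      app: $name
  template:
    metadata:
      labels:
        app: $name
    spec:
      containers:
        - name: $name
          image: $image
""")


def validate(values):
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    for field_name, check in FORM_SCHEMA.items():
        if field_name not in values:
            errors.append(f"{field_name}: missing")
        elif not check(values[field_name]):
            errors.append(f"{field_name}: invalid value {values[field_name]!r}")
    errors.extend(f"{f}: unknown field" for f in sorted(set(values) - set(FORM_SCHEMA)))
    return errors


def render(values):
    """Render the manifest only after the form values pass validation."""
    errors = validate(values)
    if errors:
        raise ValueError("; ".join(errors))
    return MANIFEST_TEMPLATE.substitute(values)


print(render({"name": "web", "image": "nginx:1.27", "replicas": 3}))
```

Rejecting bad input before rendering is the design choice worth noting: the user sees a field-level error instead of a manifest that fails at apply time.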

Cyclops also supports AI‑driven operations through a Model Context Protocol (MCP) server, allowing agents to manage applications using natural language rather than direct cluster access.

Caution: Code generated by AI should be treated as untrusted. Security risks still apply, and as abstraction increases, guardrails, validation, and testing become even more critical.
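One way to picture such a guardrail is a policy check that sits between an agent's proposal and the cluster. The sketch below is an assumption-laden illustration, not part of Cyclops or MCP: `ProposedAction`, the allow-list, the protected namespaces, and the replica limit are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical policy; a real platform would load this from reviewed configuration.
ALLOWED_ACTIONS = {"scale", "restart"}
PROTECTED_NAMESPACES = {"kube-system", "kube-public"}
MAX_REPLICAS = 10


@dataclass(frozen=True)
class ProposedAction:
    """An action an agent wants to perform, treated as untrusted input."""
    action: str
    namespace: str
    target: str
    replicas: Optional[int] = None


def violations(proposal):
    """Return every policy rule the proposal breaks; empty means allowed."""
    found = []
    if proposal.action not in ALLOWED_ACTIONS:
        found.append(f"action {proposal.action!r} is not on the allow-list")
    if proposal.namespace in PROTECTED_NAMESPACES:
        found.append(f"namespace {proposal.namespace!r} is protected")
    if proposal.action == "scale":
        if proposal.replicas is None or not 0 <= proposal.replicas <= MAX_REPLICAS:
            found.append(f"replicas must be between 0 and {MAX_REPLICAS}")
    return found


def apply_if_safe(proposal, apply):
    """Call `apply` only when the proposal passes every check."""
    problems = violations(proposal)
    if problems:
        return False, problems
    apply(proposal)
    return True, []


applied = []  # stands in for the call that would reach the cluster
print(apply_if_safe(ProposedAction("scale", "shop", "web", replicas=3), applied.append))
print(apply_if_safe(ProposedAction("delete", "kube-system", "coredns"), applied.append))
```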

GitOps Case Study

A practical GitOps case study highlighted that repository structure matters as much as tooling. Key principles discussed:

Align configuration structure with team ownership
Centralize configuration while keeping environments explicit
Keep related files close together (“proximity matters”)
Optimize for developer experience, not just correctness

Using Argo CD, deployments become automated, auditable, and consistent, provided GitOps is treated as both a technical and an organizational design problem.
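The principles above can be illustrated with a small sketch of one possible layout, where each team owns a directory, each application keeps a shared base next to one explicit file per environment, and the effective configuration is the base with the environment's overrides applied. The `teams/<team>/<app>/...` layout and both function names are assumptions, not the structure used in the case study.

```python
from pathlib import PurePosixPath


def config_paths(team, app, env):
    """Paths for an app's shared base and one environment's overrides.
    Ownership is visible in the path, and related files sit side by side."""
    app_root = PurePosixPath("teams") / team / app
    return app_root / "base.yaml", app_root / "envs" / f"{env}.yaml"


def merge(base, overlay):
    """Recursively apply environment overrides on top of the base config.
    Neither input is modified."""
    merged = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Parsed file contents are inlined here to keep the sketch self-contained.
base = {"replicas": 1, "resources": {"cpu": "100m", "memory": "128Mi"}}
prod = {"replicas": 3, "resources": {"cpu": "500m"}}

base_path, env_path = config_paths("payments", "api", "prod")
print(base_path, env_path)
print(merge(base, prod))
```

Because every environment has its own file, a reviewer can see exactly what differs in production without tracing conditional logic.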

Kepler: Energy‑Aware Observability

Kepler (Kubernetes‑based Efficient Power Level Exporter), a CNCF project, exposes energy consumption at the container level as Prometheus metrics.

Features

Fine‑grained container and process power metrics
Support for CPUs, GPUs, and heterogeneous hardware
Low overhead using eBPF
Integration with existing observability stacks

As GPU‑heavy and AI workloads grow, energy usage and cooling costs become operational concerns.
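To show how container-level energy data turns into an operational number, the sketch below converts two readings of a cumulative joules counter into average power and kilowatt-hours. It assumes the energy metric is a monotonically increasing counter in joules; the `EnergySample` type and the sample values are made up, and no Kepler API is called.

```python
from dataclasses import dataclass

JOULES_PER_KWH = 3_600_000  # 1 kWh = 1000 W * 3600 s


@dataclass(frozen=True)
class EnergySample:
    """One reading of a cumulative per-container energy counter."""
    timestamp_s: float
    joules_total: float


def average_watts(earlier, later):
    """Average power between two samples: energy delta over elapsed time."""
    elapsed = later.timestamp_s - earlier.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = later.joules_total - earlier.joules_total
    if delta < 0:
        raise ValueError("counter went backwards (possible reset)")
    return delta / elapsed


def joules_to_kwh(joules):
    return joules / JOULES_PER_KWH


# Hypothetical readings taken one minute apart for a single container.
first = EnergySample(timestamp_s=0.0, joules_total=1000.0)
second = EnergySample(timestamp_s=60.0, joules_total=4000.0)

watts = average_watts(first, second)
print(watts, joules_to_kwh(second.joules_total - first.joules_total))
```

Multiplying the kilowatt-hour figure by an electricity price gives a per-workload energy cost, which is how the energy and cost themes of the recap connect.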

Key Message: Sustainability is now part of platform engineering, not just hardware planning.

Summary

This KubeCon recap wasn’t about memorizing tools; it was about understanding direction. Across the talks, four consistent shifts emerged:

From reactive monitoring to AI‑assisted operations
From raw YAML to safe, opinionated abstractions
From cost surprises to cost‑aware platforms
From performance‑only metrics to energy‑aware infrastructure

Community‑driven events like this help connect individual technologies into a cohesive mental model of where cloud‑native systems are heading next.