Kubernetes rollouts: promote on SLOs, not on 'pods are Ready'

Published: March 14, 2026 at 11:57 AM EDT

Source: Dev.to

Readiness is a local signal. Production impact is global.

Pods can all be Ready while the error budget for your SLO window is already burning.

The failure chain

Everything looks healthy, but the error budget is draining quietly.
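One way to put a number on "draining quietly" is burn rate: the observed error rate divided by the error budget (1 minus the SLO target). A burn rate of 1 spends the budget exactly over the SLO window, and anything above 1 runs it out early. The sketch below assumes a 99.9% SLO over a 30-day window. The function names are illustrative, not from any library.

```python
def burn_rate(error_rate: float, slo_target: float) -> float:
    """Observed error rate divided by the allowed error rate (the budget)."""
    budget = 1.0 - slo_target
    return error_rate / budget


def hours_to_exhaustion(error_rate: float, slo_target: float, window_days: float = 30) -> float:
    """Hours until the whole window's budget is spent at a constant error rate."""
    rate = burn_rate(error_rate, slo_target)
    return float("inf") if rate == 0 else window_days * 24 / rate


# 0.5% errors against a 99.9% SLO: nothing here depends on pod readiness.
print(round(burn_rate(0.005, 0.999), 1))            # 5.0
print(round(hours_to_exhaustion(0.005, 0.999), 1))  # 144.0
```

At that rate a month of budget is gone in about six days, and no pod ever has to leave Ready for it to happen.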

Ready only means the containers started and the readiness probe is passing.
It says nothing about P95 latency, error rate, or whether your SLO slice is holding.
A canary can stay “green” because the metrics are too coarse to show the regression.
No labels, no slices → blast radius stays invisible.
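Why coarse metrics hide a bad canary: an aggregate error rate averages a failing slice away. A minimal sketch with made-up request counts, assuming requests carry a hypothetical `route` label; in practice the slice is whatever label your SLO is defined on.

```python
from collections import defaultdict

# (route, status_code) pairs observed on the canary; counts are illustrative only.
requests = (
    [("/search", 200)] * 940
    + [("/checkout", 200)] * 45
    + [("/checkout", 500)] * 15
)


def error_rate(samples):
    """Fraction of samples with a 5xx status code."""
    errors = sum(1 for _, status in samples if status >= 500)
    return errors / len(samples)


def error_rate_by_slice(samples):
    """Error rate computed separately for each route label."""
    slices = defaultdict(list)
    for route, status in samples:
        slices[route].append((route, status))
    return {route: error_rate(s) for route, s in slices.items()}


print(f"aggregate: {error_rate(requests):.1%}")  # aggregate: 1.5%
for route, rate in error_rate_by_slice(requests).items():
    print(f"{route}: {rate:.1%}")
# /search: 0.0%
# /checkout: 25.0%
```

Against a hypothetical 2% error-rate threshold, the aggregate passes while `/checkout` fails one request in four.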

kubectl get hpa -o yaml

Wire an AnalysisRun into your Argo Rollouts canary that checks that error rate and P95 latency are within SLO bounds before promoting.
If the SLI is still recovering, promotion waits.
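The gate logic can be sketched as three outcomes per measurement. This is not Argo's implementation; it mirrors the idea of a metric having both a success and a failure condition, where a value that matches neither is inconclusive and the rollout pauses instead of promoting. All thresholds below are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PROMOTE = "promote"
    WAIT = "wait"
    ROLLBACK = "rollback"


@dataclass
class Measurement:
    error_rate: float      # fraction of failed requests, e.g. 0.004
    p95_latency_ms: float  # 95th percentile latency in milliseconds


# Hypothetical bounds: inside the SLO counts as success, clearly outside counts
# as failure, and anything in between is treated as "still recovering".
SLO_ERROR_RATE = 0.01
SLO_P95_MS = 300.0
FAIL_ERROR_RATE = 0.02
FAIL_P95_MS = 500.0


def evaluate(m: Measurement) -> Decision:
    """Gate one measurement the way a success/failure condition pair would."""
    if m.error_rate > FAIL_ERROR_RATE or m.p95_latency_ms > FAIL_P95_MS:
        return Decision.ROLLBACK
    if m.error_rate <= SLO_ERROR_RATE and m.p95_latency_ms <= SLO_P95_MS:
        return Decision.PROMOTE
    return Decision.WAIT  # neither condition matched: inconclusive


print(evaluate(Measurement(0.004, 240)).value)  # promote
print(evaluate(Measurement(0.015, 240)).value)  # wait
print(evaluate(Measurement(0.004, 650)).value)  # rollback
```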

Promote only when the canary holds the SLO slice that matters for the full length of a fixed window.
Any breach of the SLO bounds during that window triggers an auto-rollback.
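A minimal sketch of the fixed-window rule, assuming one error-rate sample per interval (for example, one per minute, so a 10-minute window means 10 samples). It is close in spirit to what `count`, `interval` and `failureLimit` control on an Argo Rollouts metric, but the function is hypothetical and only checks error rate; a P95 check would be added the same way.

```python
from typing import Iterable


def gate_window(error_rates: Iterable[float], slo_error_rate: float, required: int) -> str:
    """Promote only after `required` consecutive in-SLO samples.

    Any sample above the SLO bound during the window means rollback;
    running out of samples before the window completes means keep waiting.
    """
    held = 0
    for rate in error_rates:
        if rate > slo_error_rate:
            return "rollback"
        held += 1
        if held == required:
            return "promote"
    return "wait"


print(gate_window([0.004] * 10, 0.01, required=10))          # promote
print(gate_window([0.004] * 6 + [0.03], 0.01, required=10))  # rollback
print(gate_window([0.004] * 6, 0.01, required=10))           # wait
```

Requiring the whole window, rather than a single good sample, is what keeps a brief dip in errors from promoting a canary that is still unstable.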

Rollout speed and autoscaler reaction time are tuned independently. That gap is where the error budget burns before anyone pages.
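A rough way to see that gap: add up how long new capacity takes to arrive and compare it with the rollout step interval. This is a simplified additive model, not how the HPA controller computes anything, and every number in it is an illustrative assumption; swap in your own HPA behavior settings and measured pod startup times.

```python
from dataclasses import dataclass


@dataclass
class ScaleUpPath:
    """Rough, additive model of how long new capacity takes to serve traffic."""
    metrics_lag_s: float    # time for new load to show up in the scaling metric
    hpa_sync_s: float       # HPA controller evaluation period
    stabilization_s: float  # behavior.scaleUp.stabilizationWindowSeconds
    pod_ready_s: float      # scheduling + image pull + readiness probe passing

    def reaction_time_s(self) -> float:
        return self.metrics_lag_s + self.hpa_sync_s + self.stabilization_s + self.pod_ready_s


def unprotected_gap_s(step_interval_s: float, path: ScaleUpPath) -> float:
    """Seconds the rollout has moved past a step before the capacity that
    step triggered is serving (0 if capacity arrives before the next step)."""
    return max(0.0, path.reaction_time_s() - step_interval_s)


# Illustrative numbers only, not measurements.
path = ScaleUpPath(metrics_lag_s=60.0, hpa_sync_s=15.0, stabilization_s=0.0, pod_ready_s=90.0)
print(path.reaction_time_s())          # 165.0
print(unprotected_gap_s(60.0, path))   # 105.0
print(unprotected_gap_s(300.0, path))  # 0.0
```

With these made-up numbers, a 60 s step interval leaves the rollout 105 s ahead of the capacity each step asked for, which is exactly the window where the budget burns without a page.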

Illustration of the failure chain and mitigation steps

What is the step interval on your rollouts right now?