Kubernetes rollouts: promote on SLOs, not on 'pods are Ready'

Published: (March 14, 2026 at 11:57 AM EDT)
Source: Dev.to

Readiness is a local signal. Production impact is global.

Pods can be Ready while your SLO window is already burning.

The failure chain

  • Rollout shifts traffic fast.
  • New pods saturate before HPA reacts.
  • HPA reacts late: its sync period is 15 seconds by default and the metrics pipeline adds its own delay, so 15–30 seconds is the minimum lag.
  • P95 latency climbs.
  • Error rate ticks up.
  • SLI degrades.

Everything looks healthy, but the error budget is draining quietly.
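
To make "draining quietly" concrete, here is a minimal sketch of the usual burn-rate arithmetic. The 99.9% target and the request counts are illustrative assumptions, not figures from a real rollout, and `burn_rate` is a hypothetical helper name.

```python
def burn_rate(errors: int, requests: int, slo_target: float) -> float:
    """How many times faster than allowed the error budget is being spent.

    1.0 means the budget lasts exactly the SLO window; 10.0 means it is
    gone in a tenth of that window.
    """
    if requests == 0:
        return 0.0
    error_rate = errors / requests
    budget = 1.0 - slo_target  # allowed error fraction
    return error_rate / budget


# Assumed numbers: 99.9% SLO, 1% of requests failing during the rollout.
rate = burn_rate(errors=100, requests=10_000, slo_target=0.999)
print(f"burn rate: {rate:.1f}x")
```

At a burn rate of about 10, a 30-day budget is spent in roughly 3 days, and nothing in pod readiness reflects that.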

Why “pods are Ready” lies to you

  • Ready only means the container started and passed its readiness probe.
  • It says nothing about P95 latency, error rate, or whether your SLO slice is holding.
  • A canary can stay “green” because the metrics are too coarse to show the regression.
  • No labels, no slices → blast radius stays invisible.
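
A small sketch of why slicing matters: the same traffic, viewed in aggregate and then per label. The slice names and request counts are made up for illustration, and `error_rate_by_slice` is a hypothetical helper, not part of any monitoring library.

```python
from collections import defaultdict


def error_rate_by_slice(samples):
    """samples: iterable of (slice_label, requests, errors) tuples."""
    totals = defaultdict(lambda: [0, 0])  # label -> [requests, errors]
    for label, requests, errors in samples:
        totals[label][0] += requests
        totals[label][1] += errors
    return {label: (e / r if r else 0.0) for label, (r, e) in totals.items()}


# Hypothetical traffic: the canary serves 5% of requests.
samples = [
    ("stable", 9_500, 5),
    ("canary", 500, 25),
]

total_requests = sum(r for _, r, _ in samples)
total_errors = sum(e for _, _, e in samples)
aggregate = total_errors / total_requests
per_slice = error_rate_by_slice(samples)

print(f"aggregate: {aggregate:.2%}")
for label, rate in per_slice.items():
    print(f"{label}: {rate:.2%}")
```

With these assumed numbers the aggregate error rate is 0.3%, which can pass a coarse check, while the canary slice alone is failing 5% of its requests.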

Three fixes

1. Pre‑scale before the first canary step

  • Bump replicas before traffic shifts.
  • HPA catches up from a safe baseline instead of a saturated one.
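
One way to pick the pre-scaled replica count is to size for the traffic the canary step is about to receive, with headroom. This is a rough sketch under assumed inputs: the per-pod capacity, the 60% target utilization, and the helper name `prescale_replicas` are all hypothetical and should come from your own load tests.

```python
import math


def prescale_replicas(expected_rps: float, per_pod_rps: float,
                      target_utilization: float = 0.6) -> int:
    """Replicas needed so each pod stays at or below target utilization."""
    if per_pod_rps <= 0:
        raise ValueError("per_pod_rps must be positive")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    usable_rps_per_pod = per_pod_rps * target_utilization
    return max(1, math.ceil(expected_rps / usable_rps_per_pod))


# Assumed: 1000 rps total, first canary step takes 20%, one pod handles 50 rps.
canary_rps = 1000 * 0.20
replicas = prescale_replicas(canary_rps, per_pod_rps=50, target_utilization=0.6)
print(replicas)
```

Under these assumptions the canary would start at 7 replicas, so the HPA adjusts from headroom rather than from saturation.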

2. Match step interval to your HPA scale‑up window

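  • Kubernetes defaults: no scale‑up stabilization window (0 seconds) and a 300‑second (5‑minute) scale‑down window. Your HPA may override either under spec.behavior.
  • Check yours with:
kubectl get hpa -o yaml
  • Scale‑up still takes time: metrics delay, the HPA sync period, and pod startup. Promoting before that whole window closes is promoting blind.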
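
A rough way to turn that into a number is to add up everything that has to happen before new capacity is serving, then pad it. The sketch below is an assumption-laden estimate, not an official formula: the function name, the 1.5 safety factor, and every input value are placeholders to replace with what your own HPA spec and pod startup timings show.

```python
import math


def min_step_pause_seconds(metrics_delay_s: float,
                           hpa_sync_period_s: float,
                           scale_up_stabilization_s: float,
                           pod_startup_s: float,
                           safety_factor: float = 1.5) -> int:
    """Lower bound for a canary step pause, in whole seconds."""
    reaction_s = (metrics_delay_s          # time until load shows in metrics
                  + hpa_sync_period_s      # worst case wait for the next HPA sync
                  + scale_up_stabilization_s
                  + pod_startup_s)         # schedule, pull, start, pass readiness
    return math.ceil(reaction_s * safety_factor)


# Placeholder values; read the real ones from your cluster.
pause = min_step_pause_seconds(
    metrics_delay_s=30,
    hpa_sync_period_s=15,
    scale_up_stabilization_s=60,
    pod_startup_s=45,
)
print(f"pause each step for at least {pause}s")
```

With these placeholder inputs the estimate is 225 seconds, so a 30-second step pause would promote well before the autoscaler could have responded.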

3. Gate steps on SLI health

  • Wire an AnalysisRun in Argo Rollouts that checks error rate and P95 latency are within SLO bounds before promoting.
  • If the SLI is still recovering, promotion waits.

The rule

Promote only when the canary holds the SLO for the slice that matters, across a fixed window.
Any SLO breach inside that window triggers an auto‑rollback.
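
The rule can be sketched as a three-way decision over the canary's recent SLI samples. This is only the decision logic, written as a hypothetical `gate` function with made-up thresholds; it is not the Argo Rollouts API, where the equivalent checks live in an analysis template.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PROMOTE = "promote"
    WAIT = "wait"
    ROLLBACK = "rollback"


@dataclass(frozen=True)
class Sample:
    error_rate: float       # fraction of failed requests in the interval
    p95_latency_ms: float   # P95 latency for the canary slice


def gate(samples, required_samples: int,
         max_error_rate: float, max_p95_ms: float) -> Decision:
    """Promote only after a full, breach-free window of canary samples."""
    if required_samples < 1:
        raise ValueError("required_samples must be at least 1")
    window = list(samples)[-required_samples:]
    breached = any(s.error_rate > max_error_rate or s.p95_latency_ms > max_p95_ms
                   for s in window)
    if breached:
        return Decision.ROLLBACK
    if len(window) < required_samples:
        return Decision.WAIT  # window not yet full, keep observing
    return Decision.PROMOTE


# Assumed SLO bounds: 0.5% errors, 250 ms P95, five consecutive samples.
healthy = [Sample(0.001, 180.0)] * 5
print(gate(healthy, 5, 0.005, 250.0))
print(gate(healthy[:3], 5, 0.005, 250.0))
print(gate(healthy[:4] + [Sample(0.02, 180.0)], 5, 0.005, 250.0))
```

The three calls cover a full clean window, a window that is still filling, and a window containing one breach.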

Rollout speed and autoscaler reaction time are tuned independently. That gap is where the error budget burns before anyone is paged.

[Figure: the failure chain and mitigation steps]

What is the step interval on your rollouts right now?
