Canary vs Rolling Update
Source: Dev.to
Traffic Routing in Kubernetes
Kubernetes routes traffic by pod IPs, not by percentage.
A Service (virtual IP) forwards requests to any Ready pod; it has no knowledge of:
- application versions
- risk levels
- traffic percentages
```
User
  ↓
Service (virtual IP)
  ↓
Random Pod IP
```
If you have 10 pods and a new v2 pod is added, you now have 11 pods. The Service may send all of a single user's requests to the same pod (connection reuse), so that user could hit v2 for every one of their requests. In practice, a single pod's share of traffic can vary widely, anywhere from 0% to 80%.
Kubernetes makes no promise about which pods receive new code or how many users are affected.
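A sketch of why the Service is version-blind: its selector typically matches only an app label, which both v1 and v2 pods carry (all names here are illustrative):

```yaml
# A Service that selects pods by app label only.
# Pods from both the v1 and v2 Deployments carry app: myapp,
# so the Service treats them as interchangeable endpoints.
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp        # no version label here
  ports:
    - port: 80
      targetPort: 8080
```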
Rolling Updates
Rolling updates guarantee that:
- Pods are not terminated all at once.
- Overall capacity stays up.
They do not guarantee:
- Which users get the new version.
- How many users are affected.
- That errors are limited.
Thus, rolling updates are availability‑safe, not user‑safe.
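The availability guarantees come from the Deployment's rolling-update parameters; a minimal sketch (replica count and surge values illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # at most 1 extra pod during the rollout
      maxUnavailable: 0    # never drop below 10 Ready pods
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: app
          image: myapp:v2
```

Note that nothing here controls *which users* reach the new pods; the settings only bound how many pods churn at once.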
Example scenario
- v1 works, v2 introduces a token‑validation bug.
- One v2 pod starts.
- The first real user hits that pod → login fails.
- The user retries → the same pod (session stickiness) → user locked out.
- A support ticket is opened, the deployment is rolled back, but damage is already done.
Even a single failure can be catastrophic for authentication or payment systems.
Canary Deployments
Canary deployments do not rely on randomness. You explicitly specify the traffic share for the new version, e.g.:
- “Only 5% of traffic goes to v2.”
Out of 1,000 requests:
- 950 → v1
- 50 → v2
This is enforced routing, not “maybe” or “if lucky”.
How it is implemented
- Ingress rules with weighted paths
- Load‑balancer weight settings
- Service‑mesh traffic splitting
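As one concrete option, the NGINX Ingress controller expresses a weighted canary through annotations; a sketch matching the 5% example above (host and service names are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "5"   # ~5% of requests go to v2
spec:
  rules:
    - host: myapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp-v2   # canary Service; the main Ingress keeps pointing at v1
                port:
                  number: 80
```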
Replica‑based canary (not true percentage)
```
4 pods running v1
1 pod running v2   # ≈20% of traffic, but only on average
```
Using replicas alone only reduces probability; it does not guarantee traffic limits. Production environments therefore use explicit traffic weighting.
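A service mesh enforces exact weights regardless of replica counts; a sketch using an Istio VirtualService (names illustrative, and a matching DestinationRule defining the v1/v2 subsets is assumed):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
    - myapp
  http:
    - route:
        - destination:
            host: myapp
            subset: v1
          weight: 95   # enforced share for the stable version
        - destination:
            host: myapp
            subset: v2
          weight: 5    # enforced share for the canary
```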
Benefits
- If a v2 pod hits an external timeout (e.g., the Stripe API), only the small canary slice of traffic is affected.
- Latency spikes are detected early; the canary can be stopped, keeping the remaining 95% of customers on the stable version.
Comparison: Rolling Update vs. Canary
| Aspect | Rolling Update | Canary |
|---|---|---|
| Who gets new version? | Anyone (random) | Only selected traffic (controlled) |
| Traffic share | Random distribution | Controlled percentage |
| Risk size | Unknown | Known & limited |
| Rollback damage | Already happened (may be large) | Minimal (limited exposure) |
| Typical use case | Safe, low‑risk changes | Risky or high‑impact changes |
Readiness Probes
A readiness probe answers a single question:
“Should Kubernetes send traffic to this pod?”
It checks only technical availability (process alive, port open, HTTP 200, etc.). It does not verify business logic, latency, external dependencies, or correctness.
Typical probe definitions
```yaml
readinessProbe:
  httpGet:
    path: /health
    port: 8080
```

or

```yaml
readinessProbe:
  tcpSocket:
    port: 8080
```
- Probe passes → pod is marked Ready and added to the Service.
- Probe fails → pod is removed from the Service.
A pod can return 200 OK on /health while its core functionality (e.g., payments, Kafka consumption) is broken.
How a Canary Deployment Works
- Readiness – Pod passes the readiness probe → becomes eligible for traffic.
- Canary traffic (e.g., 5 %) – Requests are routed to the new version.
- Monitoring – Real‑user behavior, latency, error rates, and external system responses are observed.
- Decision
- If metrics are healthy → increase traffic share.
- If errors exceed thresholds → stop the canary, keep the stable version.
Without a canary, 100 % of traffic would be exposed to any defect.
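The staged decision loop above maps directly onto progressive-delivery tooling; a sketch using Argo Rollouts (weights and pause durations are illustrative):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: myapp
spec:
  replicas: 10
  strategy:
    canary:
      steps:
        - setWeight: 5              # start with 5% canary traffic
        - pause: {duration: 10m}    # observe metrics before promoting
        - setWeight: 25
        - pause: {duration: 10m}
        - setWeight: 100            # full promotion if metrics stay healthy
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: app
          image: myapp:v2
```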
Interview‑style Summary
- Readiness probes verify that a pod is technically ready to receive traffic.
- Canary deployments validate that the new version is safe for users by exposing a controlled slice of real traffic and observing behavior.
Both mechanisms are complementary:
- Readiness protects the platform by keeping Kubernetes from routing traffic to a pod that isn’t able to serve yet.
- Canary protects the business and users from functional regressions.
Think of it as:
- Readiness = “Is the engine started?”
- Canary = “Is the car safe to drive at highway speeds?”