Project: Canary Deployment Strategy
Source: Dev.to
What this project demonstrates
- Canary deployment – what it really is
- How traffic is split between versions
- Why Canary exists (real‑world production reason)
- What a DevOps engineer must watch carefully
- What can break in production if Canary is done wrong
1️⃣ What is Canary? (plain DevOps explanation)
Canary deployment = release a new version to a small % of users first.
Instead of:
- Replacing everything at once (Rolling)
- Running two full stacks (Blue/Green)
You:
- Keep v1 (stable) running.
- Add v2 (canary) with fewer replicas.
- Let Kubernetes naturally split traffic.
- If something goes wrong → delete the canary instantly.
- If all is good → scale the canary up, scale v1 down.
DevOps goal: reduce blast radius.
2️⃣ When DevOps uses Canary (real life)
Use Canary when:
- A new feature touches payments, authentication, or Kafka consumers
- Schema or config changes are risky
- You need real user traffic, not just test traffic
Monitoring & rollback must be instant.
Do NOT use Canary when:
- The app is stateless and tiny
- No monitoring is in place
- No rollback process exists
- It’s a small internal tool
3️⃣ Canary Demo Project (what you will build)
You will deploy:
| Version | Response |
|---|---|
| v1 | VERSION 1 |
| v2 (canary) | VERSION 2 (CANARY) |
- One Service
- Traffic split via replica count
4️⃣ Step‑by‑step implementation (smallest YAML possible)
Step 1 – Stable version (v1)
File: deploy-v1.yaml
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: demo
      version: v1
  template:
    metadata:
      labels:
        app: demo
        version: v1
    spec:
      containers:
        - name: app
          image: hashicorp/http-echo
          args:
            - "-listen=:8080"
            - "-text=VERSION 1"
          ports:
            - containerPort: 8080
```
Apply:
```bash
kubectl apply -f deploy-v1.yaml
```
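Before introducing the canary, it's worth confirming the stable baseline is actually up (plain kubectl, nothing project-specific):
```bash
# Expect 3 running pods carrying the version=v1 label
kubectl get pods -l app=demo,version=v1
```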
Step 2 – Canary version (v2)
File: deploy-v2-canary.yaml
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-v2-canary
spec:
  replicas: 1   # 🔑 THIS is the canary
  selector:
    matchLabels:
      app: demo
      version: v2
  template:
    metadata:
      labels:
        app: demo
        version: v2
    spec:
      containers:
        - name: app
          image: hashicorp/http-echo
          args:
            - "-listen=:8080"
            - "-text=VERSION 2 (CANARY)"
          ports:
            - containerPort: 8080
```
Apply:
```bash
kubectl apply -f deploy-v2-canary.yaml
```
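At this point both Deployments coexist, and the replica counts that drive the traffic split are visible side by side:
```bash
# Expect app-v1 at 3/3 and app-v2-canary at 1/1
kubectl get deployments app-v1 app-v2-canary
```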
Step 3 – Single Service (traffic split happens here)
File: service.yaml
```yaml
apiVersion: v1
kind: Service
metadata:
  name: demo-svc
spec:
  selector:
    app: demo
  ports:
    - port: 80
      targetPort: 8080
```
Apply:
```bash
kubectl apply -f service.yaml
```
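Because the Service selects only `app: demo`, it picks up pods from both Deployments. A quick way to see all four pod IPs behind it:
```bash
# Both v1 and v2 pod IPs should appear under the same Service
kubectl get endpoints demo-svc
```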
5️⃣ Live traffic observation (MOST IMPORTANT PART)
Expose the service and capture its URL:
```bash
URL=$(minikube service demo-svc --url)
```
Run continuous traffic against it:
```bash
while true; do
  date +"%H:%M:%S"
  curl -s "$URL"
  echo
  sleep 0.3
done
```
What you will see
```
VERSION 1
VERSION 1
VERSION 1
VERSION 2 (CANARY)
VERSION 1
VERSION 1
```
That’s Canary in action.
6️⃣ How traffic splitting really works (DevOps reality)
Kubernetes does NOT split by percentage – it splits by number of Pods.
- v1 = 3 pods
- v2 = 1 pod
With 1 canary pod out of 4 total, ≈ 25 % of traffic goes to the canary.
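If you want a smaller blast radius than 25 %, the ratio is controlled purely by replica counts. For example, a rough 10 % canary with the same two Deployments:
```bash
# ~10% canary: 9 stable pods vs 1 canary pod
kubectl scale deployment app-v1 --replicas=9
kubectl scale deployment app-v2-canary --replicas=1
```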
7️⃣ DevOps responsibilities during Canary (the key part)
🔍 Metrics
- Error rate (5xx)
- Latency increase
- Pod restarts
- CPU / memory spikes
📜 Logs
- Application exceptions
- Kafka lag
- DB connection failures
🚦 Health
- Readiness probe failures
- CrashLoopBackOff
- Partial availability
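A minimal way to keep an eye on these signals during the canary window, using plain kubectl and no extra tooling:
```bash
# Watch pod status and restarts for both versions
kubectl get pods -l app=demo -w

# Tail only the canary's logs for exceptions
kubectl logs -l app=demo,version=v2 -f

# Check for readiness failures and other events on the canary
kubectl describe deployment app-v2-canary
```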
8️⃣ Simulate failure (important demo)
Delete the canary immediately:
```bash
kubectl delete deployment app-v2-canary
```
Traffic instantly returns to:
```
VERSION 1
VERSION 1
VERSION 1
```
- 👉 No rollback needed
- 👉 No downtime
- 👉 This is why Canary exists
9️⃣ Promote Canary to full release
If everything looks good:
```bash
kubectl scale deployment app-v2-canary --replicas=3
kubectl scale deployment app-v1 --replicas=0
```
Result:
- v2 serves all traffic
- v1 is scaled to zero and receives nothing
- Users never noticed
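Once you are confident v2 is stable, an optional cleanup step is to remove the old Deployment entirely:
```bash
# Remove the scaled-down v1 Deployment when it is no longer needed as a fallback
kubectl delete deployment app-v1
```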
🔟 Common Canary mistakes (real production issues)
- ❌ No monitoring
- ❌ Canary has no readinessProbe
- ❌ Same DB migration for v1 & v2
- ❌ Canary runs longer than needed
- ❌ No instant delete plan
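The missing readinessProbe is the easiest of these to fix. A minimal sketch of what it could look like on the canary container, assuming http-echo answers plain GET requests on its listen port (the timing values here are illustrative, not from the original project):
```yaml
# Sketch: add under the canary container in deploy-v2-canary.yaml
readinessProbe:
  httpGet:
    path: /
    port: 8080
  initialDelaySeconds: 2
  periodSeconds: 5
```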
Final DevOps takeaway (important)
Canary is not YAML.
Canary is risk management.
Your real job:
- Control exposure
- Detect failure early
- Kill bad releases fast
- Protect users & business

