Building Blue/Green Deployments with Nginx Automatic Failover
Source: Dev.to
Introduction
Blue/Green deployment lets you run two identical instances of your application (Blue and Green) and switch traffic from the active instance to the standby instantly when something goes wrong. In this guide we build a complete container-based Blue/Green environment using Nginx as the traffic director, a small Node.js service backing both pools, and, optionally, a Python watcher that reads Nginx's JSON logs and sends alerts to Slack.
Prerequisites
| Item | Why |
|---|---|
| Docker + Docker Compose | Run the services locally without Kubernetes |
| Node.js (for building the app) | Build the demo service |
| (Optional) Slack webhook URL | Receive failover alerts |
| A terminal and a text editor | Create and edit the files |
Project structure
.
├─ app/
│ ├─ package.json
│ ├─ app.js
│ └─ Dockerfile
├─ nginx/
│ └─ nginx.conf.template
├─ watcher/
│ ├─ requirements.txt
│ └─ watcher.py
├─ docker-compose.yaml
└─ .env
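If you are following along from scratch, a few commands create this layout (nginx/log is the host directory the containers later share for the JSON logs):

# Scaffold the directory layout
mkdir -p app nginx/log watcher
touch app/package.json app/app.js app/Dockerfile \
      nginx/nginx.conf.template \
      watcher/requirements.txt watcher/watcher.py \
      docker-compose.yaml .env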
1. The Node.js app
package.json
{
  "name": "blue-green-app",
  "version": "1.0.0",
  "main": "app.js",
  "license": "MIT",
  "scripts": {
    "start": "node app.js"
  },
  "dependencies": {
    "express": "^4.18.2"
  }
}
app.js
const express = require('express');

const app = express();

const APP_POOL = process.env.APP_POOL || 'unknown';
const RELEASE_ID = process.env.RELEASE_ID || 'unknown';
const PORT = process.env.PORT || 3000;

let chaosMode = false;
let chaosType = 'error'; // 'error' or 'timeout'

// Add tracing headers
app.use((req, res, next) => {
  res.setHeader('X-App-Pool', APP_POOL);
  res.setHeader('X-Release-Id', RELEASE_ID);
  next();
});

app.get('/', (req, res) => {
  res.json({
    service: 'Blue/Green Demo',
    pool: APP_POOL,
    releaseId: RELEASE_ID,
    status: chaosMode ? 'chaos' : 'healthy',
    chaosMode,
    chaosType: chaosMode ? chaosType : null,
    timestamp: new Date().toISOString(),
    endpoints: {
      version: '/version',
      health: '/healthz',
      chaos: '/chaos/start, /chaos/stop'
    }
  });
});

app.get('/healthz', (req, res) => {
  res.status(200).json({ status: 'healthy', pool: APP_POOL });
});

app.get('/version', (req, res) => {
  if (chaosMode && chaosType === 'error')
    return res.status(500).json({ error: 'Chaos: server error' });
  if (chaosMode && chaosType === 'timeout')
    return; // simulate a hang: never respond, let the proxy time out
  res.json({
    version: '1.0.0',
    pool: APP_POOL,
    releaseId: RELEASE_ID,
    timestamp: new Date().toISOString()
  });
});

app.post('/chaos/start', (req, res) => {
  const mode = req.query.mode || 'error';
  chaosMode = true;
  chaosType = mode;
  res.json({ message: 'Chaos started', mode, pool: APP_POOL });
});

app.post('/chaos/stop', (req, res) => {
  chaosMode = false;
  chaosType = 'error';
  res.json({ message: 'Chaos stopped', pool: APP_POOL });
});

app.listen(PORT, '0.0.0.0', () => {
  console.log(`App (${APP_POOL}) listening on ${PORT}`);
  console.log(`Release ID: ${RELEASE_ID}`);
});
The service exposes:
- GET /healthz – health check used by Nginx
- GET /version – returns version info; chaos mode can force errors or timeouts
- POST /chaos/start?mode=error|timeout – enable failure simulation
- POST /chaos/stop – turn chaos off
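Before wiring up Nginx, you can sanity-check the endpoints by running the service directly (a quick local test, assuming Node.js 18+ on your machine):

# Install dependencies and run the app outside Docker
(cd app && npm install)
APP_POOL=blue RELEASE_ID=local-test PORT=3000 node app/app.js &
curl -s http://localhost:3000/version                        # healthy JSON
curl -s -X POST "http://localhost:3000/chaos/start?mode=error"
curl -i http://localhost:3000/version                        # HTTP 500 now
curl -s -X POST http://localhost:3000/chaos/stop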
2. One Docker image for both pools
Dockerfile
FROM node:18-alpine
WORKDIR /app
# Install production dependencies
COPY package*.json ./
RUN npm install --only=production
# Copy source code
COPY . .
EXPOSE 3000
CMD ["npm", "start"]
Both the Blue and Green containers are built from this image; they differ only in their environment variables (APP_POOL, RELEASE_ID, and so on).
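Because the pools differ only by environment, you can verify this with two throwaway containers (names and host ports here are illustrative, not from the article):

# Build once, run the same image as two pools
docker build -t blue-green-app ./app
docker run -d --name demo-blue  -e APP_POOL=blue  -p 3001:3000 blue-green-app
docker run -d --name demo-green -e APP_POOL=green -p 3002:3000 blue-green-app
curl -s http://localhost:3001/healthz   # {"status":"healthy","pool":"blue"}
curl -s http://localhost:3002/healthz   # {"status":"healthy","pool":"green"}
docker rm -f demo-blue demo-green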
3. Nginx as the traffic director
nginx/nginx.conf.template
events {
    worker_connections 1024;
}

http {
    # Structured JSON access logs
    log_format custom_json '{"time":"$time_iso8601"'
        ',"remote_addr":"$remote_addr"'
        ',"method":"$request_method"'
        ',"uri":"$request_uri"'
        ',"status":$status'
        ',"bytes_sent":$bytes_sent'
        ',"request_time":$request_time'
        ',"upstream_response_time":"$upstream_response_time"'
        ',"upstream_status":"$upstream_status"'
        ',"upstream_addr":"$upstream_addr"'
        ',"pool":"$sent_http_x_app_pool"'
        ',"release":"$sent_http_x_release_id"}';

    upstream blue_pool {
        server app-blue:3000 max_fails=1 fail_timeout=3s;
        server app-green:3000 backup;
    }

    upstream green_pool {
        server app-green:3000 max_fails=1 fail_timeout=3s;
        server app-blue:3000 backup;
    }

    server {
        listen 80;
        server_name localhost;

        # JSON access log (shared volume)
        access_log /var/log/nginx/access.json custom_json;

        # Simple health endpoint for the load balancer itself
        location /healthz {
            access_log off;
            return 200 "healthy\n";
            add_header Content-Type text/plain;
        }

        location / {
            # $UPSTREAM_POOL is substituted by envsubst at container start,
            # using the value Docker Compose passes in
            proxy_pass http://$UPSTREAM_POOL;

            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;

            # Fast timeouts → quick failover
            proxy_connect_timeout 1s;
            proxy_send_timeout 3s;
            proxy_read_timeout 3s;

            # Retry on errors / timeouts, try backup upstream
            proxy_next_upstream error timeout http_500 http_502 http_503 http_504;
            proxy_next_upstream_tries 2;
            proxy_next_upstream_timeout 10s;

            proxy_pass_request_headers on;
            proxy_hide_header X-Powered-By;
        }
    }
}
Key settings
- max_fails=1 fail_timeout=3s – a single failure marks the upstream as down for a short period.
- Short proxy_*_timeout values keep clients from waiting long when the primary pool misbehaves.
- proxy_next_upstream plus the retry settings automatically route the request to the backup pool.
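To watch these settings in action, tail the structured access log and pull out the fields that matter (assumes jq is installed on the host; the path matches the bind mount defined below):

# Status, serving pool, and upstream address for each request
tail -f nginx/log/access.json | jq -r '[.status, .pool, .upstream_addr] | @tsv'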
4. Optional Slack watcher
watcher/requirements.txt
requests==2.32.3
watcher/watcher.py
import json
import os
import time
from collections import deque
from datetime import datetime, timezone

import requests

LOG_PATH = os.getenv("NGINX_LOG_FILE", "/var/log/nginx/access.json")
SLACK_WEBHOOK_URL = os.getenv("SLACK_WEBHOOK_URL", "")
SLACK_PREFIX = os.getenv("SLACK_PREFIX", "from: @Watcher")
ACTIVE_POOL = os.getenv("ACTIVE_POOL", "blue")
ERROR_RATE_THRESHOLD = float(os.getenv("ERROR_RATE_THRESHOLD", "2"))
WINDOW_SIZE = int(os.getenv("WINDOW_SIZE", "200"))
ALERT_COOLDOWN_SEC = int(os.getenv("ALERT_COOLDOWN_SEC", "300"))
MAINTENANCE_MODE = os.getenv("MAINTENANCE_MODE", "false").lower() == "true"


def now_iso():
    return datetime.now(timezone.utc).isoformat()


def post_to_slack(message):
    if not SLACK_WEBHOOK_URL:
        return
    payload = {"text": f"{SLACK_PREFIX} {message}"}
    try:
        requests.post(SLACK_WEBHOOK_URL, json=payload, timeout=5)
    except Exception as e:
        print(f"Slack post failed: {e}")


def parse_log_line(line):
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        return None


def main():
    recent = deque(maxlen=WINDOW_SIZE)
    last_alert = 0
    while True:
        try:
            with open(LOG_PATH, "r") as f:
                # Seek to the end and follow new lines as they are written
                f.seek(0, os.SEEK_END)
                while True:
                    line = f.readline()
                    if not line:
                        time.sleep(0.5)
                        continue
                    entry = parse_log_line(line.strip())
                    if not entry:
                        continue
                    recent.append(entry)
                    if MAINTENANCE_MODE:
                        continue  # suppress alerts during planned work
                    now = time.time()
                    # Detect failover: pool header differs from ACTIVE_POOL
                    if entry.get("pool") and entry["pool"] != ACTIVE_POOL:
                        if now - last_alert > ALERT_COOLDOWN_SEC:
                            msg = (f"Failover detected! Traffic switched "
                                   f"from {ACTIVE_POOL} to {entry['pool']}")
                            post_to_slack(msg)
                            print(now_iso(), msg)
                            last_alert = now
                    # Alert when the 5xx rate over the window crosses the threshold
                    errors = sum(1 for e in recent if int(e.get("status", 0)) >= 500)
                    error_rate = 100.0 * errors / len(recent)
                    if error_rate > ERROR_RATE_THRESHOLD and now - last_alert > ALERT_COOLDOWN_SEC:
                        msg = f"Error rate {error_rate:.1f}% over last {len(recent)} requests"
                        post_to_slack(msg)
                        print(now_iso(), msg)
                        last_alert = now
        except FileNotFoundError:
            time.sleep(1)
        except Exception as e:
            print(f"Watcher error: {e}")
            time.sleep(2)


if __name__ == "__main__":
    main()
The watcher tails the JSON log in real time, keeps a simple sliding error-rate window, and posts a Slack alert when traffic shifts to the non-active pool or the error rate crosses the threshold.
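If you use the Slack integration, it is worth confirming the webhook before relying on the watcher; this is the standard incoming-webhook payload:

# Send a test message to the configured webhook
curl -X POST -H 'Content-type: application/json' \
  --data '{"text": "Watcher test message"}' \
  "$SLACK_WEBHOOK_URL"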
5. Environment variables (.env)
# Choose which pool is primary (blue or green)
ACTIVE_POOL=blue
# Labels for the two app containers
APP_BLUE_POOL=blue
APP_GREEN_POOL=green
# Release identifiers (optional, useful for tracing)
RELEASE_ID_BLUE=2025-12-09-blue
RELEASE_ID_GREEN=2025-12-09-green
# Nginx upstream selector – will be substituted in the template
UPSTREAM_POOL=${ACTIVE_POOL}_pool
# Watcher settings (adjust as needed)
ERROR_RATE_THRESHOLD=2
WINDOW_SIZE=200
ALERT_COOLDOWN_SEC=300
# Slack webhook (leave empty to disable alerts)
SLACK_WEBHOOK_URL=
With ACTIVE_POOL=blue, UPSTREAM_POOL resolves to blue_pool in the Nginx template, making the Blue service the primary upstream.
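You can confirm the substitution without starting anything, since docker compose config renders the fully resolved configuration:

# UPSTREAM_POOL should appear as blue_pool (or green_pool)
docker compose --env-file .env config | grep UPSTREAM_POOL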
6. The Docker Compose file
version: "3.9"

services:
  app-blue:
    build: ./app
    environment:
      - APP_POOL=${APP_BLUE_POOL}
      - RELEASE_ID=${RELEASE_ID_BLUE}
    ports: []  # not exposed directly
    healthcheck:
      # node:18-alpine ships busybox wget but not curl
      test: ["CMD", "wget", "-qO-", "http://localhost:3000/healthz"]
      interval: 5s
      timeout: 2s
      retries: 2

  app-green:
    build: ./app
    environment:
      - APP_POOL=${APP_GREEN_POOL}
      - RELEASE_ID=${RELEASE_ID_GREEN}
    ports: []  # not exposed directly
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:3000/healthz"]
      interval: 5s
      timeout: 2s
      retries: 2

  nginx:
    image: nginx:1.25-alpine
    depends_on:
      - app-blue
      - app-green
    ports:
      - "8080:80"
    volumes:
      - ./nginx/nginx.conf.template:/etc/nginx/nginx.conf.template:ro
      - ./nginx/log:/var/log/nginx
    environment:
      - ACTIVE_POOL=${ACTIVE_POOL}
      - UPSTREAM_POOL=${UPSTREAM_POOL}
    # $$ escapes the dollar sign so Compose passes a literal $UPSTREAM_POOL to envsubst
    command: /bin/sh -c "envsubst '$$UPSTREAM_POOL' < /etc/nginx/nginx.conf.template > /etc/nginx/nginx.conf && nginx -g 'daemon off;'"

  watcher:
    build:
      context: ./watcher
    depends_on:
      - nginx
    volumes:
      - ./nginx/log:/var/log/nginx
    environment:
      - ACTIVE_POOL=${ACTIVE_POOL}
      - SLACK_WEBHOOK_URL=${SLACK_WEBHOOK_URL}
      - ERROR_RATE_THRESHOLD=${ERROR_RATE_THRESHOLD}
      - WINDOW_SIZE=${WINDOW_SIZE}
      - ALERT_COOLDOWN_SEC=${ALERT_COOLDOWN_SEC}
    # Remove this service if you don't need Slack alerts
Before starting Nginx, the service runs envsubst to substitute $UPSTREAM_POOL in the template and writes the result to /etc/nginx/nginx.conf.
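One gap to be aware of: the compose file builds the watcher from ./watcher, but the project tree above lists no Dockerfile there. A minimal sketch that would satisfy the build (assumed, not part of the original article):

# Hypothetical watcher/Dockerfile; adjust the Python version to taste
cat > watcher/Dockerfile <<'EOF'
FROM python:3.12-alpine
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY watcher.py .
CMD ["python", "watcher.py"]
EOF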
7. Running the demo
# Start everything
docker compose --env-file .env up -d
# Verify Nginx health endpoint
curl http://localhost:8080/healthz
# → should return "healthy"
# Call the application through the load balancer
curl http://localhost:8080/
You should see a JSON response, and the X-App-Pool response header will tell you which pool served it (Blue by default).
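To inspect the tracing headers directly, print only the response headers:

# X-App-Pool and X-Release-Id identify the serving pool and release
curl -s -D - -o /dev/null http://localhost:8080/version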
Simulating a failure
# Put the Blue app into chaos mode (force 500 errors)
curl -X POST "http://localhost:8080/chaos/start?mode=error"
# Or simulate a timeout
curl -X POST "http://localhost:8080/chaos/start?mode=timeout"
Once the failure is active, subsequent requests to http://localhost:8080/ are served by the Green pool automatically, courtesy of Nginx's proxy_next_upstream logic. If the watcher is enabled, it posts a failover alert to Slack.
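A short loop makes the switch easy to see (requires jq; each response reports which pool served it):

# After chaos starts on Blue, every line should read "green"
for i in 1 2 3 4 5; do curl -s http://localhost:8080/ | jq -r .pool; done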
To stop the chaos:
curl -X POST "http://localhost:8080/chaos/stop"
8. Switching the active pool
To make Green the primary pool without triggering a failover, just edit .env:
ACTIVE_POOL=green
Then restart Nginx (or the whole stack) so the template is re-rendered:
docker compose up -d --no-deps --build nginx
From then on, new traffic prefers Green while Blue stays on standby.
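Verify the switch the same way as during failover, this time via the headers:

# Should now report X-App-Pool: green
curl -s -D - -o /dev/null http://localhost:8080/ | grep -i x-app-pool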
9. Cleanup
docker compose down -v
The -v flag removes named and anonymous volumes. Note that the Nginx logs live in the bind-mounted nginx/log directory, which down -v does not touch; delete that folder manually if you want a clean slate.
10. What you learned
- The Blue/Green pattern with plain Docker Compose, no Kubernetes required.
- How max_fails, fail_timeout, and proxy_next_upstream in the Nginx upstream config deliver near-instant failover.
- Structured JSON access logs that expose the upstream pool through custom headers.
- A simple chaos API for testing resilience.
- An optional watcher that tails the logs and alerts Slack on failover or elevated error rates.