How to Effectively Monitor Regular Backups
Source: Dev.to
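The Scenario
Picture this: you wrote a Bash script to back up a production database (say, for an online store). You added the script to crontab and everything ran smoothly. A month later the database is corrupted, perhaps by a buggy plugin. When you try to restore from the "latest" backup, you find that the most recent one is two weeks old.
What Happened?
The backup script quietly stopped working. This nightmare is more common than you might think, and can be caused by:
- A full disk
- Changed permissions
- Network timeouts
- Expired credentials
- A typo introduced during a quick fix
The Problem with Unmonitored Backups
Traditional cron jobs have a fundamental flaw: they report an error only when the job fails to run at all. A backup script can still:
- Look "successful" to cron while actually exiting with an error
- Produce empty or corrupted files
- Take far longer than expected (often a sign of an underlying problem)
- Skip some tables because of permission problems
Before you notice, the retention period runs out and you are left without a single usable backup.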
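The first failure mode in the list above is easy to reproduce. In Bash, a pipeline reports the exit status of its last command by default, so a failing dump piped into `gzip` still looks successful. The sketch below uses a hypothetical `failing_dump` function as a stand-in for a real dump command; it is an illustration, not part of the original scripts.

```shell
#!/bin/bash
# Hypothetical stand-in for a dump command that writes some output, then fails.
failing_dump() {
    echo "partial data"
    return 1
}

# Default behaviour: the pipeline's status is that of the last command (gzip).
set +o pipefail
if failing_dump | gzip > /dev/null; then
    without_pipefail="success"
else
    without_pipefail="failure"
fi

# With pipefail, a failure anywhere in the pipeline fails the whole pipeline.
set -o pipefail
if failing_dump | gzip > /dev/null; then
    with_pipefail="success"
else
    with_pipefail="failure"
fi

echo "without pipefail: ${without_pipefail}"
echo "with pipefail:    ${with_pipefail}"
```

Without `pipefail` the failed dump is reported as a success; with it, the failure is visible to the rest of the script.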
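Monitoring the Backup Script
The solution is simple: have the backup script actively report its status to an external monitoring system. Below is a guide to integrating database backups with CronMonitor.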
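The scripts that follow call `curl -s` directly for each ping. One possible refinement, sketched here under assumptions, is a small helper that adds a timeout and a few retries so that a slow or unreachable monitor neither stalls the backup nor fails it. The name `ping_monitor` is hypothetical; the `/start`, `/complete` and `/fail` suffixes are the ones used throughout this article.

```shell
#!/bin/bash
MONITOR_URL="https://cronmonitor.app/api/ping/your-unique-id"

# Hypothetical helper: send one status ping ("start", "complete" or "fail").
#   -f       treat HTTP errors (4xx/5xx) as failures
#   -sS      stay quiet, but still print error messages
#   -m 10    give up after 10 seconds so a slow monitor cannot stall the backup
#   --retry  retry transient failures a few times
ping_monitor() {
    local state="$1"
    curl -fsS -m 10 --retry 3 -o /dev/null "${MONITOR_URL}/${state}" \
        || echo "warning: could not send '${state}' ping" >&2
}
```

A script would then call `ping_monitor start` before the dump and `ping_monitor complete` or `ping_monitor fail` afterwards. A failed ping only produces a warning, on the assumption that a missing ping will trigger an alert on the monitoring side anyway.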
MySQL / MariaDB Backup Example
#!/bin/bash
# Make a pipeline fail if any command in it fails, not only the last one (gzip)
set -o pipefail

MONITOR_URL="https://cronmonitor.app/api/ping/your-unique-id"
BACKUP_DIR="/backups/mysql"
DATE=$(date +%Y%m%d_%H%M%S)
DB_NAME="production"
BACKUP_FILE="${BACKUP_DIR}/${DB_NAME}_${DATE}.sql.gz"

# Signal start
curl -s "${MONITOR_URL}/start"

# Perform backup
mysqldump --single-transaction \
    --routines \
    --triggers \
    "$DB_NAME" | gzip > "$BACKUP_FILE"

# Verify backup succeeded and file is not empty
if [ $? -eq 0 ] && [ -s "$BACKUP_FILE" ]; then
    # Delete backups older than 7 days
    find "$BACKUP_DIR" -name "*.sql.gz" -mtime +7 -delete
    # Signal success
    curl -s "${MONITOR_URL}/complete"
else
    # Signal failure
    curl -s "${MONITOR_URL}/fail"
    exit 1
fi
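One caveat about the `-s` test in the script above: `gzip` writes a small header and trailer even when its input is empty, so a compressed file produced from an empty dump is not zero bytes long and passes `-s`. A stricter check might look like the following sketch; `verify_gzip_backup` is a hypothetical helper name, not something CronMonitor or MySQL provides.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the file is a valid gzip archive
# whose decompressed content is not empty.
verify_gzip_backup() {
    local file="$1"

    # Reject missing or zero-byte files.
    [ -s "$file" ] || return 1

    # Check that the gzip stream itself is intact.
    gzip -t "$file" 2>/dev/null || return 1

    # Reject archives that decompress to nothing (for example, an empty dump).
    local first_bytes
    first_bytes=$(gzip -dc "$file" 2>/dev/null | head -c 64 | wc -c)
    [ "$first_bytes" -gt 0 ]
}
```

In the backup script, the condition `[ -s "$file" ]` could be replaced with `verify_gzip_backup "$file"`. This confirms only that the archive is readable and non-empty, not that the dump is complete; a restore test remains the real proof.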
PostgreSQL Backup Example
#!/bin/bash
MONITOR_URL="https://cronmonitor.app/api/ping/your-unique-id"
BACKUP_DIR="/backups/postgres"
DATE=$(date +%Y%m%d_%H%M%S)
DB_NAME="production"
BACKUP_FILE="${BACKUP_DIR}/${DB_NAME}_${DATE}.dump"

# Signal start
curl -s "${MONITOR_URL}/start"

# Perform backup (custom format for flexibility)
pg_dump -Fc "$DB_NAME" > "$BACKUP_FILE"

# Verify backup succeeded and file is not empty
if [ $? -eq 0 ] && [ -s "$BACKUP_FILE" ]; then
    # Verify integrity: pg_restore must be able to read the archive's contents
    pg_restore --list "$BACKUP_FILE" > /dev/null 2>&1
    if [ $? -eq 0 ]; then
        curl -s "${MONITOR_URL}/complete"
    else
        curl -s "${MONITOR_URL}/fail"
        exit 1
    fi
else
    curl -s "${MONITOR_URL}/fail"
    exit 1
fi
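The script above repeats the failure branch twice. One way to avoid that, shown here only as a sketch, is to put the start, complete and fail pings in a wrapper and hand it the backup as a command. The name `run_monitored` is hypothetical.

```shell
#!/bin/bash
MONITOR_URL="https://cronmonitor.app/api/ping/your-unique-id"

# Hypothetical wrapper: ping "start", run the given command, then ping
# "complete" or "fail" depending on the command's exit status.
run_monitored() {
    curl -s -o /dev/null "${MONITOR_URL}/start"
    if "$@"; then
        curl -s -o /dev/null "${MONITOR_URL}/complete"
        return 0
    else
        # Inside the else branch, $? still holds the command's exit status.
        local status=$?
        curl -s -o /dev/null "${MONITOR_URL}/fail"
        return "$status"
    fi
}
```

The dump and integrity check would then live in a single function (say, a hypothetical `backup_postgres`) that returns non-zero on any failure, and the script body shrinks to `run_monitored backup_postgres`.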
Backing Up Multiple Databases and Reporting Size
#!/bin/bash
# Make a pipeline fail if any command in it fails, not only the last one (gzip)
set -o pipefail

MONITOR_URL="https://cronmonitor.app/api/ping/your-unique-id"
BACKUP_DIR="/backups"
DATE=$(date +%Y%m%d)
DATABASES="app_production analytics users"

# Signal start
curl -s "${MONITOR_URL}/start"

FAILED=0
TOTAL_SIZE=0

for DB in $DATABASES; do
    BACKUP_FILE="${BACKUP_DIR}/${DB}_${DATE}.sql.gz"
    mysqldump --single-transaction "$DB" | gzip > "$BACKUP_FILE"
    if [ $? -ne 0 ] || [ ! -s "$BACKUP_FILE" ]; then
        FAILED=1
        echo "Backup failed for: $DB"
    else
        # Get file size: try BSD/macOS stat first, then fall back to GNU/Linux stat
        SIZE=$(stat -f%z "$BACKUP_FILE" 2>/dev/null || stat -c%s "$BACKUP_FILE")
        TOTAL_SIZE=$((TOTAL_SIZE + SIZE))
    fi
done

if [ $FAILED -eq 0 ]; then
    # Report success with total size metadata
    curl -s "${MONITOR_URL}/complete?msg=Backed%20up%20${TOTAL_SIZE}%20bytes"
else
    curl -s "${MONITOR_URL}/fail"
    exit 1
fi
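How to Configure CronMonitor for Backup Jobs
- Create a new monitor in CronMonitor for the "backup" job.
- Set the expected schedule (for example, every day at 02:00).
- Configure a grace period long enough to cover the longest expected backup time.
- Set up alerts (email, Slack, Discord, etc.).
Key Settings
- Schedule: must exactly match the cron schedule on the server.
- Grace period: must be longer than the longest expected backup time.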
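Choosing a grace period is easier with real numbers. As a rough sketch, a wrapper like the one below could record how long each run takes in a local log, so the grace period can be set from observed durations rather than a guess. The helper name `log_duration` and the log path are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical log location; override by setting DURATION_LOG beforehand.
DURATION_LOG="${DURATION_LOG:-/tmp/backup_durations.log}"

# Hypothetical helper: run a command, then append a timestamp, its exit
# status and its duration in whole seconds to the log file.
log_duration() {
    local started=$SECONDS
    "$@"
    local status=$?
    local elapsed=$((SECONDS - started))
    echo "$(date +%Y-%m-%dT%H:%M:%S) status=${status} seconds=${elapsed}" >> "$DURATION_LOG"
    return "$status"
}
```

Wrapping the backup script in `log_duration` for a few weeks gives a list of run times; the grace period can then be set comfortably above the longest one.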
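Best Practices
- Verify, don't assume
  Always check that the backup file exists and contains data. From the shell's point of view, a command that produces an empty gzipped file has still "succeeded".
- Test restores regularly
  A backup is only as valuable as your ability to restore it. Schedule regular restore tests and monitor those jobs too.
- Monitor backup duration
  CronMonitor records how long each job takes. A sudden increase usually points to data growth or a performance problem.
- Store backups off-site
  Include a step that syncs backups to an external drive or server.
  # After local backup succeeds
  if rsync -az "${BACKUP_DIR}/" remote:/backups/; then
      curl -s "${MONITOR_URL}/complete"
  else
      curl -s "${MONITOR_URL}/fail"
  fi
- Document the recovery procedure
  When an alert fires (for example, a Slack notification about a failed backup), you need clear step-by-step instructions, not a debugging session.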
Conclusion
Database backups are the last line of defense against data loss. They deserve more than a fire-and-forget cron job and the hope that everything will always run correctly. With active monitoring you see problems as soon as they occur, while you still have time to fix them.
Start monitoring your backup scripts today. Your future self (the one who doesn’t have to explain data loss to a client) will thank you.
CronMonitor is a simple, developer‑friendly cron job monitoring service. Set up your first monitor in under a minute.