How to identify database warning signals and plan your zero downtime migration
Source: Dev.to
Introduction
Database outages often occur at the worst possible moment, leaving you to deal with angry users, stressed stakeholders, and intense pressure to fix things immediately. The key to avoiding crises is to detect warning signs early and plan migrations on your own schedule.
Prerequisites
- Database monitoring capabilities (built‑in tools are sufficient)
- Admin access to your database servers
- Understanding of your application’s typical database behavior
- Ability to run queries and check system metrics
The examples below focus on MySQL and PostgreSQL, but the principles apply to most relational databases.
Warning Signals
Connection Saturation
“This kills applications faster than any slow query.”
MySQL
SHOW STATUS LIKE 'Threads_connected';
SHOW VARIABLES LIKE 'max_connections';
PostgreSQL
SELECT count(*) FROM pg_stat_activity WHERE state = 'active';
SHOW max_connections;
- Alert when connections reach 70 % of
max_connections. - Danger zone at 80 %.
Rising Average Query Execution Time
Track the average execution time over weeks rather than focusing on isolated slow queries.
MySQL – Enable slow query logging
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1.0; -- seconds
PostgreSQL – Query statistics
SELECT query, calls, total_time, mean_time
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT 10;
A steady upward trend indicates growing data volume or degrading indexes.
Lock Contention
Locks can cause cascading slowdowns across the entire application.
MySQL
SELECT *
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE event_name LIKE '%lock%' AND count_star > 0;
PostgreSQL
SELECT mode, locktype, granted, COUNT(*)
FROM pg_locks
GROUP BY mode, locktype, granted;
- Regular lock waits > 100 ms suggest table‑design issues.
Disk I/O Bottlenecks
# Monitor disk utilization (Linux)
iostat -x 1
Watch for:
%utilconsistently > 80 %avgqu-sz> 2awaittimes > 20 ms
When these metrics stay high, plan a migration before an emergency forces you to act.
Migration Strategies
Blue‑Green Deployment (Databases ≤ 100 GB)
-
Create a read replica
-- MySQL CHANGE MASTER TO MASTER_HOST='source-db.example.com'; START SLAVE; -
Monitor replication lag
SHOW SLAVE STATUS\G
Logical Replication for Larger Databases
PostgreSQL Example
-- Source database
CREATE PUBLICATION migration_pub FOR ALL TABLES;
-- Target database
CREATE SUBSCRIPTION migration_sub
CONNECTION 'host=source-db.example.com user=replicator dbname=production'
PUBLICATION migration_pub;
Verification & Cut‑over Checklist
-
Data Consistency Checks (example using a checksum)
SELECT table_name, COUNT(*) AS row_count, COALESCE(SUM(CRC32(CONCAT_WS('|', col1, col2, col3))), 0) AS checksum FROM your_table GROUP BY table_name; -
Stop writes to the source and wait for replication lag to reach zero.
-
Verify checksums match between source and target.
-
Update application configuration to point to the new database.
-
Redirect traffic and monitor:
# Response‑time check curl -w "Total time: %{time_total}s\n" -o /dev/null -s https://your-app.com/health# Error‑rate monitoring grep "ERROR" /var/log/application.log | wc -l -
Post‑migration performance review
SELECT query_digest, avg_timer_wait/1000000 AS avg_time_ms, count_star AS executions FROM performance_schema.events_statements_summary_by_digest ORDER BY avg_timer_wait DESC LIMIT 10;Performance should improve or remain equivalent; any degradation points to configuration issues.
Common Pitfalls
- Ignoring replication lag – always confirm the replica is fully caught up before switching.
- Connection‑pool mismatches – ensure the new environment can handle the same load.
- Missing indexes – verify that all expected indexes exist and are used.
- No rollback plan – maintain the ability to revert to the original database quickly.
Conclusion
Database problems are predictable when you measure the right metrics. Connection exhaustion, trending query slowdowns, lock contention, and storage bottlenecks give you weeks or months of warning before users notice. Implementing the monitoring practices and migration workflow described here helps you avoid emergency migrations, reduces cost, and keeps your service reliable.