How to identify database warning signals and plan your zero downtime migration

Published: (May 11, 2026 at 03:17 AM EDT)
4 min read
Source: Dev.to

Source: Dev.to

Introduction

Database outages often occur at the worst possible moment, leaving you to deal with angry users, stressed stakeholders, and intense pressure to fix things immediately. The key to avoiding crises is to detect warning signs early and plan migrations on your own schedule.

Prerequisites

  • Database monitoring capabilities (built‑in tools are sufficient)
  • Admin access to your database servers
  • Understanding of your application’s typical database behavior
  • Ability to run queries and check system metrics

The examples below focus on MySQL and PostgreSQL, but the principles apply to most relational databases.

Warning Signals

Connection Saturation

“This kills applications faster than any slow query.”

MySQL

SHOW STATUS LIKE 'Threads_connected';
SHOW VARIABLES LIKE 'max_connections';

PostgreSQL

SELECT count(*) FROM pg_stat_activity WHERE state = 'active';
SHOW max_connections;
  • Alert when connections reach 70 % of max_connections.
  • Danger zone at 80 %.

Rising Average Query Execution Time

Track the average execution time over weeks rather than focusing on isolated slow queries.

MySQL – Enable slow query logging

SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1.0;   -- seconds

PostgreSQL – Query statistics

SELECT query, calls, total_time, mean_time
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT 10;

A steady upward trend indicates growing data volume or degrading indexes.

Lock Contention

Locks can cause cascading slowdowns across the entire application.

MySQL

SELECT *
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE event_name LIKE '%lock%' AND count_star > 0;

PostgreSQL

SELECT mode, locktype, granted, COUNT(*)
FROM pg_locks
GROUP BY mode, locktype, granted;
  • Regular lock waits > 100 ms suggest table‑design issues.

Disk I/O Bottlenecks

# Monitor disk utilization (Linux)
iostat -x 1

Watch for:

  • %util consistently > 80 %
  • avgqu-sz > 2
  • await times > 20 ms

When these metrics stay high, plan a migration before an emergency forces you to act.

Migration Strategies

Blue‑Green Deployment (Databases ≤ 100 GB)

  1. Create a read replica

    -- MySQL
    CHANGE MASTER TO MASTER_HOST='source-db.example.com';
    START SLAVE;
  2. Monitor replication lag

    SHOW SLAVE STATUS\G

Logical Replication for Larger Databases

PostgreSQL Example

-- Source database
CREATE PUBLICATION migration_pub FOR ALL TABLES;

-- Target database
CREATE SUBSCRIPTION migration_sub
  CONNECTION 'host=source-db.example.com user=replicator dbname=production'
  PUBLICATION migration_pub;

Verification & Cut‑over Checklist

  1. Data Consistency Checks (example using a checksum)

    SELECT
      table_name,
      COUNT(*) AS row_count,
      COALESCE(SUM(CRC32(CONCAT_WS('|', col1, col2, col3))), 0) AS checksum
    FROM your_table
    GROUP BY table_name;
  2. Stop writes to the source and wait for replication lag to reach zero.

  3. Verify checksums match between source and target.

  4. Update application configuration to point to the new database.

  5. Redirect traffic and monitor:

    # Response‑time check
    curl -w "Total time: %{time_total}s\n" -o /dev/null -s https://your-app.com/health
    # Error‑rate monitoring
    grep "ERROR" /var/log/application.log | wc -l
  6. Post‑migration performance review

    SELECT
      query_digest,
      avg_timer_wait/1000000 AS avg_time_ms,
      count_star AS executions
    FROM performance_schema.events_statements_summary_by_digest
    ORDER BY avg_timer_wait DESC
    LIMIT 10;

    Performance should improve or remain equivalent; any degradation points to configuration issues.

Common Pitfalls

  • Ignoring replication lag – always confirm the replica is fully caught up before switching.
  • Connection‑pool mismatches – ensure the new environment can handle the same load.
  • Missing indexes – verify that all expected indexes exist and are used.
  • No rollback plan – maintain the ability to revert to the original database quickly.

Conclusion

Database problems are predictable when you measure the right metrics. Connection exhaustion, trending query slowdowns, lock contention, and storage bottlenecks give you weeks or months of warning before users notice. Implementing the monitoring practices and migration workflow described here helps you avoid emergency migrations, reduces cost, and keeps your service reliable.

0 views
Back to Blog

Related posts

Read more »