# Real-Time Data Replication Using MySQL, Debezium, Kafka, and Docker (CDC Guide)
Source: Dev.to
### Introduction

Modern applications often need data to move between systems in real time: analytics platforms, microservices, search indexes, or backup databases.
Traditional approaches like batch jobs or cron-based sync introduce delays, inconsistencies, and operational complexity.
This is where Change Data Capture (CDC) becomes powerful.
In this article, we'll build a simple but capable real-time database replication pipeline using:
- MySQL
- Debezium
- Apache Kafka
- Kafka Connect (JDBC Sink)
- Docker Compose
By the end, you'll have a working system that automatically replicates inserts, updates, and deletes from one database to another.

### What Is Change Data Capture (CDC)?
Change Data Capture is a technique used to capture database changes (INSERT, UPDATE, DELETE) and stream them to other systems in real time.
Instead of polling the database repeatedly, CDC reads the database transaction log (binlog in MySQL).
This makes CDC:
- Real-time
- Efficient
- Reliable
- Scalable
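To make this concrete, here is roughly what a Debezium change event looks like for an UPDATE (the field names follow Debezium's event envelope; the table and values are illustrative):

```json
{
  "before": { "id": 1, "name": "Alice", "salary": 50000 },
  "after":  { "id": 1, "name": "Alice", "salary": 55000 },
  "source": { "connector": "mysql", "db": "company", "table": "employees" },
  "op": "u",
  "ts_ms": 1700000000000
}
```

The `op` field distinguishes creates (`c`), updates (`u`), and deletes (`d`), so downstream consumers can replay every change faithfully.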
### Why Do We Need CDC?

Common use cases:
- Real-time analytics pipelines
- Microservices data synchronization
- Data warehousing
- Cache invalidation
- Event-driven architectures
- Search indexing (Elasticsearch)
- Zero-downtime migrations
**Without CDC:**
App → DB → Batch Job → Other Systems

**With CDC:**
App → DB → CDC Stream → Multiple Systems
Much faster and cleaner.
### Architecture Overview
We will build the following pipeline:
```
MySQL Source
  ↓ (binlog)
Debezium Connector
  ↓
Kafka Topic
  ↓
JDBC Sink Connector
  ↓
MySQL Target
```
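Once the stack is up, the Debezium half of this pipeline is configured by POSTing JSON to Kafka Connect's REST API (port 8083). As a sketch, a minimal MySQL source connector payload could look like this — the connector name, credentials, and topic prefix are illustrative, and exact property names vary across Debezium versions (these match Debezium 2.x):

```json
{
  "name": "mysql-source-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql-source",
    "database.port": "3306",
    "database.user": "root",
    "database.password": "root",
    "database.server.id": "184054",
    "topic.prefix": "company-server",
    "database.include.list": "company",
    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    "schema.history.internal.kafka.topic": "schema-changes.company"
  }
}
```

Debezium then writes one Kafka topic per captured table, which the JDBC sink connector reads on the other side.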
### Step 1: Docker Compose Setup

We'll run everything with Docker so the setup is easy and reproducible.
Create a file called `docker-compose.yml`:
```yaml
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181

  kafka:
    image: confluentinc/cp-kafka:7.5.0
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092
      # Advertise the in-network hostname so other containers (like Connect) can reach the broker
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

  mysql-source:
    image: mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: root
      MYSQL_DATABASE: company
    command: >
      --server-id=1
      --log-bin=mysql-bin
      --binlog-format=ROW
    ports:
      - "3306:3306"

  mysql-target:
    image: mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: root
    ports:
      - "3307:3306"

  connect:
    image: debezium/connect:2.5
    depends_on:
      - kafka
      - mysql-source
    ports:
      - "8083:8083"
    environment:
      BOOTSTRAP_SERVERS: kafka:9092
      GROUP_ID: 1
      CONFIG_STORAGE_TOPIC: connect_configs
      OFFSET_STORAGE_TOPIC: connect_offsets
      STATUS_STORAGE_TOPIC: connect_status
```
Start everything:

```sh
docker compose up -d
```

Give the services ~30 seconds to fully start.
### Step 2: Create the Database and Table (Source)

Because MySQL is running inside Docker, we execute commands through the container's MySQL client:

```sh
docker compose exec -T mysql-source mysql -uroot -proot
```
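As a sketch, the source schema for this walkthrough could be a single table (the table name, columns, and sample row are illustrative):

```sql
-- Run inside the mysql-source MySQL shell
USE company;

CREATE TABLE employees (
  id INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100) NOT NULL,
  salary DECIMAL(10, 2)
);

INSERT INTO employees (name, salary) VALUES ('Alice', 50000.00);
```

Any subsequent INSERT, UPDATE, or DELETE on this table is written to the binlog (we enabled `--log-bin` with `ROW` format above), which is exactly what Debezium reads.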
---
### Production Best Practices
- Use **Schema Registry** for schema evolution.
- Enable **monitoring & alerting** (Prometheus, Grafana).
- Configure a **Dead-Letter Queue (DLQ)** for problematic records.
- Use migration tools (Flyway, Liquibase) for schema changes.
- Secure credentials (Docker secrets, Vault, or environment variables).
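For example, a dead-letter queue can be enabled on a sink connector through Kafka Connect's standard error-handling properties (the topic name is illustrative):

```json
{
  "errors.tolerance": "all",
  "errors.deadletterqueue.topic.name": "dlq-mysql-sink",
  "errors.deadletterqueue.context.headers.enable": "true",
  "errors.log.enable": "true"
}
```

With the single-broker setup above, you would also set `errors.deadletterqueue.topic.replication.factor` to `1`, since the default is higher than one broker can satisfy.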
---
### Conclusion

Change Data Capture (CDC) enables real-time data movement with minimal overhead.
Debezium + Kafka Connect provides:
- Scalability
- Reliability
- Low latency
- Event-driven architecture
This pattern is widely used in modern distributed systems.
---
### Final Thoughts
If you work in:
- Data Engineering
- DevOps
- Backend Systems
- Platform Engineering
learning CDC is extremely valuable.
If you enjoyed this article, feel free to connect and share your feedback!