The Missing Link: Triggering Serverless Events from Legacy Databases with AWS DMS
Source: Dev.to
We live in a world where we want everything to be event‑driven. A new user registration in our SQL database should immediately:
- trigger a welcome email via SES,
- update a CRM via API, and
- start a Step Functions workflow.
If you’re building greenfield on DynamoDB, this is easy (DynamoDB Streams). But what if your data lives in a legacy MySQL monolith, an on‑premises Oracle DB, or a standard PostgreSQL instance?
You need Change Data Capture (CDC)—you need to stream those changes to the cloud.
Naturally you look at AWS DMS (Database Migration Service). It’s perfect for moving data, but you quickly hit a wall:
The Problem
AWS DMS cannot target an AWS Lambda function directly.
You can’t simply configure a task that says “When a row is inserted in Table X, invoke Function Y”.
So, how do we bridge the gap between the “old world” (SQL) and the “new world” (serverless)? While many suggest Kinesis, the most robust and cost‑effective answer is Amazon S3.
Below is the architecture pattern I use to modernize legacy back‑ends without rewriting them.
The Architecture: The “S3 Drop” Pattern
- Source – DMS connects to your legacy database and captures changes (INSERT/UPDATE/DELETE) via the transaction logs.
- Target – DMS writes those changes as JSON files into an S3 bucket.
- Trigger – S3 detects the new file and fires an event notification (see the wiring sketch after this list).
- Compute – Your Lambda function receives the event, reads the file, and processes the business logic.
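
To make the Trigger step concrete, here is a minimal boto3 sketch that wires the bucket to the function. The bucket name, function ARN, and the `.json` suffix filter are placeholder assumptions (your DMS settings determine the actual file extension); in practice you would usually declare this in CloudFormation or Terraform.

```python
import boto3

# Placeholder names -- substitute your own bucket and function ARN.
BUCKET = "legacy-cdc-drop"
FUNCTION_ARN = "arn:aws:lambda:eu-west-1:123456789012:function:cdc-processor"

lambda_client = boto3.client("lambda")
s3 = boto3.client("s3")

# 1. S3 may only invoke the function once its resource policy allows it.
lambda_client.add_permission(
    FunctionName=FUNCTION_ARN,
    StatementId="AllowS3Invoke",
    Action="lambda:InvokeFunction",
    Principal="s3.amazonaws.com",
    SourceArn=f"arn:aws:s3:::{BUCKET}",
)

# 2. Fire the Lambda for every object DMS drops into the bucket.
s3.put_bucket_notification_configuration(
    Bucket=BUCKET,
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": FUNCTION_ARN,
                "Events": ["s3:ObjectCreated:*"],
                # Optional: only react to the JSON files DMS writes.
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "suffix", "Value": ".json"}]}
                },
            }
        ]
    },
)
```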

Why S3 Instead of Kinesis or Airbyte?
Why not Kinesis Data Streams?
- Cost – S3 is dramatically cheaper than a provisioned Kinesis stream, especially when the legacy DB is quiet.
- Observability – You can literally see the changes as files in your bucket, making debugging 10× easier.
- Batching – DMS writes to S3 in batches, naturally throttling Lambda invocations during massive write spikes.
Why not Airbyte or Fivetran?
- Those tools excel at ELT pipelines (e.g., loading data into Snowflake every 15–60 minutes).
- Our goal is event‑driven processing—trigger a Lambda as close to “real‑time” as possible.
- AWS DMS offers continuous CDC, delivering a granular stream of events that batch‑based ELT tools often miss.
- Staying 100 % AWS‑native simplifies IAM governance in strict enterprise environments.
Implementation Guide
DMS Endpoint Settings
When creating the target endpoint (S3) in DMS, don’t rely on defaults. Use the following Extra Connection Attributes so the output is Lambda‑friendly:
```
dataFormat=json;
datePartitionEnabled=true;
```
- `dataFormat=json` – DMS defaults to CSV; JSON is far easier for Lambda to parse.
- `datePartitionEnabled=true` – Organizes files by date (`/2023/11/02/...`), preventing a single folder from containing millions of objects.
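If you create the endpoint from code rather than the console, the same attributes can be passed to `create_endpoint`. A minimal boto3 sketch, assuming an existing bucket and a DMS service role (both names below are placeholders):

```python
import boto3

dms = boto3.client("dms")

dms.create_endpoint(
    EndpointIdentifier="legacy-cdc-s3-target",
    EndpointType="target",
    EngineName="s3",
    # Same attributes as above, passed as a single string.
    ExtraConnectionAttributes="dataFormat=json;datePartitionEnabled=true;",
    S3Settings={
        # Placeholder role/bucket -- the role needs s3:PutObject on the bucket.
        "ServiceAccessRoleArn": "arn:aws:iam::123456789012:role/dms-s3-access",
        "BucketName": "legacy-cdc-drop",
    },
)
```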
Understanding the Event Structure
A typical DMS‑generated file looks like this (Line‑Delimited JSON, also known as NDJSON):
```json
{"data": {"id": 101, "username": "jdoe", "status": "active"}, "metadata": {"operation": "insert", "timestamp": "2023-11-02T10:00:00Z"}}
{"data": {"id": 102, "username": "asmith", "status": "pending"}, "metadata": {"operation": "update", "timestamp": "2023-11-02T10:05:00Z"}}
```
Each line contains the operation (insert, update, delete) and the payload (data) in a clean package.
Lambda Logic
Because DMS writes NDJSON, you cannot json.loads() the whole file at once. You must iterate line‑by‑line.
Below is a Python boilerplate that handles the file correctly:
```python
import boto3
import json

s3 = boto3.client('s3')

def handler(event, context):
    # 1️⃣ Extract bucket & key from the S3 event
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        print(f"Processing file: s3://{bucket}/{key}")

        # 2️⃣ Retrieve the file generated by DMS
        obj = s3.get_object(Bucket=bucket, Key=key)
        content = obj['Body'].read().decode('utf-8')

        # 3️⃣ Parse NDJSON (line‑delimited JSON)
        for line in content.splitlines():
            if not line.strip():
                continue  # skip empty lines
            row = json.loads(line)

            # 4️⃣ Filter / act on the operation type
            operation = row.get('metadata', {}).get('operation')
            if operation == 'insert':
                user_data = row.get('data')
                # TODO: add your business logic for inserts,
                # e.g. trigger_welcome_email(user_data)
                print(f"INSERT: {user_data}")
            elif operation == 'update':
                user_data = row.get('data')
                # TODO: add your business logic for updates
                print(f"UPDATE: {user_data}")
            elif operation == 'delete':
                # Handle deletes if needed
                print("DELETE operation received")
```
Key points
- Do not call `json.loads(content)` on the whole file.
- Iterate `content.splitlines()` and parse each line individually.
- Use the `metadata.operation` field to route your logic.
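
Before deploying, you can smoke-test the handler locally by feeding it a hand-built S3 event in the same module. The bucket and key below are placeholders; the object must exist in your account (or `s3.get_object` must be stubbed) for the call to succeed.

```python
# Minimal local smoke test for the handler above.
fake_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "legacy-cdc-drop"},
                "object": {"key": "public/users/2023/11/02/20231102-100000123.json"},
            }
        }
    ]
}

if __name__ == "__main__":
    handler(fake_event, None)
```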
TL;DR
- Capture CDC from any legacy RDBMS with AWS DMS.
- Write the changes as JSON files to S3 (date‑partitioned).
- Trigger a Lambda via S3 event notifications.
- Parse the NDJSON payload line‑by‑line and implement your event‑driven business logic.
This “S3 Drop” pattern gives you a low‑cost, observable, and fully AWS‑native bridge between old‑school databases and modern serverless workflows. 🚀
print(f"New User Detected: {user_data['username']}")
# trigger_welcome_email(user_data)
elif operation == 'update':
print(f"User Updated: {row['data']['id']}")
Summary
You don’t need to refactor your entire legacy database to get the benefits of serverless. By using AWS DMS to unlock the data and S3 as a reliable buffer, you can trigger modern Lambda workflows from 20‑year‑old databases with minimal friction. This pattern prioritizes stability and observability over raw speed—a trade‑off that is usually worth it in enterprise migrations.