Building a Config Drift Detector for AWS (with Snapshots, Lambdas, and a Next.js Dashboard)

Published: January 18, 2026 at 11:14 PM EST
5 min read
Source: Dev.to

Arshdeep Singh

High‑level architecture

Architecture diagram: Config Drift Detector.

At a glance

  • AWS Services (e.g., EC2, Security Groups) are sampled on a schedule.
  • A Snapshot Lambda writes raw JSON snapshots to S3 and Supabase/PostgreSQL.
  • A Detect Lambda compares the latest snapshot to the previous baseline to detect drift.
  • An Alert Lambda writes drift events, updates baselines, and optionally sends Slack alerts.
  • A Next.js dashboard polls a lightweight API backed by Supabase/PostgreSQL to show drifts and baselines.

The rest of the article breaks this down from the perspective of an SRE/DevOps engineer who wants fast feedback, clear audit trails, and a UI that doesn’t feel like a side project.

Design goals and constraints

When I scoped this project, I set a few explicit goals:

  • Detect meaningful drift, not every single field that changes.
  • Keep the architecture boring and observable: managed services over bespoke infra.
  • Make the UI operator‑friendly: think SRE console, not toy dashboard.
  • Be small enough to build solo, but credible enough to show to senior engineers or hiring managers.

From there, the architecture fell naturally into four pieces:

  1. Snapshot pipeline.
  2. Drift detection engine.
  3. Alerting and audit trail.
  4. Web dashboard.

1. Snapshot pipeline

What gets snapshotted?

To start, I focused on a narrow but high‑impact slice of AWS resources:

  • EC2 instances – lifecycle, instance type, tags.
  • Security groups – inbound/outbound rules and attached resources.

These are common sources of “quick fixes” and “just for debugging” changes that later turn into security and reliability problems.

How snapshots flow through the system

The snapshot pipeline revolves around a scheduled Lambda:

  • Trigger: EventBridge rule runs every 30 minutes.
  • Snapshot Lambda:
    1. Calls AWS APIs to list EC2 instances and security groups.
    2. Normalizes the data into a stable JSON shape.
    3. Writes each snapshot to:
      • S3 – raw, timestamped JSON (e.g., YYYY-MM-DD/HH-MM-SS.json).
      • Supabase/PostgreSQL – summarized snapshot metadata for faster queries later.

This gives you:

  • A cheap, append‑only log of the world as it looked at each point in time (S3).
  • A queryable state for dashboards and drift detection (Postgres).

2. Drift detection engine

Baselines vs. snapshots

The system uses a simple mental model:

  • A snapshot is “what the world looks like now”.
  • A baseline is “what we expect the world to look like”.

Every time a new snapshot arrives, the Detect Lambda compares it to the current baseline:

  1. Map each resource by a stable identifier (e.g., instance ID).
  2. Compare only the fields that matter for reliability/security.
  3. Ignore noisy, fast‑changing fields (e.g., timestamps).

The output is a set of drift events:

| Type | Meaning |
| --- | --- |
| ADDED | Resource exists in snapshot but not in baseline. |
| REMOVED | Resource exists in baseline but not in snapshot. |
| MODIFIED | Resource exists in both, but relevant fields differ. |

Each drift event carries:

  • Resource metadata (ID, type, environment).
  • Which fields changed (before vs. after).
  • A severity classification (see below).
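Mapping resources by ID and diffing only the tracked fields fits in a small pure function. A sketch of that comparison (the `detect_drift` name and the event dict shape are illustrative assumptions):

```python
# Only these fields participate in the diff; anything noisy was already
# dropped during snapshot normalization.
TRACKED_FIELDS = ("type", "state", "tags")

def detect_drift(baseline: dict, snapshot: dict) -> list[dict]:
    """Compare two {resource_id: fields} maps and emit drift events."""
    events = []
    for rid, fields in snapshot.items():
        if rid not in baseline:
            events.append({"type": "ADDED", "id": rid})
            continue
        changed = {
            f: {"before": baseline[rid].get(f), "after": fields.get(f)}
            for f in TRACKED_FIELDS
            if baseline[rid].get(f) != fields.get(f)
        }
        if changed:
            events.append({"type": "MODIFIED", "id": rid, "changed": changed})
    for rid in baseline:
        if rid not in snapshot:
            events.append({"type": "REMOVED", "id": rid})
    return events
```

Because both inputs are plain dicts keyed by resource ID, the same function works for EC2 instances and security groups alike.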

After detection, the baseline is updated forward so the system tracks drift incrementally rather than replaying from the beginning every time.

3. Alerting and severity

Not all drift is created equal. Changing a tag is not the same as opening SSH to the world.

Severity levels

| Severity | Description |
| --- | --- |
| CRITICAL | Security‑group changes that materially expand exposure (e.g., 0.0.0.0/0 on sensitive ports). |
| HIGH | EC2 changes that alter lifecycle or network placement in risky ways. |
| MEDIUM | Configuration changes that might affect behavior but aren’t obviously dangerous. |
| LOW | Tag‑only changes and other low‑risk metadata updates. |
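A classifier in this spirit can be a short rule table. The sketch below is a hedged approximation: the specific sensitive ports, the `ingress`/`state`/`subnet` field names, and the event shape are all assumptions for illustration:

```python
SENSITIVE_PORTS = {22, 3389, 5432}  # SSH, RDP, Postgres (illustrative choices)

def classify(event: dict) -> str:
    """Map a drift event's changed fields to CRITICAL / HIGH / MEDIUM / LOW."""
    changed = event.get("changed", {})
    # Security-group rule now open to the world on a sensitive port.
    for rule in changed.get("ingress", {}).get("after", []) or []:
        if rule.get("cidr") == "0.0.0.0/0" and rule.get("port") in SENSITIVE_PORTS:
            return "CRITICAL"
    # EC2 lifecycle or network-placement changes are risky.
    if {"state", "subnet"} & changed.keys():
        return "HIGH"
    # Tag-only changes are low-risk metadata updates.
    if set(changed) <= {"tags"}:
        return "LOW"
    return "MEDIUM"
```

Keeping severity as a pure function of the drift event makes the policy easy to unit-test and easy to tighten later.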

Alert Lambda responsibilities

  • Write drift events into Supabase/PostgreSQL for later querying.
  • Update baselines as needed.
  • Optionally send Slack alerts for CRITICAL and HIGH severity events.

Slack notifications for HIGH and CRITICAL drifts

  • Channel: e.g., #infra-alerts
  • Message includes: resource, environment, severity, and a short description.

This keeps Slack noise under control while still providing a tight feedback loop for changes that actually matter.
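The Slack side can stay as simple as a POST to an incoming webhook. A minimal sketch (the message format is an assumption; the webhook URL comes from your Slack workspace configuration):

```python
import json
import urllib.request

def slack_payload(event: dict) -> dict:
    """Build a compact Slack message for a drift event."""
    return {
        "text": (
            f":rotating_light: [{event['severity']}] drift on "
            f"{event['resource']} ({event['environment']}): "
            f"{event['description']}"
        )
    }

def send_alert(webhook_url: str, event: dict) -> None:
    """POST to a Slack incoming webhook; only fires for HIGH/CRITICAL."""
    if event["severity"] not in ("HIGH", "CRITICAL"):
        return
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(slack_payload(event)).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)  # fire-and-forget; add retries in production
```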

4. The Next.js dashboard

The dashboard is intentionally simple, but optimized for SRE/DevOps workflows rather than demos.

Key views

The app exposes three main pages:

Dashboard

  • High‑level stats: number of active drifts, baselines, and monitored environments.
  • Recent drifts, sorted by time and severity.
  • Baseline overview (which environments are covered, which baselines are stale).

Drifts

  • Table of drift events with:
    • Severity chips.
    • Resource and environment.
    • Type of drift (ADDED, REMOVED, MODIFIED).
    • Detected time.
  • Filters for severity, status, and environment.

Baselines

  • List of baselines with:
    • Name, environment.
    • Status (Active / Stale / Archived).
    • Last updated time.
    • Links into the Drifts view filtered by baseline.

Data flow

The dashboard queries Supabase/PostgreSQL via a light API layer:

  • Fetch lists of drifts and baselines.
  • Support simple aggregation for dashboard metrics (e.g., count of active drifts).
  • Poll frequently enough to make the UI feel “live” without hammering the backend.

The focus is on operational clarity. It should be easy to answer:

  • “What changed recently?”
  • “Is this environment drifting more than others?”
  • “Which baselines are out of date?”
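On the API side, those questions reduce to a few aggregations over the drift table. A pure-Python sketch of the metrics endpoint's shape (in practice this would be a SQL query against Supabase/PostgreSQL; the field names here are assumptions matching the views above):

```python
from collections import Counter

def dashboard_stats(drifts: list[dict]) -> dict:
    """Aggregate drift rows into the headline numbers the dashboard shows."""
    active = [d for d in drifts if d["status"] == "active"]
    return {
        "active_drifts": len(active),
        "by_environment": dict(Counter(d["environment"] for d in active)),
        "by_severity": dict(Counter(d["severity"] for d in active)),
    }
```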

Why this architecture?

This design deliberately avoids premature complexity:

  • Serverless for cadence‑based work – Lambdas plus an EventBridge scheduler are a natural fit for “run every N minutes and compare snapshots”.
  • S3 + Postgres gives both durability and queryability:
    • S3 for raw history.
    • Postgres for fast reads and simple aggregations.
  • Next.js dashboard:
    • Easy to deploy.
    • Easy to iterate on UX.
    • Pairs well with Supabase as a backend.

At the same time, it leaves room to grow:

  • Add more resource types beyond EC2 and security groups.
  • Introduce per‑environment baselines and multi‑account support.
  • Expand the dashboard with timelines, diff views, and richer filters.

Future improvements

  • Better diff views – show structured diffs (field‑level before/after) in the UI, not just “modified”.
  • Alert policies – configurable rules to decide which drifts should alert where (Slack, email, etc.).
  • Multi‑cloud support – abstract snapshot/detect logic to handle other providers.
  • Drift remediation hooks – for certain classes of drift, trigger runbooks or automated remediation.

The current version focuses on the basics: detect, classify, alert, and visualize. That’s already enough to catch the most painful “someone changed prod” issues and to tell a coherent story in a portfolio or blog post.

Wrapping up

Config Drift Detector started as a way to make configuration changes more visible, but it also became a nice exercise in small, focused architecture:

  • One clear data flow: AWS → snapshots → drift detection → alerts → dashboard.
  • Minimal moving parts, each doing one job well.
  • A UI that reflects how operators actually investigate and respond to drift.

If you’re interested in configuration management, SRE tooling, or just want a portfolio project that goes beyond CRUD, building something like this is a great way to explore the intersection of cloud architecture, observability, and developer experience.
