From Detection to Resolution: A Closed-Loop System for Managing AWS CloudFormation Drift
Source: Dev.to
The Solution: An Interactive Drift Management Tool
Instead of adding another notification system that contributes to alert fatigue, this solution creates an interactive workflow. It delivers actionable alerts that empower engineers to make decisions directly from Slack. By allowing teams to formally “Acknowledge” or “Ignore” a detected drift, the system:
- Gives every detected drift an explicit, tracked status
- Creates a clear audit trail
- Lets teams focus on what matters most
Architectural Blueprint: A Closed‑Loop System
This solution moves beyond simple notifications and creates a full, closed‑loop system for managing configuration drift at scale. It’s built on a foundation of event‑driven, serverless components that provide not just information, but control.
The Trigger (AWS Config)
The process begins with the AWS Config service. Using the AWS managed rule cloudformation-stack-drift-detection-check, Config evaluates your CloudFormation stacks for drift on a recurring schedule. When a stack’s actual configuration deviates from its template, Config flags it as NON_COMPLIANT.
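As a rough illustration of the trigger, the managed rule could be enabled through AWS Config's PutConfigRule API with an input shaped like the one below. This is a sketch, not the article's deployment code: the function name buildDriftRuleInput, the role ARN, and the 24-hour frequency are assumptions chosen for the example.

```javascript
// Sketch: builds the input object for AWS Config's PutConfigRule API.
// The role ARN and the evaluation frequency are placeholders (assumptions).
function buildDriftRuleInput(roleArn, frequency = 'TwentyFour_Hours') {
  return {
    ConfigRule: {
      ConfigRuleName: 'cloudformation-stack-drift-detection-check',
      // Limit evaluation to CloudFormation stacks.
      Scope: { ComplianceResourceTypes: ['AWS::CloudFormation::Stack'] },
      Source: {
        Owner: 'AWS', // AWS managed rule, not a custom Lambda rule
        SourceIdentifier: 'CLOUDFORMATION_STACK_DRIFT_DETECTION_CHECK',
      },
      // The managed rule takes the ARN of a role allowed to run drift detection.
      InputParameters: JSON.stringify({ cloudformationRoleArn: roleArn }),
      MaximumExecutionFrequency: frequency,
    },
  };
}
```

The returned object would be passed to the PutConfigRule call of whichever SDK or infrastructure tool you use.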
The Router (Amazon EventBridge)
The NON_COMPLIANT status is published as an event. An Amazon EventBridge rule listens for these events from AWS Config and forwards the event payload to the first AWS Lambda function for processing.
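The routing rule can be expressed as an EventBridge event pattern along these lines. The pattern assumes the Config rule keeps its default name. The matchesPattern helper is hypothetical and covers only exact-value matching, so it is useful for local tests but is not a substitute for EventBridge's own matching.

```javascript
// Event pattern for Config compliance changes reported by the drift rule.
const driftEventPattern = {
  source: ['aws.config'],
  'detail-type': ['Config Rules Compliance Change'],
  detail: {
    configRuleName: ['cloudformation-stack-drift-detection-check'],
    newEvaluationResult: { complianceType: ['NON_COMPLIANT'] },
  },
};

// Hypothetical helper: a simplified matcher for local testing only.
// Arrays list allowed values; nested objects are matched recursively.
function matchesPattern(pattern, event) {
  return Object.entries(pattern).every(([key, expected]) => {
    const actual = event == null ? undefined : event[key];
    if (Array.isArray(expected)) return expected.includes(actual);
    return typeof actual === 'object' && actual !== null && matchesPattern(expected, actual);
  });
}
```

Running a captured sample event through the helper before deploying the rule is a cheap way to catch a mistyped rule name.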
The Notifier (AWS Lambda)
The first Lambda function acts as the initial alert mechanism. Triggered by the EventBridge event, it performs two key actions:
- Inspects the drifted stack to confirm it contains the MONITOR_DRIFT tag with a value of true.
- If the tag is present, constructs a rich notification, complete with “Acknowledge” and “Ignore” buttons, and sends it to a designated Slack channel, providing immediate visibility and a direct call to action.
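The two actions above can be sketched as a small handler. Everything here is illustrative rather than the author's implementation: deps.describeStack and deps.notify are hypothetical injected helpers standing in for the CloudFormation and Slack calls, and the code assumes the Config event's resourceId identifies the stack.

```javascript
// True when the stack carries the opt-in tag MONITOR_DRIFT=true.
// `stack` follows the shape returned by CloudFormation's DescribeStacks.
function isMonitored(stack) {
  return (stack.Tags || []).some(
    (tag) => tag.Key === 'MONITOR_DRIFT' && tag.Value === 'true'
  );
}

// Pulls the fields the notification needs out of the Config event.
// Assumption: detail.resourceId identifies the drifted stack.
function stackRefFromEvent(event) {
  const { resourceId, awsAccountId, awsRegion } = event.detail;
  return { stackId: resourceId, account: awsAccountId, region: awsRegion };
}

// Hypothetical handler: `deps` supplies describeStack(stackId) and notify(info).
async function handleDriftEvent(event, deps) {
  const ref = stackRefFromEvent(event);
  const stack = await deps.describeStack(ref.stackId);
  if (!isMonitored(stack)) return { notified: false }; // untagged stacks are skipped
  await deps.notify({ ...ref, stackName: stack.StackName });
  return { notified: true };
}
```

Injecting the AWS and Slack calls keeps the tag check and event parsing testable without network access.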
The State Manager (AWS Lambda, API Gateway & DynamoDB)
A second, distinct workflow handles interactive state management:
- An AWS Lambda function persists the details of drifted stacks into an Amazon DynamoDB table, creating a centralized source of truth.
- When an engineer clicks “Acknowledge” or “Ignore” in the Slack message, the action is sent to an Amazon API Gateway endpoint.
- The API Gateway call invokes the state‑manager Lambda, which updates the corresponding stack’s status in DynamoDB. This lets the team manage priorities, reduce alert noise by ignoring known drifts, and maintain a clear audit trail.
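A minimal sketch of the state-manager logic follows, assuming an API Gateway proxy event and a DynamoDB table keyed by a stackId attribute. The table name, the key name, and the updatedBy and updatedAt attributes are assumptions, not details from the article. Slack delivers button clicks as a form-encoded body whose payload field holds JSON.

```javascript
// Extracts the clicked action from an API Gateway proxy event sent by Slack.
function parseSlackAction(event) {
  const raw = event.isBase64Encoded
    ? Buffer.from(event.body, 'base64').toString('utf8')
    : event.body;
  const encoded = new URLSearchParams(raw).get('payload');
  if (!encoded) throw new Error('Missing Slack payload');
  const payload = JSON.parse(encoded);
  return {
    callbackId: payload.callback_id,
    status: payload.actions[0].value, // 'acknowledged' or 'ignored'
    user: payload.user.id,
  };
}

// Builds parameters for a DynamoDB document-client update.
// Assumption: the table's partition key is `stackId`.
function buildStatusUpdate(tableName, stackId, action, nowIso) {
  return {
    TableName: tableName,
    Key: { stackId },
    // `status` is a DynamoDB reserved word, so it is aliased as #status.
    UpdateExpression: 'SET #status = :status, updatedBy = :user, updatedAt = :now',
    ExpressionAttributeNames: { '#status': 'status' },
    ExpressionAttributeValues: {
      ':status': action.status,
      ':user': action.user,
      ':now': nowIso,
    },
  };
}
```

Recording who acted and when is what turns the status field into an audit trail. A production handler should also verify Slack's request signature before trusting the payload.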
Putting It Into Practice
Enrolling a stack in this management system is simple. To enable drift detection and interactive alerts for any CloudFormation stack, you only need to perform one action:
Add the tag MONITOR_DRIFT with a value of true to the stack.
Once tagged, the stack is automatically picked up by the system. Any future drift will trigger the interactive notification in Slack, allowing your team to begin managing it immediately.
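One way to apply the tag to an existing stack without touching its template is an UpdateStack call that reuses the previous template and parameter values. The sketch below only builds that request from a DescribeStacks result. The function name buildTagUpdateInput is hypothetical, and the code assumes you want to keep every other tag already on the stack, since UpdateStack replaces the whole tag set.

```javascript
// Builds an UpdateStack input that only adds MONITOR_DRIFT=true.
// `stack` follows the shape returned by CloudFormation's DescribeStacks.
function buildTagUpdateInput(stack) {
  // Keep existing tags, dropping any earlier MONITOR_DRIFT value.
  const otherTags = (stack.Tags || []).filter((tag) => tag.Key !== 'MONITOR_DRIFT');
  return {
    StackName: stack.StackName,
    UsePreviousTemplate: true, // leave the template unchanged
    Parameters: (stack.Parameters || []).map((param) => ({
      ParameterKey: param.ParameterKey,
      UsePreviousValue: true, // leave every parameter unchanged
    })),
    Capabilities: stack.Capabilities || [],
    Tags: [...otherTags, { Key: 'MONITOR_DRIFT', Value: 'true' }],
  };
}
```

If the stack is managed by a pipeline or an infrastructure-as-code tool, adding the tag there is usually the safer route, because the next deployment would otherwise overwrite a manually applied tag.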
Behind the Code: An Interactive Slack Message
The key to this workflow is the interactive Slack message. Below is a simplified view of the message payload, written as a JavaScript object that is serialized to JSON, used to create a message with action buttons.
// A simplified look at an interactive Slack message payload
const slackMessage = {
  channel: 'your-drift-alerts-channel',
  text: '*Drift Detected in Stack: YourStackName*',
  attachments: [
    {
      text: 'A drift from the expected template has been detected. Please review and choose an action.',
      fallback: 'You are unable to choose an action.',
      // Slack echoes callback_id back when a button is clicked
      callback_id: 'drift_action_callback',
      color: '#F35B5B',
      attachment_type: 'default',
      fields: [
        { title: 'Account', value: '123456789012', short: true },
        { title: 'Region', value: 'us-east-1', short: true }
      ],
      // Each button's value is what the state manager receives
      actions: [
        {
          name: 'acknowledge',
          text: 'Acknowledge',
          type: 'button',
          value: 'acknowledged',
          style: 'primary'
        },
        {
          name: 'ignore',
          text: 'Ignore',
          type: 'button',
          value: 'ignored'
        }
      ]
    }
  ]
};
This snippet illustrates how action buttons are added to a Slack message, enabling the interactive workflow.
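To show how such a payload might be produced per stack and delivered, here is a hedged sketch. The function names buildDriftMessage and postToSlack are hypothetical, and delivery through Slack's chat.postMessage Web API method with a bot token is an assumption, since the article does not say how the message is sent.

```javascript
// Builds the interactive message for one drifted stack.
function buildDriftMessage({ channel, stackName, account, region }) {
  return {
    channel,
    text: `*Drift Detected in Stack: ${stackName}*`,
    attachments: [
      {
        text: 'A drift from the expected template has been detected. Please review and choose an action.',
        fallback: 'You are unable to choose an action.',
        callback_id: 'drift_action_callback',
        color: '#F35B5B',
        attachment_type: 'default',
        fields: [
          { title: 'Account', value: account, short: true },
          { title: 'Region', value: region, short: true },
        ],
        actions: [
          { name: 'acknowledge', text: 'Acknowledge', type: 'button', value: 'acknowledged', style: 'primary' },
          { name: 'ignore', text: 'Ignore', type: 'button', value: 'ignored' },
        ],
      },
    ],
  };
}

// Assumption: the message is sent with chat.postMessage using a bot token.
// `fetchImpl` is injectable so the function can be tested without network access.
async function postToSlack(message, token, fetchImpl = fetch) {
  const res = await fetchImpl('https://slack.com/api/chat.postMessage', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json; charset=utf-8',
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify(message),
  });
  const data = await res.json();
  // Slack reports most failures in the response body as ok: false.
  if (!data.ok) throw new Error(`Slack API error: ${data.error}`);
  return data;
}
```

Checking data.ok matters because the Web API can return HTTP 200 even when the request was rejected.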
Conclusion
Effective infrastructure management at scale requires moving beyond passive detection to active resolution. By creating a closed‑loop, interactive system, you empower your engineers to manage CloudFormation drift efficiently, directly from the tools they use every day. This architecture provides:
- A robust audit trail
- Reduced alert fatigue
- A more organized, prioritized approach to maintaining infrastructure integrity
It transforms a persistent operational challenge into a streamlined, manageable process.