Setting Up Continuous Terraform Drift Monitoring With GitHub Actions and Slack

Published: 4 days ago (June 6, 2026 at 01:50 PM EDT)

9 min read

Source: Dev.to

Most teams discover Terraform drift the hard way: someone runs terraform plan before a deploy and gets a screen full of unexpected changes. By then the drift might have been sitting there for weeks. Maybe longer.

What if you could catch it automatically? Run a scan every few hours, get a Slack message only when something important drifts, and ignore the noise?

That’s what this tutorial sets up. By the end, you’ll have:

- A GitHub Actions workflow that scans your Terraform infrastructure on a schedule
- Slack alerts that only fire for High and Critical severity drift
- A JSON report saved as an artifact for audit trails
- The whole thing running hands-free, zero maintenance

I’m using tfdrift for this. It’s an open-source CLI I built that classifies drift by severity. But the GitHub Actions + Slack pattern works with any drift detection approach.

Before we start, you’ll need:

- A GitHub repo with your Terraform code
- AWS credentials (or whatever cloud provider you use)
- A Slack workspace where you can create a webhook
- Python 3.9+ (tfdrift is a Python CLI)
- About 20 minutes

First, let’s set up where the alerts will go.

1. Go to api.slack.com/apps
2. Click “Create New App” → “From scratch”
3. Name it something like “Drift Alerts” and pick your workspace
4. Go to “Incoming Webhooks” in the sidebar → toggle it On
5. Click “Add New Webhook to Workspace”
6. Pick the channel you want alerts in (I use #infra-alerts)
7. Copy the webhook URL. It looks like https://hooks.slack.com/services/T00000/B00000/XXXX
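Before wiring the webhook into CI, it can help to confirm that it accepts messages at all. The following is a minimal sketch using only the Python standard library. The function names and the SLACK_WEBHOOK_URL environment variable are illustrative choices, not part of tfdrift. It relies only on the fact that Slack incoming webhooks accept a JSON body with a "text" field.

```python
import json
import os
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build a POST request carrying a plain-text Slack message."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_test_message(webhook_url: str) -> int:
    """Send one test message and return the HTTP status code."""
    request = build_slack_request(webhook_url, "Test message from drift alert setup")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


if __name__ == "__main__":
    # Read the URL from the environment so it never lands in a file or shell history.
    url = os.environ.get("SLACK_WEBHOOK_URL")
    if url:
        print("Slack responded with HTTP", send_test_message(url))
    else:
        print("Set SLACK_WEBHOOK_URL to send a test message")
```

If the message shows up in your channel, the webhook is ready to be stored as a secret.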

Save that URL. We’ll need it in a minute.

You need to store your cloud credentials and Slack webhook as GitHub secrets so they’re not exposed in your workflow file. Go to your repo → Settings → Secrets and variables → Actions → New repository secret, and add these:

- AWS_ACCESS_KEY_ID → your AWS access key
- AWS_SECRET_ACCESS_KEY → your AWS secret key
- AWS_DEFAULT_REGION → us-east-1 (or whatever region you use)
- SLACK_WEBHOOK_URL → the webhook URL you copied from Slack above

If you’re using AWS SSO or assuming roles, you’ll need AWS_ROLE_ARN too, but the basic access key approach works for getting started.

Create this file in your repo at .github/workflows/drift-check.yml:

name: Terraform Drift Check

on:
  schedule:
    # Run every 6 hours
    - cron: '0 */6 * * *'
  workflow_dispatch: # Allow manual triggering from the Actions tab

jobs:
  drift-check:
    runs-on: ubuntu-latest
    timeout-minutes: 30

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.0
          terraform_wrapper: false

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install tfdrift
        run: pip install tfdrift

      - name: Run drift scan
        id: drift_scan
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          AWS_DEFAULT_REGION: ${{ secrets.AWS_DEFAULT_REGION }}
          # Pass the webhook through the environment instead of
          # interpolating the secret directly into the shell script
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
        run: |
          tfdrift scan \
            --path ./infrastructure \
            --format json \
            --output drift-report.json \
            --slack-webhook "$SLACK_WEBHOOK_URL" \
            --quiet
        continue-on-error: true

      - name: Upload drift report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: drift-report-${{ github.run_number }}
          path: drift-report.json
          retention-days: 30

      - name: Fail on drift
        if: steps.drift_scan.outcome == 'failure'
        run: |
          echo "::warning::Terraform drift detected! Check the drift report artifact."
          exit 1

Let me walk through what each part does.

The schedule trigger: cron: '0 */6 * * *' runs the scan every 6 hours, at midnight, 6am, noon, and 6pm UTC. You can adjust this. Every hour is '0 * * * *', once a day at midnight is '0 0 * * *'. I find every 6 hours is a good balance between catching drift quickly and not burning through GitHub Actions minutes.

workflow_dispatch: This lets you trigger the scan manually from the Actions tab. Useful for testing or when you want an immediate check.

terraform_wrapper: false: This is important. The hashicorp/setup-terraform action wraps the terraform binary by default, which breaks JSON output parsing. Setting this to false gives you the raw terraform binary.

continue-on-error: true on the scan step: tfdrift exits with code 1 when drift is detected. Without continue-on-error, the job would fail right at the scan step. With it, the job carries on, so we always save the report and then handle the failure explicitly in the last step.

The --quiet flag: Suppresses terminal output since we’re running in CI. The JSON report captures everything we need.

Artifact upload with if: always(): Saves the JSON report regardless of whether drift was found. This gives you an audit trail: you can go back and see what your infrastructure looked like at any point in the last 30 days.

Right now the workflow scans everything and alerts on all drift. To filter out the noise, create a .tfdrift.yml in your repo root:

# .tfdrift.yml

scan:
  paths:
    - ./infrastructure
  exclude:
    - "**/.terraform/**"
    - "**/test/**"
    - "**/modules/**"

notifications:
  slack:
    webhook_url: ${SLACK_WEBHOOK_URL}
    channel: "#infra-alerts"
    min_severity: high # Only alert on High and Critical

severity:
  critical:
    - aws_security_group.*.ingress
    - aws_security_group.*.egress
    - aws_iam_policy.*.policy
    - aws_iam_role.*.assume_role_policy
    - aws_s3_bucket_public_access_block.*
  high:
    - aws_instance.*.instance_type
    - aws_rds_instance.*.publicly_accessible
    - aws_rds_instance.*.storage_encrypted
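To make the severity rules more concrete, here is a rough sketch of how pattern-based classification and a minimum-severity threshold could work. It is not tfdrift's actual implementation. It assumes the rules are glob patterns in which `*` stands for the resource name, that attribute paths look like `type.name.attribute`, and that unmatched changes fall back to "low". The function names are hypothetical.

```python
from fnmatch import fnmatchcase

# Severity levels, lowest to highest.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]

# Rules copied from the severity section of the config (assumed to be globs).
SEVERITY_RULES = {
    "critical": [
        "aws_security_group.*.ingress",
        "aws_security_group.*.egress",
        "aws_iam_policy.*.policy",
        "aws_iam_role.*.assume_role_policy",
        "aws_s3_bucket_public_access_block.*",
    ],
    "high": [
        "aws_instance.*.instance_type",
        "aws_rds_instance.*.publicly_accessible",
        "aws_rds_instance.*.storage_encrypted",
    ],
}


def classify(attribute_path: str, default: str = "low") -> str:
    """Return the highest severity whose patterns match the attribute path."""
    for severity in reversed(SEVERITY_ORDER):
        for pattern in SEVERITY_RULES.get(severity, []):
            if fnmatchcase(attribute_path, pattern):
                return severity
    # Assumption: anything without a matching rule gets the default level.
    return default


def should_alert(severity: str, min_severity: str = "high") -> bool:
    """True when the severity is at or above the alerting threshold."""
    return SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index(min_severity)


if __name__ == "__main__":
    for path in ("aws_security_group.api_sg.ingress", "aws_instance.web_server.tags"):
        level = classify(path)
        print(path, "->", level, "| alert:", should_alert(level))
```

Checking the most severe rules first means a change that matches several patterns is reported at its highest level.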

The key setting is min_severity: high. This means:

- Critical drift (security groups, IAM policies) → Slack alert immediately
- High drift (instance types, encryption) → Slack alert immediately
- Medium drift → logged in the JSON report but no alert
- Low drift (tags) → logged in the JSON report but no alert

This is what took our alert volume from 100% down to 27% while still catching 94% of the changes that actually mattered.

Create .tfdriftignore in your repo root for drift you never want to see:

# Autoscaling: these change every few minutes by design
aws_autoscaling_group.*.desired_capacity
aws_ecs_service.*.desired_count

# Tags managed by external cost allocation tools
*.tags.CostCenter
*.tags.LastModified
*.tags.UpdatedBy
*.tags.ManagedBy

# Terraform internal metadata
*.tags.terraform
*.tags_all.*
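As an illustration of what an ignore file like this does, the sketch below parses a few of the entries and checks attribute paths against them. It assumes `#` starts a comment line and that each entry is a glob pattern, and it uses Python's fnmatch for matching. tfdrift's real matching rules may differ, and the helper names are made up for this example.

```python
from fnmatch import fnmatchcase

# A shortened copy of the ignore file, inlined so the example is self-contained.
IGNORE_FILE_TEXT = """\
# Autoscaling: these change every few minutes by design
aws_autoscaling_group.*.desired_capacity
aws_ecs_service.*.desired_count

# Tags managed by external cost allocation tools
*.tags.CostCenter
"""


def parse_ignore_patterns(text: str) -> list[str]:
    """Return the patterns in the file, skipping blank lines and comments."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_ignored(attribute_path: str, patterns: list[str]) -> bool:
    """True when any ignore pattern matches the drifted attribute path."""
    return any(fnmatchcase(attribute_path, pattern) for pattern in patterns)


if __name__ == "__main__":
    patterns = parse_ignore_patterns(IGNORE_FILE_TEXT)
    for path in (
        "aws_autoscaling_group.web.desired_capacity",
        "aws_security_group.api_sg.ingress",
    ):
        print(path, "ignored" if is_ignored(path, patterns) else "reported")
```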

Without this file, you’ll get constant alerts about autoscaling changes. Those aren’t drift; that’s the system working correctly.

Push everything and trigger a manual run:

1. Commit and push .github/workflows/drift-check.yml, .tfdrift.yml, and .tfdriftignore
2. Go to your repo → Actions tab → Terraform Drift Check → Run workflow
3. Watch it run

If there’s drift, you’ll see:

- A Slack message in #infra-alerts with the severity breakdown
- A JSON artifact attached to the workflow run
- The workflow marked as failed (exit code 1)

If there’s no drift:

- No Slack message (nothing to alert about)
- A JSON artifact with an empty drift report
- The workflow marked as passed (exit code 0)

When tfdrift finds High or Critical drift, it sends a message like:

⚠️ Terraform Drift Detected — 3 resource(s)

Workspaces scanned: 12
With drift: 2
Severity: critical: 1 | high: 2

🔴 CRITICAL — aws_security_group.api_sg
Action: update | Changed: ingress

🟠 HIGH — aws_instance.web_server
Action: update | Changed: instance_type, ami

🟠 HIGH — aws_rds_instance.primary
Action: update | Changed: publicly_accessible

You immediately know what drifted, how bad it is, and which workspace it’s in. No log diving, no running terraform plan manually, no guessing.

A few things I’d add for a real production setup:

If you have separate Terraform directories for dev, staging, and production, you can either scan them all in one workflow or create separate workflows with different schedules:

# Production: check every 2 hours
cron: '0 */2 * * *'

# Staging: check every 6 hours
cron: '0 */6 * * *'

# Dev: check once a day
cron: '0 8 * * *'
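To sanity-check schedules like these before committing them, a small helper can list the times a cron expression fires within a day. This is a simplified sketch, not a full cron parser: it only reads the minute and hour fields, supports `*`, `*/n`, single values, comma lists, and `a-b` ranges, and ignores the day and month fields. The function names are illustrative.

```python
def expand_field(field: str, low: int, high: int) -> list[int]:
    """Expand one cron field into the sorted list of values it matches."""
    values: set[int] = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_text, end_text = base.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(base)
        values.update(range(start, end + 1, step))
    return sorted(values)


def daily_run_times(cron: str) -> list[str]:
    """List the UTC HH:MM times the expression fires on a day it matches."""
    minute, hour, *_ = cron.split()
    return [
        f"{h:02d}:{m:02d}"
        for h in expand_field(hour, 0, 23)
        for m in expand_field(minute, 0, 59)
    ]


if __name__ == "__main__":
    for label, cron in [
        ("production", "0 */2 * * *"),
        ("staging", "0 */6 * * *"),
        ("dev", "0 8 * * *"),
    ]:
        times = daily_run_times(cron)
        print(f"{label}: {len(times)} runs/day at {', '.join(times)}")
```

GitHub Actions evaluates schedules in UTC, so the times listed are UTC as well.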

You can also use separate .tfdrift.yml files per environment with different severity thresholds. Maybe you only alert on Critical for dev but alert on Medium+ for production.

You can add drift checking to your PR workflow so that drift must be resolved before merging:

on:
  pull_request:
    branches: [main]
    paths:
      - 'infrastructure/**'

This catches drift before it becomes a deployment surprise. If someone opened a security group in the console and you’re about to deploy, the PR pipeline will flag it.

For production infrastructure, you might want Critical drift to page someone instead of just going to Slack:

notifications:
  slack:
    webhook_url: ${SLACK_WEBHOOK_URL}
    min_severity: high
  pagerduty:
    routing_key: ${PAGERDUTY_ROUTING_KEY}
    min_severity: critical
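Read as routing logic, a notifications block like this amounts to one threshold check per channel. The sketch below shows that reading. It is an assumption about the semantics, not tfdrift's code: each channel receives every finding at or above its own min_severity, which would mean Critical drift reaches Slack as well as PagerDuty. The names are hypothetical.

```python
# Severity levels, lowest to highest.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]

# Mirrors the notifications block: channel name -> minimum severity.
CHANNEL_THRESHOLDS = {"slack": "high", "pagerduty": "critical"}


def channels_for(severity: str, thresholds: dict[str, str] = CHANNEL_THRESHOLDS) -> list[str]:
    """Return the channels whose threshold the given severity meets."""
    rank = SEVERITY_ORDER.index(severity)
    return [
        name
        for name, minimum in thresholds.items()
        if rank >= SEVERITY_ORDER.index(minimum)
    ]


if __name__ == "__main__":
    for level in SEVERITY_ORDER:
        print(level, "->", channels_for(level) or "report only")
```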

With this config, High drift goes to Slack, but Critical drift (someone modified a security group or IAM policy) pages the on-call engineer. That’s the kind of change you want someone looking at within minutes, not hours.

For weekly reviews or compliance, you can generate an HTML report:

- name: Generate HTML report
  run: |
    tfdrift report \
      --path ./infrastructure \
      --output drift-report.html

The HTML report gives you a standalone page with severity breakdowns, drifted resources, and workspace details, which is useful for sharing with security teams or attaching to compliance documentation.

A few practical things to know:

GitHub Actions minutes: The free tier gives you 2,000 minutes/month for private repos (standard runners on public repos are free). A drift scan typically takes 2-5 minutes depending on how many workspaces you have. Running every 6 hours is ~120 runs/month × ~3 minutes = ~360 minutes. Well within the free tier.

Terraform API calls: Each terraform plan makes API calls to your cloud provider to refresh state. AWS doesn’t charge for most describe/get API calls, but if you have hundreds of workspaces, the volume might matter. Monitor your AWS CloudTrail to be safe.

Slack rate limits: Slack webhooks are limited to 1 message per second. If you’re scanning 50 workspaces and 10 have drift, tfdrift batches them into a single Slack message, so this shouldn’t be an issue.

After setup, your repo should look like:

your-terraform-repo/
├── .github/
│   └── workflows/
│       └── drift-check.yml    # The scan workflow
├── .tfdrift.yml               # Severity rules and notification config
├── .tfdriftignore             # Expected drift exclusions
└── infrastructure/
    ├── production/
    │   ├── main.tf
    │   └── terraform.tfvars
    ├── staging/
    │   └── main.tf
    └── dev/
        └── main.tf
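Going back to the GitHub Actions minutes estimate: the arithmetic is simple enough to script if you want to try other schedules. This is a back-of-the-envelope sketch that assumes a 30-day month and a fixed scan duration. The 2,000-minute allowance is the figure quoted above, and the function name is illustrative.

```python
FREE_TIER_MINUTES = 2000  # monthly allowance quoted in the article


def monthly_minutes(interval_hours: int, minutes_per_run: float, days: int = 30) -> float:
    """Estimate Actions minutes used per month for a fixed scan interval."""
    runs_per_day = 24 / interval_hours
    return runs_per_day * days * minutes_per_run


if __name__ == "__main__":
    for interval in (6, 2, 1):
        used = monthly_minutes(interval, minutes_per_run=3)
        status = "within" if used <= FREE_TIER_MINUTES else "over"
        print(f"every {interval}h: {used:.0f} minutes/month ({status} the free tier)")
```

At 3 minutes per scan, the every-6-hours schedule uses about 360 minutes, while an hourly schedule would come to about 2,160 and exceed the allowance.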

If you don’t have tfdrift installed yet:

pip install tfdrift
tfdrift scan --path ./your-terraform-dir

The GitHub Actions workflow, .tfdrift.yml, and .tfdriftignore templates are all in the tfdrift repo under the examples/ directory. If you set this up and run into issues, open a GitHub issue. I’d genuinely like to hear about edge cases I haven’t hit yet.

This is part 4 of a series on infrastructure drift detection.

- Part 1: I Built a Free Terraform Drift Detector — Here’s Why
- Part 2: Why Severity Classification Changes Everything
- Part 3: How I Built a Terraform Plan JSON Parser in Python
