Preventing Silent ECS Deployment Failures with Circuit Breaker

Published: February 26, 2026 at 05:57 AM EST
5 min read
Source: Dev.to

Overview

AWS Elastic Container Service (ECS) provides a built‑in feature called the deployment circuit breaker, designed to make service deployments safer and more resilient.

The circuit breaker continuously monitors the health of tasks during a deployment and automatically rolls back changes if newly launched tasks fail to become healthy. When enabled, it prevents failed deployments from leaving services in a degraded or non‑functional state.

Without this safeguard, deployment failures can easily go unnoticed. For example, if new tasks fail to start or never pass health checks, the service may still appear to be running while it is effectively broken. These silent failures can result in data loss, financial impact, or operational issues depending on the workload.

In this post, I’ll walk through how to:

  1. Enable the ECS deployment circuit breaker using Terraform.
  2. Observe deployment failures via EventBridge.
  3. Send real‑time alerts to Slack.

Why the ECS Deployment Circuit Breaker Matters

Enabling the deployment circuit breaker provides several important benefits:

  • Automatic rollback – Failed deployments are reverted to the last known healthy service revision.
  • Improved visibility – ECS emits structured events whenever a deployment fails or rolls back.
  • Reduced operational overhead – Failures are mitigated automatically without immediate manual intervention.

Together, these significantly reduce the risk of production incidents caused by faulty deployments.

Enabling the Circuit Breaker with Terraform

The deployment circuit breaker can be enabled directly in your ECS service definition. In Terraform, this is done using the deployment_circuit_breaker block:

resource "aws_ecs_service" "default" {
  name            = "tuve"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.default.arn
  desired_count   = 2

  deployment_circuit_breaker {
    enable   = true
    rollback = true
  }

  # … other service settings …
}

With this configuration in place, ECS will automatically stop and roll back a deployment if the new tasks fail to reach a healthy state.
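For a service that is not managed by Terraform, the same setting can be applied through the ECS UpdateService API. The following is a minimal sketch using boto3, not part of the original setup: the helper names `circuit_breaker_update_kwargs` and `enable_circuit_breaker` are hypothetical, and the cluster name "main" is an assumption. If the service is managed by Terraform, changing it this way would introduce drift, so treat it only as an illustration of the underlying API shape.

```python
def circuit_breaker_update_kwargs(cluster: str, service: str, rollback: bool = True) -> dict:
    """Build the keyword arguments for an ECS UpdateService call.

    Kept separate from the API call so it can be inspected without AWS access.
    """
    return {
        "cluster": cluster,
        "service": service,
        "deploymentConfiguration": {
            "deploymentCircuitBreaker": {"enable": True, "rollback": rollback},
        },
    }


def enable_circuit_breaker(cluster: str, service: str) -> None:
    """Apply the setting to a live service (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above works without boto3

    ecs = boto3.client("ecs")
    ecs.update_service(**circuit_breaker_update_kwargs(cluster, service))


# Hypothetical names: cluster "main", service "tuve".
kwargs = circuit_breaker_update_kwargs("main", "tuve")
print(kwargs["deploymentConfiguration"])
```

The `enable` and `rollback` keys mirror the two arguments of the Terraform `deployment_circuit_breaker` block.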

Once enabled, the AWS Management Console clearly indicates that the Deployment circuit breaker is turned on.

[Screenshot: Deployment circuit breaker enabled in the console]

Observing Deployment Failures

Automatic rollback is useful, but visibility is just as important.

When the ECS deployment circuit breaker triggers, ECS emits events to Amazon EventBridge with the following detail type:

ECS Deployment State Change

Example Event Payload

{
  "version": "0",
  "id": "ddca6449-b258-46c0-8653-e0e3aEXAMPLE",
  "detail-type": "ECS Deployment State Change",
  "source": "aws.ecs",
  "account": "111122223333",
  "time": "2020-05-23T12:31:14Z",
  "region": "eu-central-1",
  "resources": [
    "arn:aws:ecs:eu-central-1:111122223333:service/default/servicetest"
  ],
  "detail": {
    "eventType": "ERROR",
    "eventName": "SERVICE_DEPLOYMENT_FAILED",
    "deploymentId": "ecs-svc/123",
    "updatedAt": "2020-05-23T11:11:11Z",
    "reason": "ECS deployment circuit breaker: task failed to start."
  }
}

Key Fields to Monitor

Field     | Description
eventName | SERVICE_DEPLOYMENT_FAILED or SERVICE_DEPLOYMENT_ROLLBACK_COMPLETED
reason    | Human‑readable explanation of why the deployment failed
resources | ARN(s) of the affected ECS service(s)
updatedAt | Timestamp of the failure event

Tracking these fields ensures that deployment issues are visible immediately instead of being discovered hours later.
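As a rough illustration of how these fields can be pulled out of an event, here is a small sketch. The function name `summarize_deployment_event` is hypothetical, and the sample event is a trimmed copy of the example payload shown earlier.

```python
from typing import Optional

# Event names worth alerting on, as listed in the table above.
ALERT_EVENT_NAMES = {
    "SERVICE_DEPLOYMENT_FAILED",
    "SERVICE_DEPLOYMENT_ROLLBACK_COMPLETED",
}


def summarize_deployment_event(event: dict) -> Optional[dict]:
    """Return the key fields of a failure/rollback event, or None otherwise."""
    if event.get("detail-type") != "ECS Deployment State Change":
        return None
    detail = event.get("detail", {})
    if detail.get("eventName") not in ALERT_EVENT_NAMES:
        return None
    resources = event.get("resources", [])
    return {
        "eventName": detail["eventName"],
        "reason": detail.get("reason", "Unknown"),
        # The service name is the last path segment of the service ARN.
        "service": resources[0].rsplit("/", 1)[-1] if resources else "unknown",
        "updatedAt": detail.get("updatedAt", "Unknown"),
    }


# Trimmed version of the example payload from this post.
sample_event = {
    "detail-type": "ECS Deployment State Change",
    "source": "aws.ecs",
    "resources": [
        "arn:aws:ecs:eu-central-1:111122223333:service/default/servicetest"
    ],
    "detail": {
        "eventType": "ERROR",
        "eventName": "SERVICE_DEPLOYMENT_FAILED",
        "deploymentId": "ecs-svc/123",
        "updatedAt": "2020-05-23T11:11:11Z",
        "reason": "ECS deployment circuit breaker: task failed to start.",
    },
}

summary = summarize_deployment_event(sample_event)
print(summary)
```

Returning None for everything else keeps the caller simple: it only has to act when a summary comes back.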

Deployment Rollback in the AWS Console

The AWS Management Console also provides clear visibility into rollback activity. After a failed deployment, the Deployments tab shows the rollback status along with the target service revision.

[Screenshot: Rollback view in the console]

This view is particularly useful for confirming that the circuit breaker worked as expected.
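The same information is available outside the console: the ECS DescribeServices API returns a `deployments` list whose entries carry `status`, `rolloutState`, and `rolloutStateReason`. The sketch below only formats such a response. The helper name `rollout_summary` is hypothetical, and the sample response is made up for illustration rather than captured from a real service.

```python
def rollout_summary(response: dict) -> list:
    """Return one human-readable line per deployment in a DescribeServices response."""
    lines = []
    for service in response.get("services", []):
        for dep in service.get("deployments", []):
            lines.append(
                f"{service.get('serviceName', '?')} "
                f"[{dep.get('status', '?')}] "
                f"{dep.get('rolloutState', 'UNKNOWN')}: "
                f"{dep.get('rolloutStateReason', '')}"
            )
    return lines


# Illustrative response fragment; all values are invented for this example.
# A real one would come from:
#   boto3.client("ecs").describe_services(cluster=..., services=[...])
sample_response = {
    "services": [
        {
            "serviceName": "tuve",
            "deployments": [
                {
                    "id": "ecs-svc/example-new",
                    "status": "PRIMARY",
                    "rolloutState": "COMPLETED",
                    "rolloutStateReason": "example: rollback target",
                },
                {
                    "id": "ecs-svc/example-old",
                    "status": "INACTIVE",
                    "rolloutState": "FAILED",
                    "rolloutStateReason": "example: circuit breaker triggered",
                },
            ],
        }
    ]
}

for line in rollout_summary(sample_response):
    print(line)
```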

Sending Deployment Alerts to Slack

To ensure deployment failures are noticed immediately, ECS deployment events can be routed to Slack using EventBridge and a Lambda function.

Overall flow: ECS → EventBridge → Lambda → Slack

Lambda Handler Example (Python)

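import json
import urllib.request

# Placeholder webhook URL. In a real deployment, load it from configuration
# (for example an environment variable) instead of committing it to source.
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXXXX/XXXXX/XXXXX"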

def lambda_handler(event, context):
    # Verify we have the right event type
    if event.get("detail-type") != "ECS Deployment State Change":
        return

    detail = event.get("detail", {})
    event_name = detail.get("eventName")

    # Only act on failure or rollback completion events
    if event_name not in (
        "SERVICE_DEPLOYMENT_FAILED",
        "SERVICE_DEPLOYMENT_ROLLBACK_COMPLETED",
    ):
        return

    # Extract useful information
    resources = event.get("resources", [])
    service_name = resources[0].split("/")[-1] if resources else "unknown"
    reason = detail.get("reason", "Unknown")
    updated_at = detail.get("updatedAt", "Unknown")

    # Build Slack message
    message = {
        "text": f"*ECS Deployment Alert*\n"
                f"*Service:* `{service_name}`\n"
                f"*Event:* `{event_name}`\n"
                f"*Reason:* {reason}\n"
                f"*Time:* {updated_at}"
    }

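    # Send to Slack. Passing data makes this a POST request; the timeout keeps
    # a slow webhook from holding the invocation open until the Lambda times out.
    data = json.dumps(message).encode("utf-8")
    req = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=5) as response:
        response.read()

    return {"statusCode": 200, "body": "Notification sent"}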

Configure an EventBridge rule that matches the ECS Deployment State Change detail type and forwards the events to this Lambda function. When a failure or rollback occurs, the Lambda posts a formatted message to the designated Slack channel, giving your team immediate visibility.
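Before wiring the function to EventBridge, it can help to exercise the Slack call locally without posting anything. The sketch below is one way to do that with the standard library's `unittest.mock`. The helper `post_to_slack` is hypothetical (it mirrors the sending step of the handler), and the webhook URL is a deliberately invalid placeholder.

```python
import json
import urllib.request
from unittest import mock


def post_to_slack(webhook_url: str, text: str) -> None:
    """POST a simple text message to a Slack incoming webhook."""
    data = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(
        webhook_url,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=5) as response:
        response.read()


# Replace urlopen with a mock so no network request is made.
with mock.patch("urllib.request.urlopen") as fake_urlopen:
    post_to_slack(
        "https://hooks.slack.invalid/placeholder",
        "*ECS Deployment Alert*\n*Service:* `servicetest`",
    )
    captured_request = fake_urlopen.call_args[0][0]

# Inspect what would have been sent.
sent_payload = json.loads(captured_request.data.decode("utf-8"))
print(captured_request.get_method(), captured_request.full_url)
print(sent_payload["text"])
```

Because the mock captures the `Request` object, the payload and headers can be checked before a real webhook is ever configured.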

TL;DR

  • Enable deployment_circuit_breaker in your ECS service (Terraform example above).
  • Use EventBridge to capture ECS Deployment State Change events.
  • Forward those events to a Lambda that posts to Slack for instant alerts.

With the circuit breaker, EventBridge monitoring, and Slack notifications in place, you’ll have a robust safety net that automatically rolls back bad deployments and makes sure you know about them the moment they happen.

EventBridge Rule (Terraform)

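The following EventBridge rule filters ECS deployment events and forwards them to the Lambda function. Note that EventBridge also needs permission to invoke the function, typically granted with an aws_lambda_permission resource whose principal is events.amazonaws.com; that resource is not shown here.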

resource "aws_cloudwatch_event_rule" "ecs_deployment" {
  name = "ecs-deployment-events"

  event_pattern = jsonencode({
    "source": ["aws.ecs"],
    "detail-type": ["ECS Deployment State Change"],
    "detail": {
      "eventName": [
        "SERVICE_DEPLOYMENT_FAILED",
        "SERVICE_DEPLOYMENT_ROLLBACK_COMPLETED"
      ]
    }
  })
}

resource "aws_cloudwatch_event_target" "lambda" {
  rule = aws_cloudwatch_event_rule.ecs_deployment.name
  arn  = aws_lambda_function.notification.arn
}
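To sanity-check which events a pattern like this would select, a simplified matcher can be run locally. This is only a sketch under stated assumptions: the function `matches` is hypothetical and covers just the exact-value subset of EventBridge pattern syntax used above (nested objects plus lists of allowed values), not prefix, numeric, or other advanced operators.

```python
def matches(pattern: dict, event: dict) -> bool:
    """Check an event against an exact-value pattern.

    Every key in the pattern must be present in the event. A list in the
    pattern means "the event value must be one of these"; a dict means
    "recurse into the nested object".
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Same pattern as the Terraform rule above, written as a Python dict.
pattern = {
    "source": ["aws.ecs"],
    "detail-type": ["ECS Deployment State Change"],
    "detail": {
        "eventName": [
            "SERVICE_DEPLOYMENT_FAILED",
            "SERVICE_DEPLOYMENT_ROLLBACK_COMPLETED",
        ]
    },
}

failed_event = {
    "source": "aws.ecs",
    "detail-type": "ECS Deployment State Change",
    "detail": {"eventName": "SERVICE_DEPLOYMENT_FAILED"},
}
in_progress_event = {
    "source": "aws.ecs",
    "detail-type": "ECS Deployment State Change",
    "detail": {"eventName": "SERVICE_DEPLOYMENT_IN_PROGRESS"},
}

print(matches(pattern, failed_event))
print(matches(pattern, in_progress_event))
```

Filtering on `eventName` in the rule itself means the Lambda is only invoked for failures and rollbacks, so routine in-progress events never reach it.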

Final Outcome

After enabling the ECS deployment circuit breaker and adding Slack notifications:

  • Failed deployments automatically roll back
  • Silent deployment failures are surfaced instead of going unnoticed
  • Deployment issues become visible in real time
  • ECS services are safer by default

By combining automated rollback with real-time alerts, you can significantly reduce operational risk and increase confidence in your ECS deployments.
