Preventing Silent ECS Deployment Failures with Circuit Breaker
Source: Dev.to
Overview
AWS Elastic Container Service (ECS) provides a built‑in feature called the deployment circuit breaker, designed to make service deployments safer and more resilient.
The circuit breaker monitors the health of tasks during a deployment and marks the deployment as failed if newly launched tasks cannot reach a healthy, steady state. With rollback enabled, ECS then reverts the service to its last completed deployment, so a failed deployment does not leave the service degraded or non‑functional.
Without this safeguard, deployment failures can easily go unnoticed. For example, if new tasks fail to start or never pass health checks, the service may still appear to be running while it is effectively broken. These silent failures can result in data loss, financial impact, or operational issues depending on the workload.
In this post, I’ll walk through how to:
- Enable the ECS deployment circuit breaker using Terraform.
- Observe deployment failures via EventBridge.
- Send real‑time alerts to Slack.
Why the ECS Deployment Circuit Breaker Matters
Enabling the deployment circuit breaker provides several important benefits:
- Automatic rollback – Failed deployments are reverted to the last known healthy service revision.
- Improved visibility – ECS emits structured events whenever a deployment fails or rolls back.
- Reduced operational overhead – Failures are mitigated automatically without immediate manual intervention.
Together, these significantly reduce the risk of production incidents caused by faulty deployments.
Enabling the Circuit Breaker with Terraform
The deployment circuit breaker can be enabled directly in your ECS service definition. In Terraform, this is done using the deployment_circuit_breaker block:
resource "aws_ecs_service" "default" {
  name            = "tuve"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.default.arn
  desired_count   = 2

  deployment_circuit_breaker {
    enable   = true
    rollback = true
  }

  # … other service settings …
}
With this configuration in place, ECS will automatically stop and roll back a deployment if the new tasks fail to reach a healthy state.
Once enabled, the AWS Management Console clearly indicates that the Deployment circuit breaker is turned on.
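To double-check the setting outside the console, one option is to inspect the output of the ECS DescribeServices API. The following is a minimal sketch, assuming boto3 is installed and AWS credentials are configured. The helper names `circuit_breaker_status` and `fetch_service` are hypothetical, and the sample dictionary is a hand-written fragment of a response, not real output.

```python
def circuit_breaker_status(service):
    """Return (enabled, rollback) for one entry of a DescribeServices response."""
    config = service.get("deploymentConfiguration", {})
    breaker = config.get("deploymentCircuitBreaker", {})
    return bool(breaker.get("enable", False)), bool(breaker.get("rollback", False))


def fetch_service(cluster, service_name):
    """Fetch one service description from ECS (needs AWS credentials)."""
    import boto3  # imported lazily so the helper above works without boto3

    response = boto3.client("ecs").describe_services(
        cluster=cluster, services=[service_name]
    )
    services = response.get("services", [])
    if not services:
        raise LookupError(f"service {service_name!r} not found in {cluster!r}")
    return services[0]


# Offline example using a hand-written response fragment (assumption,
# shaped like the relevant part of a DescribeServices response).
sample = {
    "serviceName": "tuve",
    "deploymentConfiguration": {
        "deploymentCircuitBreaker": {"enable": True, "rollback": True}
    },
}
enabled, rollback = circuit_breaker_status(sample)
print(f"circuit breaker enabled={enabled}, rollback={rollback}")
```

Against a real account, `circuit_breaker_status(fetch_service("my-cluster", "tuve"))` would perform the same check, where the cluster name is a placeholder.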

Observing Deployment Failures
Automatic rollback is useful, but visibility is just as important.
When the ECS deployment circuit breaker triggers, ECS emits events to Amazon EventBridge with the following detail type:
ECS Deployment State Change
Example Event Payload
{
  "version": "0",
  "id": "ddca6449-b258-46c0-8653-e0e3aEXAMPLE",
  "detail-type": "ECS Deployment State Change",
  "source": "aws.ecs",
  "account": "111122223333",
  "time": "2020-05-23T12:31:14Z",
  "region": "eu-central-1",
  "resources": [
    "arn:aws:ecs:eu-central-1:111122223333:service/default/servicetest"
  ],
  "detail": {
    "eventType": "ERROR",
    "eventName": "SERVICE_DEPLOYMENT_FAILED",
    "deploymentId": "ecs-svc/123",
    "updatedAt": "2020-05-23T11:11:11Z",
    "reason": "ECS deployment circuit breaker: task failed to start."
  }
}
Key Fields to Monitor
| Field | Description |
|---|---|
| eventName | SERVICE_DEPLOYMENT_FAILED or SERVICE_DEPLOYMENT_ROLLBACK_COMPLETED |
| reason | Human‑readable explanation of why the deployment failed |
| resources | ARN(s) of the affected ECS service(s) |
| updatedAt | Timestamp of the failure event |
Tracking these fields ensures that deployment issues are visible immediately instead of being discovered hours later.
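As an illustration, the fields in the table can be pulled out of an event with a few dictionary lookups. This is a small sketch: the function name `summarize_deployment_event` is hypothetical, and the sample event is a trimmed copy of the example payload shown earlier.

```python
def summarize_deployment_event(event):
    """Extract the fields worth monitoring from an ECS deployment event."""
    detail = event.get("detail", {})
    resources = event.get("resources", [])
    service_arn = resources[0] if resources else None
    return {
        "event_name": detail.get("eventName"),
        "reason": detail.get("reason", "Unknown"),
        "service_arn": service_arn,
        # The service name is the last path segment of the service ARN.
        "service_name": service_arn.rsplit("/", 1)[-1] if service_arn else "unknown",
        "updated_at": detail.get("updatedAt", "Unknown"),
    }


# Trimmed copy of the example payload
sample_event = {
    "detail-type": "ECS Deployment State Change",
    "source": "aws.ecs",
    "resources": [
        "arn:aws:ecs:eu-central-1:111122223333:service/default/servicetest"
    ],
    "detail": {
        "eventName": "SERVICE_DEPLOYMENT_FAILED",
        "updatedAt": "2020-05-23T11:11:11Z",
        "reason": "ECS deployment circuit breaker: task failed to start.",
    },
}

summary = summarize_deployment_event(sample_event)
for field, value in summary.items():
    print(f"{field}: {value}")
```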
Deployment Rollback in the AWS Console
The AWS Management Console also provides clear visibility into rollback activity. After a failed deployment, the Deployments tab shows the rollback status along with the target service revision.

This view is particularly useful for confirming that the circuit breaker worked as expected.
Sending Deployment Alerts to Slack
To ensure deployment failures are noticed immediately, ECS deployment events can be routed to Slack using EventBridge and a Lambda function.
Overall flow: ECS → EventBridge → Lambda → Slack
Lambda Handler Example (Python)
import json
import urllib.request

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXXXX/XXXXX/XXXXX"


def lambda_handler(event, context):
    # Verify we have the right event type
    if event.get("detail-type") != "ECS Deployment State Change":
        return

    detail = event.get("detail", {})
    event_name = detail.get("eventName")

    # Only act on failure or rollback completion events
    if event_name not in (
        "SERVICE_DEPLOYMENT_FAILED",
        "SERVICE_DEPLOYMENT_ROLLBACK_COMPLETED",
    ):
        return

    # Extract useful information
    resources = event.get("resources", [])
    service_name = resources[0].split("/")[-1] if resources else "unknown"
    reason = detail.get("reason", "Unknown")
    updated_at = detail.get("updatedAt", "Unknown")

    # Build Slack message
    message = {
        "text": f"*ECS Deployment Alert*\n"
                f"*Service:* `{service_name}`\n"
                f"*Event:* `{event_name}`\n"
                f"*Reason:* {reason}\n"
                f"*Time:* {updated_at}"
    }

    # Send to Slack
    data = json.dumps(message).encode("utf-8")
    req = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

    return {"statusCode": 200, "body": "Notification sent"}
Configure an EventBridge rule that matches the ECS Deployment State Change detail type and forwards the events to this Lambda function. When a failure or rollback occurs, the Lambda posts a formatted message to the designated Slack channel, giving your team immediate visibility.
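In practice you may not want the webhook URL hard-coded, and a slow or failing Slack call should not hang the function. The sketch below shows one possible way to handle both. It assumes the URL is supplied through an environment variable named SLACK_WEBHOOK_URL; the helper name `post_to_slack` and its `opener` parameter are hypothetical and exist only so the function can be exercised without network access.

```python
import json
import os
import urllib.error
import urllib.request


def post_to_slack(message, webhook_url=None, opener=urllib.request.urlopen, timeout=5):
    """Post a message dict to a Slack incoming webhook; return True on success."""
    # Assumption: the webhook URL is provided via an environment variable.
    url = webhook_url or os.environ.get("SLACK_WEBHOOK_URL")
    if not url:
        raise RuntimeError("SLACK_WEBHOOK_URL is not set")

    request = urllib.request.Request(
        url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        # The timeout keeps a slow Slack endpoint from stalling the Lambda.
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except urllib.error.URLError as exc:
        # HTTPError is a subclass of URLError, so HTTP failures land here too.
        print(f"Slack notification failed: {exc}")
        return False
```

Inside the handler, the final `urllib.request.urlopen(req)` call could then be replaced by `post_to_slack(message)`, with the return value deciding what the function reports back.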
TL;DR
- Enable deployment_circuit_breaker in your ECS service (Terraform example above).
- Use EventBridge to capture ECS Deployment State Change events.
- Forward those events to a Lambda that posts to Slack for instant alerts.
With the circuit breaker, EventBridge monitoring, and Slack notifications in place, you’ll have a robust safety net that automatically rolls back bad deployments and makes sure you know about them the moment they happen.
EventBridge Rule (Terraform)
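The following EventBridge rule filters ECS deployment events and forwards them to the Lambda function. Note that EventBridge also needs permission to invoke the function (for example, an aws_lambda_permission resource with the principal events.amazonaws.com), which is not shown here: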
resource "aws_cloudwatch_event_rule" "ecs_deployment" {
  name = "ecs-deployment-events"

  event_pattern = jsonencode({
    "source": ["aws.ecs"],
    "detail-type": ["ECS Deployment State Change"],
    "detail": {
      "eventName": [
        "SERVICE_DEPLOYMENT_FAILED",
        "SERVICE_DEPLOYMENT_ROLLBACK_COMPLETED"
      ]
    }
  })
}

resource "aws_cloudwatch_event_target" "lambda" {
  rule = aws_cloudwatch_event_rule.ecs_deployment.name
  arn  = aws_lambda_function.notification.arn
}
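Before deploying the rule, it can help to reason about which events the pattern would let through. The sketch below is a deliberately simplified, local approximation of EventBridge matching that handles only exact-value lists and nested objects. Real EventBridge patterns support further operators (such as prefix and anything-but) that are not modelled here, and the function name `matches_pattern` is hypothetical.

```python
def matches_pattern(pattern, event):
    """Simplified EventBridge-style match: every pattern key must match."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the matching nested object.
            if not isinstance(actual, dict) or not matches_pattern(expected, actual):
                return False
        else:
            # Leaf pattern: a list of allowed values. If the event field is
            # itself a list, any one element matching is enough.
            actual_values = actual if isinstance(actual, list) else [actual]
            if not any(value in expected for value in actual_values):
                return False
    return True


# Same pattern as in the Terraform rule above
RULE_PATTERN = {
    "source": ["aws.ecs"],
    "detail-type": ["ECS Deployment State Change"],
    "detail": {
        "eventName": [
            "SERVICE_DEPLOYMENT_FAILED",
            "SERVICE_DEPLOYMENT_ROLLBACK_COMPLETED",
        ]
    },
}

failed_event = {
    "source": "aws.ecs",
    "detail-type": "ECS Deployment State Change",
    "detail": {"eventName": "SERVICE_DEPLOYMENT_FAILED"},
}
completed_event = {
    "source": "aws.ecs",
    "detail-type": "ECS Deployment State Change",
    "detail": {"eventName": "SERVICE_DEPLOYMENT_COMPLETED"},
}

print(matches_pattern(RULE_PATTERN, failed_event))
print(matches_pattern(RULE_PATTERN, completed_event))
```

Because the rule already filters on eventName, the Lambda only receives the failure and rollback events; the check inside the handler then acts as a second line of defence if the rule is ever loosened.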
Final Outcome
After enabling the ECS deployment circuit breaker and adding Slack notifications:
- Failed deployments automatically roll back
- Silent service failures are eliminated
- Deployment issues become visible in real time
- ECS services are safer by default
By combining automated rollback with real-time alerts, you can significantly reduce operational risk and increase confidence in your ECS deployments.