AWS ECS Service Task Recycle

Published: (January 2, 2026 at 03:13 AM EST)
Source: Dev.to

Overview

This solution provides controlled task recycling for ECS services by:

  • Stopping tasks one at a time instead of replacing them in parallel
  • Waiting for service stability between each task replacement
  • Optionally maintaining service state by temporarily increasing capacity
  • Applying a configurable wait time between task replacements

Features

  • Sequential Task Recycling – Stops and replaces tasks one by one
  • Service Stability – Waits for a stable state after each task replacement
  • Capacity Management – Optional temporary capacity increase to maintain availability
  • Autoscaling Support – Handles services with Application Auto Scaling
  • Flexible Authentication – Multiple AWS credential methods via AWSSession module
  • Email Notifications – Optional SMTP notifications on completion
  • CloudFormation Deployment – Infrastructure as code with automated deployment
  • Zero Retries – EventInvokeConfig set to 0 retry attempts
  • Comprehensive Logging – Detailed CloudWatch logs for monitoring

Architecture

Lambda Function (Python 3.13)
├── Event-driven execution
├── AWSSession.py (AWS authentication)
├── Notification.py (Email notifications)
└── input.json (Configuration)

Prerequisites

  • Python 3.13+
  • AWS CLI configured
  • IAM permissions for ECS and Application Auto Scaling
  • SMTP server (optional, for notifications)

Installation

1. Clone Repository

cd aws-ecs-service-task-recycle

2. Configure Settings

Edit input.json with your configuration:

{
  "awsCredentials": {
    "region_name": "us-east-1"
  },
  "smtpCredentials": {
    "host": "smtp.example.com",
    "port": "587",
    "username": "user@example.com",
    "password": "password",
    "from_email": "noreply@example.com"
  },
  "emailNotification": {
    "email_subject": "ECS Service Task Recycle Completed",
    "subject_prefix": "AWS ECS",
    "to": ["admin@example.com"]
  }
}

3. Deploy CloudFormation Stack

chmod +x cloudformation_deploy.sh lambda_build.sh
./cloudformation_deploy.sh

Usage

Lambda Event Parameters

{
  "cluster_name": "my-ecs-cluster",
  "service_name": "my-service",
  "maintain_service_state": true,
  "wait_time": 30
}

Parameters

Parameter               Required  Default  Description
cluster_name            Yes       -        ECS cluster name
service_name            Yes       -        ECS service name
maintain_service_state  No        true     Temporarily increase capacity by 1
wait_time               No        30       Seconds to wait between task replacements
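
The parameter table above can be expressed as a small validation step. This is a minimal sketch, not the article's handler code: the `RecycleParams` and `parse_event` names are hypothetical, and the rejection of negative `wait_time` values is an added assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecycleParams:
    """Event parameters with the defaults listed in the table (hypothetical type)."""
    cluster_name: str
    service_name: str
    maintain_service_state: bool = True
    wait_time: int = 30


def parse_event(event: dict) -> RecycleParams:
    """Validate a Lambda event and fill in the documented defaults."""
    missing = [key for key in ("cluster_name", "service_name") if not event.get(key)]
    if missing:
        raise ValueError(f"Missing required parameter(s): {', '.join(missing)}")

    wait_time = int(event.get("wait_time", 30))
    if wait_time < 0:  # assumption: a negative wait makes no sense, so reject it
        raise ValueError("wait_time must be >= 0")

    return RecycleParams(
        cluster_name=event["cluster_name"],
        service_name=event["service_name"],
        maintain_service_state=bool(event.get("maintain_service_state", True)),
        wait_time=wait_time,
    )
```

Failing fast on a missing `cluster_name` or `service_name` avoids spending Lambda time on ECS calls that cannot succeed.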

Invoke Lambda Function

AWS CLI

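# AWS CLI v2 needs --cli-binary-format to accept a raw JSON payload; omit that flag on CLI v1.
aws lambda invoke \
  --function-name ecs-task-recycle-function \
  --cli-binary-format raw-in-base64-out \
  --payload '{"cluster_name":"my-cluster","service_name":"my-service","maintain_service_state":true,"wait_time":30}' \
  response.json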
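
The same invocation can be made from Python. The sketch below assumes a boto3 Lambda client is passed in (for example `boto3.client("lambda")`); the `invoke_recycle` helper name is hypothetical, and the shape of the function's response is not specified by the article, so it is returned as decoded JSON without interpretation.

```python
import json


def invoke_recycle(lambda_client, function_name: str, event: dict):
    """Invoke the function synchronously and return its decoded JSON response."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the function to finish
        Payload=json.dumps(event).encode("utf-8"),
    )
    body = response["Payload"].read()
    if "FunctionError" in response:
        # Lambda reports handler errors in the payload rather than raising.
        raise RuntimeError(body.decode("utf-8"))
    return json.loads(body) if body else None
```

A synchronous call can run for up to the function's 900 s timeout, which is longer than the default read timeout of a boto3 client, so a longer `read_timeout` in the client configuration may be needed.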

AWS Console

  1. Navigate to Lambda → Functions → ecs-task-recycle-function
  2. Open the Test tab → Create test event
  3. Add the event JSON and click Test

How It Works

Process Flow

  1. Get Current State – Retrieve service configuration and running tasks
  2. Increase Capacity (if maintain_service_state=true) – Add +1 to desired count
  3. Wait for Stability – Ensure the new task is running
  4. Recycle Tasks – For each old task:
    • Stop the task
    • Wait for replacement task to start
    • Wait for service stability
    • Sleep for the configured wait_time
  5. Restore Capacity – Return to the original desired count
  6. Send Notification – Email report (if configured)
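
The process flow above can be sketched with the boto3 ECS client. This is an illustration under stated assumptions rather than the article's implementation: the `recycle_tasks` name is hypothetical, the client is passed in (for example `boto3.client("ecs")`), Application Auto Scaling handling and the email step are omitted, and `list_tasks` pagination beyond 100 tasks is not handled.

```python
import time


def recycle_tasks(ecs, cluster, service, maintain_service_state=True,
                  wait_time=30, sleep=time.sleep):
    """Stop a service's tasks one at a time; return how many were recycled."""
    waiter = ecs.get_waiter("services_stable")

    def wait_stable():
        waiter.wait(cluster=cluster, services=[service])

    # Step 1: capture the desired count and the tasks that are running now.
    svc = ecs.describe_services(cluster=cluster, services=[service])["services"][0]
    original_count = svc["desiredCount"]
    old_tasks = ecs.list_tasks(
        cluster=cluster, serviceName=service, desiredStatus="RUNNING"
    )["taskArns"]

    # Steps 2-3: optionally add one task of headroom and wait for it to run.
    if maintain_service_state:
        ecs.update_service(cluster=cluster, service=service,
                           desiredCount=original_count + 1)
        wait_stable()

    # Step 4: stop each old task; the service scheduler starts a replacement.
    for task_arn in old_tasks:
        ecs.stop_task(cluster=cluster, task=task_arn,
                      reason="Sequential task recycle")
        wait_stable()
        sleep(wait_time)

    # Step 5: return to the original desired count.
    if maintain_service_state:
        ecs.update_service(cluster=cluster, service=service,
                           desiredCount=original_count)
        wait_stable()

    return len(old_tasks)
```

The `sleep` parameter exists only so the loop can be exercised without real delays. Because the service may briefly still report the stopped task as running, a production version might also confirm that the task has left the running set before trusting the stability waiter.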

Example Scenario

Service with 3 tasks

Initial State: 3 tasks running

Increase to 4 tasks (maintain availability)

Stop task 1 → Wait stable → Sleep 30s

Stop task 2 → Wait stable → Sleep 30s

Stop task 3 → Wait stable → Sleep 30s

Restore to 3 tasks

Complete

Configuration

AWS Credentials (input.json)

Multiple authentication methods are supported:

{
  "awsCredentials": {
    "region_name": "us-east-1",
    "profile_name": "my-profile",
    "role_arn": "arn:aws:iam::123456789012:role/MyRole",
    "access_key": "AKIAIOSFODNN7EXAMPLE",
    "secret_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
    "session_token": "token"
  }
}
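
The article does not show how the AWSSession module chooses between these fields, so the following is only a sketch of one possible mapping onto the keyword arguments that `boto3.Session` accepts. The `session_kwargs` name and the precedence (explicit keys first, then a named profile, otherwise the default credential chain) are assumptions; assuming `role_arn` through STS is left out.

```python
def session_kwargs(creds: dict) -> dict:
    """Map an awsCredentials block to keyword arguments for boto3.Session."""
    kwargs = {}
    if creds.get("region_name"):
        kwargs["region_name"] = creds["region_name"]

    if creds.get("access_key") and creds.get("secret_key"):
        # Assumed precedence: explicit keys win over a named profile.
        kwargs["aws_access_key_id"] = creds["access_key"]
        kwargs["aws_secret_access_key"] = creds["secret_key"]
        if creds.get("session_token"):
            kwargs["aws_session_token"] = creds["session_token"]
    elif creds.get("profile_name"):
        kwargs["profile_name"] = creds["profile_name"]

    # With neither keys nor profile, boto3 falls back to its default chain,
    # which inside Lambda resolves to the execution role.
    return kwargs
```

The result would be used as `boto3.Session(**session_kwargs(config["awsCredentials"]))`. Inside Lambda, supplying only `region_name` is usually enough.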

SMTP Configuration (Optional)

{
  "smtpCredentials": {
    "host": "smtp.gmail.com",
    "port": "587",
    "username": "user@gmail.com",
    "password": "app-password",
    "from_email": "noreply@example.com"
  }
}
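
To show how these settings might be consumed, here is a sketch that uses only the Python standard library. It is not the article's Notification.py: the function names are hypothetical, the "[prefix] subject" format is an assumption, and STARTTLS is assumed because the examples use port 587.

```python
import smtplib
from email.message import EmailMessage


def build_message(smtp: dict, notification: dict, body: str) -> EmailMessage:
    """Build the notification email from the two config blocks."""
    msg = EmailMessage()
    # Assumed subject format: "[<subject_prefix>] <email_subject>".
    msg["Subject"] = f'[{notification["subject_prefix"]}] {notification["email_subject"]}'
    msg["From"] = smtp["from_email"]
    msg["To"] = ", ".join(notification["to"])
    msg.set_content(body)
    return msg


def send_message(smtp: dict, msg: EmailMessage) -> None:
    """Send the message over SMTP with STARTTLS."""
    # The config stores the port as a string, so convert it.
    with smtplib.SMTP(smtp["host"], int(smtp["port"])) as server:
        server.starttls()
        server.login(smtp["username"], smtp["password"])
        server.send_message(msg)
```

Keeping message construction separate from sending makes the content easy to check without a reachable SMTP server.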

IAM Permissions

Required permissions (included in the CloudFormation template):

{
  "Effect": "Allow",
  "Action": [
    "ecs:DescribeServices",
    "ecs:UpdateService",
    "ecs:ListTasks",
    "ecs:StopTask",
    "ecs:DescribeTasks",
    "application-autoscaling:DescribeScalableTargets",
    "application-autoscaling:RegisterScalableTarget"
  ],
  "Resource": "*"
}

CloudFormation Resources

  • Lambda Function: Python 3.13 runtime, 900 s timeout, 256 MB memory
  • IAM Role: Execution role with ECS and Auto Scaling permissions
  • EventInvokeConfig: MaximumRetryAttempts set to 0
  • CloudWatch Logs: 7‑day retention

Monitoring

CloudWatch Logs

aws logs tail /aws/lambda/ecs-task-recycle-function --follow

Key Log Messages

  • Starting task recycle for {cluster}/{service}
  • Original desired count: X, tasks: Y
  • Recycling task N/M: {task_arn}
  • Task N recycled, waiting Xs
  • Task recycle completed successfully

Troubleshooting

Service Not Stabilizing

  • Increase waiter MaxAttempts in code (default: 40)
  • Check ECS service health and task definitions
  • Verify target‑group health checks
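
For reference, boto3 waiters accept a `WaiterConfig` argument, so the attempt limit can be raised per call. The sketch below assumes the code uses the ECS `services_stable` waiter, whose defaults are a 15 s delay and 40 attempts; the helper names are hypothetical.

```python
def waiter_budget_seconds(delay: int = 15, max_attempts: int = 40) -> int:
    """Upper bound on how long one waiter call can block."""
    return delay * max_attempts


def wait_until_stable(ecs, cluster: str, service: str,
                      delay: int = 15, max_attempts: int = 40) -> None:
    """Block until the service is stable or the attempt limit is reached."""
    waiter = ecs.get_waiter("services_stable")
    # botocore raises WaiterError if the limit is hit before the service settles.
    waiter.wait(
        cluster=cluster,
        services=[service],
        WaiterConfig={"Delay": delay, "MaxAttempts": max_attempts},
    )
```

With the defaults a single wait can take up to 600 s, which is already two thirds of the 900 s Lambda timeout, so raising `MaxAttempts` only helps when the total run still fits within that limit.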

Timeout Errors

  • Confirm the Lambda timeout is set to 900 s, the maximum Lambda allows
  • Reduce the number of tasks recycled per run or decrease wait_time

Authentication Failures

  • Verify IAM role permissions
  • Check AWS credentials in input.json
  • Ensure Lambda execution role is correct

Best Practices

  • Test in Non‑Production – Always test with non‑critical services first.
  • Monitor CloudWatch – Watch logs during the first execution.
  • Adjust Wait Time – Tune based on application startup time.
  • Use Maintain State – Enable for production services.
  • Schedule Wisely – Run during low‑traffic periods.

Comparison with Force Deployment

FeatureForce DeploymentTask Recycle
Task ReplacementParallelSequential
Service DisruptionHigherLower
Completion TimeFasterSlower
ControlLimitedConfigurable
Wait Between TasksNoYes

Security Considerations

  • Lambda execution role follows the principle of least privilege.
  • No hard‑coded credentials in code.
  • SMTP credentials stored in input.json (use Secrets Manager in production).
  • CloudWatch logs provide an audit trail.
  • EventInvokeConfig prevents retry storms.
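
As a sketch of the Secrets Manager suggestion above, the SMTP settings could be loaded at runtime instead of being read from input.json. The `load_smtp_credentials` name and the secret layout (a JSON object with the same keys as `smtpCredentials`) are assumptions; the client is passed in, for example `boto3.client("secretsmanager")`.

```python
import json

SMTP_KEYS = ("host", "port", "username", "password", "from_email")


def load_smtp_credentials(secrets_client, secret_id: str) -> dict:
    """Fetch SMTP settings from a JSON secret stored in Secrets Manager."""
    response = secrets_client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])

    missing = [key for key in SMTP_KEYS if key not in secret]
    if missing:
        raise KeyError(f"Secret is missing key(s): {', '.join(missing)}")

    # Return only the expected keys so unrelated secret fields are not passed on.
    return {key: secret[key] for key in SMTP_KEYS}
```

The execution role would also need `secretsmanager:GetSecretValue` on that secret, which is not part of the permissions listed in this article.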

Cost Optimization

  • Lambda execution time is at least (number of tasks × wait_time) seconds, plus the time spent waiting for stability after each replacement.
  • CloudWatch Logs: 7‑day retention keeps log storage costs low.
  • No extra AWS service charges.
  • Consider scheduling during off‑peak hours to reduce indirect costs.

Limitations

  • Maximum Lambda execution time: 15 minutes.
  • Suitable for services with < 20 tasks (with a 30 s wait time).
  • Requires a stable service for the waiter to succeed.
  • No automatic rollback mechanism on failure.
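
The task-count guidance above follows from simple arithmetic, sketched here. The 15 s stabilization time per task is an illustrative assumption (one polling interval of the stability waiter); real replacements often take longer, which lowers the practical limit. The function names are hypothetical.

```python
LAMBDA_MAX_TIMEOUT_S = 900  # 15-minute Lambda limit


def estimate_runtime_s(num_tasks: int, wait_time_s: int = 30,
                       stabilize_s: int = 15) -> int:
    """Rough runtime: each task costs its stabilization time plus wait_time."""
    return num_tasks * (wait_time_s + stabilize_s)


def fits_in_lambda(num_tasks: int, wait_time_s: int = 30, stabilize_s: int = 15,
                   timeout_s: int = LAMBDA_MAX_TIMEOUT_S) -> bool:
    """True if the estimate leaves at least some margin before the timeout."""
    return estimate_runtime_s(num_tasks, wait_time_s, stabilize_s) < timeout_s
```

Under these assumptions 19 tasks take about 855 s and fit, while 20 tasks reach 900 s and do not. The estimate ignores the extra stabilization waits when `maintain_service_state` is enabled.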

Contributing

Contributions are welcome! Please follow these guidelines:

  1. Test changes thoroughly.
  2. Update documentation.
  3. Follow the existing code style.
  4. Add appropriate error handling.