AWS Backup Failed Monitoring

Published: (December 28, 2025 at 09:41 AM EST)
4 min read
Source: Dev.to

Source: Dev.to

Prashant Gupta

Overview

Automated monitoring solution for AWS Backup jobs that identifies failed backup operations and sends detailed reports via email. This tool helps maintain backup compliance by proactively alerting on backup failures.

Key Features

  • Failed Backup Detection – Automatically identifies backup jobs that have failed within the last 7 days.
  • Detailed Reporting – Generates Excel reports with backup plan names, resource details, and failure information.
  • Email Notifications – Sends automated email alerts with attached Excel reports.
  • Multi‑Account Support – Works with AWS profiles, assumed roles, and access keys.
  • Jenkins Integration – Includes a Jenkinsfile for automated scheduling.
  • Secure Configuration – Uses AWS Secrets Manager for email credentials.

Prerequisites

  • Python 3.x
  • AWS CLI configured or appropriate AWS credentials
  • AWS Backup service with backup plans configured
  • AWS Secrets Manager secret for email configuration
  • Required IAM permissions (see iam_policy.json)

Installation

# Clone the repository and navigate to the solution directory
git clone 
cd devops-automation/aws-backup-failed-monitoring
# Install required dependencies
pip install -r requirements.txt

Configure your AWS credentials and email settings in inputs.yml.

Configuration

AWS Authentication

Configure one of the following authentication methods in inputs.yml:

# Option 1: AWS Profile
profile_name: "your-profile-name"

# Option 2: Assumed Role
role_arn: "arn:aws:iam::123456789012:role/BackupMonitoringRole"

# Option 3: Access Keys (not recommended for production)
access_key: "your-access-key"
secret_key: "your-secret-key"
session_token: "your-session-token"  # Optional

Email Configuration

Set up email notifications:

Email:
  enabled: true
  secret_manager: "your-smtp-secret-name"
  details:
    subject_prefix: "AWS Backup Alert"
    to:
      - "admin@example.com"
    cc:
      - "devops@example.com"

AWS Secrets Manager

Create a secret in AWS Secrets Manager with the following structure:

{
  "SMTP_HOST": "smtp.example.com",
  "SMTP_PORT": "587",
  "SMTP_USERNAME": "your-username",
  "SMTP_PASSWORD": "your-password",
  "EMAIL_FROM": "alerts@example.com"
}

Usage

Manual Execution

python main.py

Jenkins Pipeline

The included Jenkinsfile provides automated scheduling:

  • Runs weekly on Mondays at 5:00 AM
  • Includes failure notifications via SNS
  • Configurable environment variables

Shell Script Execution

bash script.sh

IAM Permissions

The solution requires the following AWS permissions (see iam_policy.json):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "backup:ListBackupJobs",
        "backup:GetBackupPlan",
        "backup:ListTags",
        "backup:GetBackupPlanFromJSON"
      ],
      "Resource": "*"
    }
  ]
}

Additional permissions needed:

  • secretsmanager:GetSecretValue – for email configuration
  • sts:AssumeRole – if using role assumption

Output

The solution generates:

  • Excel Reportbackup_jobs.xlsx containing:

    • Backup Plan Name
    • Resource Name and Type
    • Resource ARN
    • Job ID
    • Start Time
    • Job State
  • Email Notification – HTML email with the attached Excel report

  • Console Logs – Detailed logging of the monitoring process

File Structure

aws-backup-failed-monitoring/
├── main.py          # Main monitoring script
├── AWSSession.py    # AWS session management
├── Notification.py # Email notification handler
├── inputs.yml       # Configuration file
├── requirements.txt # Python dependencies
├── iam_policy.json  # Required IAM permissions
├── Jenkinsfile      # Jenkins pipeline configuration
├── script.sh        # Shell execution script
├── .gitignore       # Git ignore rules
└── README.md        # This documentation

Monitoring Logic

  • Time Range – Monitors backup jobs from the last 7 days.
  • Job States – Identifies jobs with FAILED status.
  • Validation – Verifies backup plan existence before reporting.
  • Reporting – Generates detailed Excel reports for failed jobs.
  • Notification – Sends email alerts when failures are detected.

Troubleshooting

Common Issues

  • Missing Dependencies – Ensure all packages in requirements.txt are installed.
  • AWS Permissions – Verify IAM permissions match iam_policy.json.
  • Email Configuration – Check AWS Secrets Manager secret format.
  • Authentication – Ensure AWS credentials are properly configured.

Logging

The script provides detailed logs for each step of the monitoring process, which can be reviewed in the console output or redirected to a log file for further analysis.

Solution Logging

  • INFO level – General operation status
  • ERROR level – Specific error details and exceptions

Security Considerations

  • Credentials – Never commit AWS credentials to version control.
  • Secrets Manager – Use AWS Secrets Manager for email credentials.
  • IAM Roles – Prefer IAM roles over access keys for authentication.
  • Least Privilege – Apply minimal required permissions.

Jenkins Integration

The included Jenkins pipeline:

  • Schedules weekly execution.
  • Provides build‑history management.
  • Includes failure notifications.
  • Supports environment‑specific configuration.

Contributing

When modifying this solution:

  • Test in non‑production environments first.
  • Update documentation for any configuration changes.
  • Follow existing code structure and naming conventions.
  • Ensure security best practices are maintained.

Support

For issues or questions:

  • Check the troubleshooting section.
  • Review AWS CloudTrail logs for API‑call details.
  • Verify IAM permissions and AWS service limits.
  • Contact the DevOps team for assistance.

GitHub Repository

https://github.com/prashantgupta123/devops-automation/tree/main/aws-backup-failed-monitoring

Note: This tool monitors AWS Backup jobs and requires appropriate AWS permissions. Always test in non‑production environments and ensure compliance with your organization’s security policies and procedures.

Back to Blog

Related posts

Read more »