Building an AI-Powered Customer Churn Prediction Pipeline on AWS (Step-by-Step)

Published: December 31, 2025 at 07:56 PM EST
5 min read
Source: Dev.to

What we achieved

  • ✅ 84.2% AUC on validation data
  • ✅ Real‑time predictions via SageMaker endpoint
  • ✅ Natural language explanations powered by Claude (Bedrock)

🎯 What We’re Building

An end‑to‑end ML pipeline that:

| Step | Action |
| --- | --- |
| Ingests | Customer data into S3 |
| Trains | A churn prediction model with SageMaker XGBoost |
| Deploys | A real‑time inference endpoint |
| Explains | Predictions using Amazon Bedrock (Claude) |
| Exposes | Everything via API Gateway + Lambda |

Prerequisites: AWS account, basic Python knowledge

🏗️ Architecture Overview

AWS Churn Prediction Architecture

The pipeline consists of five tiers:

| Tier | Services | Purpose |
| --- | --- | --- |
| Data Ingestion | S3 | Store raw customer data |
| ML Training | SageMaker Training | Train XGBoost model |
| Model Storage | S3 | Store model artifacts |
| Inference & AI | SageMaker Endpoint, Bedrock | Real‑time predictions + NL explanations |
| API Layer | API Gateway, Lambda | Expose REST API |

Step 1 – Set Up S3 and Upload Data

# Set bucket name with your account ID
export BUCKET_NAME=churn-prediction-$(aws sts get-caller-identity --query Account --output text)

# Create bucket
aws s3 mb s3://$BUCKET_NAME

# Upload your data
aws s3 cp WA_Fn-UseC_-Telco-Customer-Churn.csv s3://$BUCKET_NAME/raw/

📥 Dataset: Download the Telco Customer Churn dataset from Kaggle.

Step 2 – Create SageMaker IAM Role

  1. Open the IAM console → RolesCreate role.
  2. Choose SageMaker – Execution as the trusted entity.
  3. Attach the policies: AmazonSageMakerFullAccess and AmazonS3FullAccess.
  4. Name the role SageMakerChurnRole.
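If you prefer scripting the role over clicking through the console, the same setup can be done with boto3. A minimal sketch — the `create_sagemaker_role` helper is my own naming, not from the article:

```python
import json

# Trust policy that lets SageMaker assume the role (what the console's
# "SageMaker - Execution" trusted entity generates for you).
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "sagemaker.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

def create_sagemaker_role(iam_client, role_name="SageMakerChurnRole"):
    """Create the role and attach the two managed policies from step 3."""
    iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(trust_policy),
    )
    for policy in ("AmazonSageMakerFullAccess", "AmazonS3FullAccess"):
        iam_client.attach_role_policy(
            RoleName=role_name,
            PolicyArn=f"arn:aws:iam::aws:policy/{policy}",
        )
```

Call it with `create_sagemaker_role(boto3.client('iam'))`. Note that `AmazonS3FullAccess` is broader than strictly necessary; for production you'd scope it to the one bucket.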

Step 3 – Train the Model

Create a file named train_churn.py with the following content:

import boto3
import sagemaker
import pandas as pd
import os
from sklearn.model_selection import train_test_split
from sagemaker.inputs import TrainingInput

# Config
BUCKET = os.environ['BUCKET_NAME']
ROLE   = os.environ['ROLE_ARN']
PREFIX = 'churn-prediction'

session = sagemaker.Session()
region  = session.boto_region_name

# Load and prepare data
df = pd.read_csv('WA_Fn-UseC_-Telco-Customer-Churn.csv')
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce').fillna(0)
df['Churn'] = (df['Churn'] == 'Yes').astype(int)

# Encode categorical columns
cat_cols = [
    'gender', 'Partner', 'Dependents', 'PhoneService', 'MultipleLines',
    'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection',
    'TechSupport', 'StreamingTV', 'StreamingMovies', 'Contract',
    'PaperlessBilling', 'PaymentMethod'
]

for col in cat_cols:
    df[col] = df[col].astype('category').cat.codes

# Feature set
feature_cols = ['SeniorCitizen', 'tenure', 'MonthlyCharges', 'TotalCharges'] + cat_cols
X = df[feature_cols]
y = df['Churn']

# Train‑test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

train_df = pd.concat([y_train.reset_index(drop=True), X_train.reset_index(drop=True)], axis=1)
test_df  = pd.concat([y_test.reset_index(drop=True),  X_test.reset_index(drop=True)], axis=1)

train_df.to_csv('train.csv', index=False, header=False)
test_df.to_csv('test.csv',   index=False, header=False)

# Upload CSVs to S3
s3 = boto3.client('s3')
s3.upload_file('train.csv', BUCKET, f'{PREFIX}/train/train.csv')
s3.upload_file('test.csv',  BUCKET, f'{PREFIX}/test/test.csv')

# XGBoost container
container = sagemaker.image_uris.retrieve('xgboost', region, '1.7-1')

xgb = sagemaker.estimator.Estimator(
    image_uri=container,
    role=ROLE,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    output_path=f's3://{BUCKET}/{PREFIX}/output',
    sagemaker_session=session
)

xgb.set_hyperparameters(
    objective='binary:logistic',
    num_round=100,
    max_depth=5,
    eta=0.2,
    eval_metric='auc'
)

xgb.fit({
    'train': TrainingInput(f's3://{BUCKET}/{PREFIX}/train', content_type='csv'),
    'validation': TrainingInput(f's3://{BUCKET}/{PREFIX}/test', content_type='csv')
})

print('✅ Training complete!')
print(f'Model artifact: {xgb.model_data}')

# Deploy endpoint
print('Deploying endpoint (3-5 min)...')
predictor = xgb.deploy(
    initial_instance_count=1,
    instance_type='ml.t2.medium',
    endpoint_name='churn-prediction-endpoint',
    serializer=sagemaker.serializers.CSVSerializer()
)
print('✅ Endpoint deployed: churn-prediction-endpoint')

# Smoke test with the first validation row (features only, no label)
sample_row = ','.join(str(v) for v in X_test.iloc[0].values)
prob = float(predictor.predict(sample_row).decode())
print(f'Test prediction: {prob:.1%} churn probability')

Run the script

export BUCKET_NAME=churn-prediction-YOUR_ACCOUNT_ID
export ROLE_ARN=arn:aws:iam::YOUR_ACCOUNT_ID:role/SageMakerChurnRole
python3 train_churn.py

Sample training output

2026-01-01 00:24:27 Uploading - Uploading generated training model
2026-01-01 00:24:27 Completed - Training job completed
Training seconds: 103
Billable seconds: 103

✅ Training complete!
Model artifact: s3://churn-prediction-905418352184/churn-prediction/output/sagemaker-xgboost-2026-01-01-00-22-03-339/output/model.tar.gz

Deploying endpoint (3‑5 min)...
INFO:sagemaker:Creating model with name: sagemaker-xgboost-2026-01-01-00-24-53-959
INFO:sagemaker:Creating endpoint-config with name churn-prediction-endpoint
INFO:sagemaker:Creating endpoint with name churn-prediction-endpoint
✅ Endpoint deployed: churn-prediction-endpoint
Test prediction: 0.4% churn probability

You now have a fully functional churn‑prediction model, a real‑time SageMaker endpoint, and (in the next steps) an API layer that calls Bedrock to generate plain‑English explanations for each prediction. Stay tuned for the remaining parts of the tutorial!
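You can also invoke the endpoint directly with boto3 before wiring up the API layer. One gotcha: the CSV row must list values in exactly the training column order, with the label dropped. A sketch — `to_feature_csv` and `predict_churn` are illustrative helper names, not part of the article's code:

```python
# Feature order used in train_churn.py: numeric columns first,
# then the label-encoded categorical columns.
FEATURE_COLS = [
    'SeniorCitizen', 'tenure', 'MonthlyCharges', 'TotalCharges',
    'gender', 'Partner', 'Dependents', 'PhoneService', 'MultipleLines',
    'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection',
    'TechSupport', 'StreamingTV', 'StreamingMovies', 'Contract',
    'PaperlessBilling', 'PaymentMethod',
]

def to_feature_csv(record: dict) -> str:
    """Serialize one customer record into the CSV row the endpoint expects."""
    return ','.join(str(record[col]) for col in FEATURE_COLS)

def predict_churn(runtime, csv_row: str) -> float:
    """Invoke the endpoint; runtime = boto3.client('sagemaker-runtime')."""
    resp = runtime.invoke_endpoint(
        EndpointName='churn-prediction-endpoint',
        ContentType='text/csv',
        Body=csv_row,
    )
    return float(resp['Body'].read().decode())
```

If the column order drifts between training and inference, the model still returns a number — just a meaningless one — so it's worth keeping this list in one shared place.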

Step 4 – Create Lambda with Bedrock Integration

Create a Lambda function ChurnPredictionAPI with the following code:

import json
import boto3
import os

sagemaker_runtime = boto3.client('sagemaker-runtime')
bedrock = boto3.client('bedrock-runtime')

ENDPOINT_NAME = os.environ.get('SAGEMAKER_ENDPOINT', 'churn-prediction-endpoint')

def lambda_handler(event, context):
    body = json.loads(event['body']) if isinstance(event.get('body'), str) else event

    # Get prediction from SageMaker
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType='text/csv',
        Body=body['features']
    )
    churn_prob = float(response['Body'].read().decode())

    # Generate explanation with Bedrock Claude
    prompt = f"""A customer has {churn_prob:.1%} churn probability.
Customer: Tenure {body.get('tenure', 'N/A')} months, ${body.get('monthly_charges', 'N/A')}/month, {body.get('contract', 'N/A')} contract.
In 2 sentences, explain the risk and suggest one retention action."""

    bedrock_response = bedrock.invoke_model(
        modelId='anthropic.claude-3-haiku-20240307-v1:0',
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 100,
            "messages": [{"role": "user", "content": prompt}]
        })
    )
    explanation = json.loads(bedrock_response['body'].read())['content'][0]['text']
    risk = "High" if churn_prob > 0.7 else "Medium" if churn_prob > 0.4 else "Low"

    return {
        'statusCode': 200,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps({
            'churn_probability': f"{churn_prob:.1%}",
            'risk_level': risk,
            'explanation': explanation
        })
    }

Lambda configuration

  • Runtime: Python 3.11
  • Timeout: 30 seconds
  • Role: LambdaChurnRole (with SageMaker + Bedrock permissions)
  • Environment variable: SAGEMAKER_ENDPOINT=churn-prediction-endpoint
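The article doesn't show the exact policy attached to LambdaChurnRole, so here is a hedged sketch of the minimal permissions it needs beyond basic Lambda execution — built as a dict so it's easy to adapt:

```python
import json

# Minimal inline policy for LambdaChurnRole (my reconstruction, not the
# article's): invoke the SageMaker endpoint and the Bedrock Claude model.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sagemaker:InvokeEndpoint",
            "Resource": "arn:aws:sagemaker:*:*:endpoint/churn-prediction-endpoint",
        },
        {
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": "arn:aws:bedrock:*::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
        },
    ],
}
print(json.dumps(policy, indent=2))
```

Attaching only these two actions (rather than full-access managed policies) keeps the Lambda's blast radius small.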

Step 5 – Create API Gateway

  1. Create an HTTP API in API Gateway.
  2. Add a Lambda integration → ChurnPredictionAPI.
  3. Create a POST route: /predict.
  4. Deploy and note the invoke URL.

🧪 Test the API

curl -X POST "https://YOUR_API_URL/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "features": "0,24,65.5,1500.0,1,0,1,2,0,0,1,1,0,0,1,0,2,1,1",
    "tenure": 24,
    "monthly_charges": 65.5,
    "contract": "Month-to-month"
}'

Sample response

{
  "churn_probability": "0.6%",
  "risk_level": "Low",
  "explanation": "The customer's high churn probability of 0.6% and the month-to-month contract indicate a significant risk of losing the customer. To mitigate this risk, a retention action could be to offer the customer a longer-term contract with a discounted monthly rate or additional benefits, which may help increase their loyalty and reduce the likelihood of churn."
}
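The same request from Python, using only the standard library — `API_URL` is a placeholder for your invoke URL, and the payload mirrors the curl example above:

```python
import json
import urllib.request

API_URL = "https://YOUR_API_URL/predict"  # replace with your invoke URL

payload = {
    # 19 comma-separated features in the training column order
    "features": "0,24,65.5,1500.0,1,0,1,2,0,0,1,1,0,0,1,0,2,1,1",
    "tenure": 24,
    "monthly_charges": 65.5,
    "contract": "Month-to-month",
}

def call_api(url: str = API_URL) -> dict:
    """POST the payload and return the parsed JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Handy for scripting batch checks against the API without pulling in extra dependencies.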

🧹 Cleanup

Delete resources when you’re finished to avoid charges:

# Delete SageMaker endpoint (most expensive!)
aws sagemaker delete-endpoint --endpoint-name churn-prediction-endpoint
aws sagemaker delete-endpoint-config --endpoint-config-name churn-prediction-endpoint

# Delete Lambda
aws lambda delete-function --function-name ChurnPredictionAPI

# Delete S3 bucket
aws s3 rb s3://$BUCKET_NAME --force

💡 Key Lessons Learned

  • SageMaker XGBoost is production‑ready – achieved ~84% AUC with minimal tuning.
  • Bedrock adds real business value – turning raw predictions into actionable insights makes ML accessible to non‑technical stakeholders.
  • IAM permissions are tricky – creating roles via the console can help avoid “explicit deny” errors.
  • Cost awareness matters – always delete expensive resources (e.g., SageMaker endpoints) when they’re no longer needed.

💡 Tip: Delete SageMaker endpoints when not in use – at ~$0.05/hour, an idle endpoint adds up!

Thanks for reading! If this helped you, follow me for more AWS + Data Engineering content.

Questions? Leave a comment below!
