Building an AI-Powered Customer Churn Prediction Pipeline on AWS (Step-by-Step)

Published: December 31, 2025 at 07:56 PM EST
5 min read
Source: Dev.to

What we achieved

  • ✅ 84.2% AUC on validation data
  • ✅ Real‑time predictions via SageMaker endpoint
  • ✅ Natural language explanations powered by Claude (Bedrock)

🎯 What We’re Building

An end‑to‑end ML pipeline that:

| Step | Action |
| --- | --- |
| Ingests | Customer data into S3 |
| Trains | A churn prediction model with SageMaker XGBoost |
| Deploys | A real‑time inference endpoint |
| Explains | Predictions using Amazon Bedrock (Claude) |
| Exposes | Everything via API Gateway + Lambda |

Prerequisites: AWS account, basic Python knowledge

🏗️ Architecture Overview

AWS Churn Prediction Architecture

The pipeline consists of five tiers:

| Tier | Services | Purpose |
| --- | --- | --- |
| Data Ingestion | S3 | Store raw customer data |
| ML Training | SageMaker Training | Train XGBoost model |
| Model Storage | S3 | Store model artifacts |
| Inference & AI | SageMaker Endpoint, Bedrock | Real‑time predictions + NL explanations |
| API Layer | API Gateway, Lambda | Expose REST API |

Step 1 – Set Up S3 and Upload Data

# Set bucket name with your account ID
export BUCKET_NAME=churn-prediction-$(aws sts get-caller-identity --query Account --output text)

# Create bucket
aws s3 mb s3://$BUCKET_NAME

# Upload your data
aws s3 cp WA_Fn-UseC_-Telco-Customer-Churn.csv s3://$BUCKET_NAME/raw/

📥 Dataset: Download the Telco Customer Churn dataset from Kaggle.

Step 2 – Create SageMaker IAM Role

  1. Open the IAM console → RolesCreate role.
  2. Choose SageMaker – Execution as the trusted entity.
  3. Attach the policies: AmazonSageMakerFullAccess and AmazonS3FullAccess.
  4. Name the role SageMakerChurnRole.
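If you prefer scripting the role over clicking through the console, the same setup can be done with boto3. A minimal sketch — the `create_sagemaker_role` helper is my own naming, not from the article:

```python
import json

# Trust policy that lets SageMaker assume the role (what the console's
# "SageMaker - Execution" trusted entity generates for you).
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "sagemaker.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

def create_sagemaker_role(iam_client, role_name="SageMakerChurnRole"):
    """Create the role and attach the two managed policies from step 3."""
    iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(trust_policy),
    )
    for policy in ("AmazonSageMakerFullAccess", "AmazonS3FullAccess"):
        iam_client.attach_role_policy(
            RoleName=role_name,
            PolicyArn=f"arn:aws:iam::aws:policy/{policy}",
        )
```

Call it with `create_sagemaker_role(boto3.client('iam'))`. Note that `AmazonS3FullAccess` is broader than strictly necessary; for production you'd scope it to the one bucket.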

Step 3 – Train the Model

Create a file named train_churn.py with the following content:

import boto3
import sagemaker
import pandas as pd
import os
from sklearn.model_selection import train_test_split
from sagemaker.inputs import TrainingInput

# Config
BUCKET = os.environ['BUCKET_NAME']
ROLE   = os.environ['ROLE_ARN']
PREFIX = 'churn-prediction'

session = sagemaker.Session()
region  = session.boto_region_name

# Load and prepare data
df = pd.read_csv('WA_Fn-UseC_-Telco-Customer-Churn.csv')
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce').fillna(0)
df['Churn'] = (df['Churn'] == 'Yes').astype(int)

# Encode categorical columns
cat_cols = [
    'gender', 'Partner', 'Dependents', 'PhoneService', 'MultipleLines',
    'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection',
    'TechSupport', 'StreamingTV', 'StreamingMovies', 'Contract',
    'PaperlessBilling', 'PaymentMethod'
]

for col in cat_cols:
    df[col] = df[col].astype('category').cat.codes

# Feature set
feature_cols = ['SeniorCitizen', 'tenure', 'MonthlyCharges', 'TotalCharges'] + cat_cols
X = df[feature_cols]
y = df['Churn']

# Train‑test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

train_df = pd.concat([y_train.reset_index(drop=True), X_train.reset_index(drop=True)], axis=1)
test_df  = pd.concat([y_test.reset_index(drop=True),  X_test.reset_index(drop=True)], axis=1)

train_df.to_csv('train.csv', index=False, header=False)
test_df.to_csv('test.csv',   index=False, header=False)

# Upload CSVs to S3
s3 = boto3.client('s3')
s3.upload_file('train.csv', BUCKET, f'{PREFIX}/train/train.csv')
s3.upload_file('test.csv',  BUCKET, f'{PREFIX}/test/test.csv')

# XGBoost container
container = sagemaker.image_uris.retrieve('xgboost', region, '1.7-1')

xgb = sagemaker.estimator.Estimator(
    image_uri=container,
    role=ROLE,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    output_path=f's3://{BUCKET}/{PREFIX}/output',
    sagemaker_session=session
)

xgb.set_hyperparameters(
    objective='binary:logistic',
    num_round=100,
    max_depth=5,
    eta=0.2,
    eval_metric='auc'
)

xgb.fit({
    'train': TrainingInput(f's3://{BUCKET}/{PREFIX}/train', content_type='csv'),
    'validation': TrainingInput(f's3://{BUCKET}/{PREFIX}/test', content_type='csv')
})

print('✅ Training complete!')
print(f'Model artifact: {xgb.model_data}')

# Deploy endpoint
print('Deploying endpoint (3-5 min)...')
predictor = xgb.deploy(
    initial_instance_count=1,
    instance_type='ml.t2.medium',
    endpoint_name='churn-prediction-endpoint',
    serializer=sagemaker.serializers.CSVSerializer()
)
print('✅ Endpoint deployed: churn-prediction-endpoint')

# Smoke test with the first validation row (features only, no label)
sample_row = ','.join(str(v) for v in X_test.iloc[0].values)
prob = float(predictor.predict(sample_row).decode())
print(f'Test prediction: {prob:.1%} churn probability')

Run the script

export BUCKET_NAME=churn-prediction-YOUR_ACCOUNT_ID
export ROLE_ARN=arn:aws:iam::YOUR_ACCOUNT_ID:role/SageMakerChurnRole
python3 train_churn.py

Sample training output

2026-01-01 00:24:27 Uploading - Uploading generated training model
2026-01-01 00:24:27 Completed - Training job completed
Training seconds: 103
Billable seconds: 103

✅ Training complete!
Model artifact: s3://churn-prediction-905418352184/churn-prediction/output/sagemaker-xgboost-2026-01-01-00-22-03-339/output/model.tar.gz

Deploying endpoint (3‑5 min)...
INFO:sagemaker:Creating model with name: sagemaker-xgboost-2026-01-01-00-24-53-959
INFO:sagemaker:Creating endpoint-config with name churn-prediction-endpoint
INFO:sagemaker:Creating endpoint with name churn-prediction-endpoint
✅ Endpoint deployed: churn-prediction-endpoint
Test prediction: 0.4% churn probability

You now have a fully functional churn‑prediction model, a real‑time SageMaker endpoint, and (in the next steps) an API layer that calls Bedrock to generate plain‑English explanations for each prediction. Stay tuned for the remaining parts of the tutorial!
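You can also invoke the endpoint directly with boto3 before wiring up the API layer. One gotcha: the CSV row must list values in exactly the training column order, with the label dropped. A sketch — `to_feature_csv` and `predict_churn` are illustrative helper names, not part of the article's code:

```python
# Feature order used in train_churn.py: numeric columns first,
# then the label-encoded categorical columns.
FEATURE_COLS = [
    'SeniorCitizen', 'tenure', 'MonthlyCharges', 'TotalCharges',
    'gender', 'Partner', 'Dependents', 'PhoneService', 'MultipleLines',
    'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection',
    'TechSupport', 'StreamingTV', 'StreamingMovies', 'Contract',
    'PaperlessBilling', 'PaymentMethod',
]

def to_feature_csv(record: dict) -> str:
    """Serialize one customer record into the CSV row the endpoint expects."""
    return ','.join(str(record[col]) for col in FEATURE_COLS)

def predict_churn(runtime, csv_row: str) -> float:
    """Invoke the endpoint; runtime = boto3.client('sagemaker-runtime')."""
    resp = runtime.invoke_endpoint(
        EndpointName='churn-prediction-endpoint',
        ContentType='text/csv',
        Body=csv_row,
    )
    return float(resp['Body'].read().decode())
```

If the column order drifts between training and inference, the model still returns a number — just a meaningless one — so it's worth keeping this list in one shared place.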

Step 4 – Create Lambda with Bedrock Integration

Create a Lambda function ChurnPredictionAPI with the following code:

import json
import boto3
import os

sagemaker_runtime = boto3.client('sagemaker-runtime')
bedrock = boto3.client('bedrock-runtime')

ENDPOINT_NAME = os.environ.get('SAGEMAKER_ENDPOINT', 'churn-prediction-endpoint')

def lambda_handler(event, context):
    body = json.loads(event['body']) if isinstance(event.get('body'), str) else event

    # Get prediction from SageMaker
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType='text/csv',
        Body=body['features']
    )
    churn_prob = float(response['Body'].read().decode())

    # Generate explanation with Bedrock Claude
    prompt = f"""A customer has {churn_prob:.1%} churn probability.
Customer: Tenure {body.get('tenure', 'N/A')} months, ${body.get('monthly_charges', 'N/A')}/month, {body.get('contract', 'N/A')} contract.
In 2 sentences, explain the risk and suggest one retention action."""

    bedrock_response = bedrock.invoke_model(
        modelId='anthropic.claude-3-haiku-20240307-v1:0',
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 100,
            "messages": [{"role": "user", "content": prompt}]
        })
    )
    explanation = json.loads(bedrock_response['body'].read())['content'][0]['text']
    risk = "High" if churn_prob > 0.7 else "Medium" if churn_prob > 0.4 else "Low"

    return {
        'statusCode': 200,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps({
            'churn_probability': f"{churn_prob:.1%}",
            'risk_level': risk,
            'explanation': explanation
        })
    }

Lambda configuration

  • Runtime: Python 3.11
  • Timeout: 30 seconds
  • Role: LambdaChurnRole (with SageMaker + Bedrock permissions)
  • Environment variable: SAGEMAKER_ENDPOINT=churn-prediction-endpoint
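The article doesn't show the exact policy attached to LambdaChurnRole, so here is a hedged sketch of the minimal permissions it needs beyond basic Lambda execution — built as a dict so it's easy to adapt:

```python
import json

# Minimal inline policy for LambdaChurnRole (my reconstruction, not the
# article's): invoke the SageMaker endpoint and the Bedrock Claude model.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sagemaker:InvokeEndpoint",
            "Resource": "arn:aws:sagemaker:*:*:endpoint/churn-prediction-endpoint",
        },
        {
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": "arn:aws:bedrock:*::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
        },
    ],
}
print(json.dumps(policy, indent=2))
```

Attaching only these two actions (rather than full-access managed policies) keeps the Lambda's blast radius small.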

Step 5 – Create API Gateway

  1. Create an HTTP API in API Gateway.
  2. Add a Lambda integration → ChurnPredictionAPI.
  3. Create a POST route: /predict.
  4. Deploy and note the invoke URL.

🧪 Test the API

curl -X POST "https://YOUR_API_URL/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "features": "0,24,65.5,1500.0,1,0,1,2,0,0,1,1,0,0,1,0,2,1,1",
    "tenure": 24,
    "monthly_charges": 65.5,
    "contract": "Month-to-month"
}'

Sample response

{
  "churn_probability": "0.6%",
  "risk_level": "Low",
  "explanation": "The customer's high churn probability of 0.6% and the month-to-month contract indicate a significant risk of losing the customer. To mitigate this risk, a retention action could be to offer the customer a longer-term contract with a discounted monthly rate or additional benefits, which may help increase their loyalty and reduce the likelihood of churn."
}
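The same request from Python, using only the standard library — `API_URL` is a placeholder for your invoke URL, and the payload mirrors the curl example above:

```python
import json
import urllib.request

API_URL = "https://YOUR_API_URL/predict"  # replace with your invoke URL

payload = {
    # 19 comma-separated features in the training column order
    "features": "0,24,65.5,1500.0,1,0,1,2,0,0,1,1,0,0,1,0,2,1,1",
    "tenure": 24,
    "monthly_charges": 65.5,
    "contract": "Month-to-month",
}

def call_api(url: str = API_URL) -> dict:
    """POST the payload and return the parsed JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Handy for scripting batch checks against the API without pulling in extra dependencies.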

🧹 Cleanup

Delete resources when you’re finished to avoid charges:

# Delete SageMaker endpoint (most expensive!)
aws sagemaker delete-endpoint --endpoint-name churn-prediction-endpoint
aws sagemaker delete-endpoint-config --endpoint-config-name churn-prediction-endpoint

# Delete Lambda
aws lambda delete-function --function-name ChurnPredictionAPI

# Delete S3 bucket
aws s3 rb s3://$BUCKET_NAME --force

💡 Key Lessons Learned

  • SageMaker XGBoost is production‑ready – achieved ~84% AUC with minimal tuning.
  • Bedrock adds real business value – turning raw predictions into actionable insights makes ML accessible to non‑technical stakeholders.
  • IAM permissions are tricky – creating roles via the console can help avoid “explicit deny” errors.
  • Cost awareness matters – always delete expensive resources (e.g., SageMaker endpoints) when they’re no longer needed.

💡 Tip: Delete SageMaker endpoints when not in use – at ~$0.05/hour, an idle endpoint adds up!

Thanks for reading! If this helped you, follow me for more AWS + Data Engineering content.

Questions? Leave a comment below!
