Building a Serverless Data Analytics Pipeline with AWS: Premier League Dashboard

Published: December 16, 2025 at 10:51 AM EST
4 min read
Source: Dev.to

Inspired by AWS Cookbook by John Culkin & Mike Zazon – Chapter 7: Big Data

My Journey into Data Analytics

While exploring AWS AI/ML services, I realized that artificial intelligence and machine learning are built on a foundation of quality data. That insight led me to step back and master data-analytics fundamentals first. What better way than to build a complete serverless pipeline?

I created a scalable analytics solution using Amazon S3, Athena, and QuickSight to analyze Premier League data.

Why Premier League data?

As a passionate football enthusiast, I’m fascinated by the rich statistical narratives that unfold each season. Every match generates meaningful data points—from goals and assists to tactical formations and player‑performance metrics. This abundance of structured, real‑world data makes football analytics an ideal playground for learning data‑engineering concepts while working with something I genuinely care about.

What we’ll build

  • Serverless data storage with Amazon S3
  • SQL querying with Amazon Athena
  • Interactive dashboards with Amazon QuickSight

⚠️ Disclaimer: The Premier League data used in this project is completely fictional and for demonstration purposes only. If you see Manchester City with 150 points or Tottenham actually winning something, that’s just my creative data generation at work! 😄 Please don’t use this for your fantasy‑football decisions—you’ve been warned! For real Premier League data, check the official sources (and prepare for more realistic disappointment).

Architecture Overview

AWS Analytics

This serverless architecture eliminates infrastructure complexity while providing:

  • Scalability – automatic scaling without server management
  • Cost‑efficiency – pay‑per‑query pricing model
  • Speed – query results in seconds

Implementation Highlights

1️⃣ S3 Data Lake Setup

I stored Premier League CSV files in S3, creating a scalable data foundation:

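# Create a uniquely named bucket and upload the data (use this name wherever the post says your-bucket)
BUCKET="premier-league-data-$(openssl rand -hex 3)"
# Outside us-east-1, also pass: --create-bucket-configuration LocationConstraint=<your-region>
aws s3api create-bucket --bucket "$BUCKET"
aws s3 cp data/ "s3://$BUCKET/raw-data/" --recursive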
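
For anyone who wants to reproduce the setup without real data, here is a minimal Python sketch of how a fictional standings CSV could be generated. It is not the generator used for this post: the function names, the team names, and the 38-match season are assumptions for illustration. Only the column order is taken from the standings table defined later in this post.

```python
import csv
import io
import random

# Column order mirrors the standings table definition in this post.
COLUMNS = ["team_name", "matches_played", "wins", "draws", "losses",
           "goals_for", "goals_against", "goal_difference", "points"]


def make_row(team, rng, matches=38):
    """Build one internally consistent (but entirely fictional) standings row."""
    wins = rng.randint(0, matches)
    draws = rng.randint(0, matches - wins)
    losses = matches - wins - draws
    goals_for = rng.randint(wins, 3 * matches)        # every win needs at least one goal
    goals_against = rng.randint(losses, 3 * matches)  # every loss concedes at least one
    return [team, matches, wins, draws, losses, goals_for, goals_against,
            goals_for - goals_against, 3 * wins + draws]


def write_standings(teams, out, seed=0):
    """Write a header line plus one row per team to a file-like object."""
    rng = random.Random(seed)  # fixed seed keeps the fake data reproducible
    writer = csv.writer(out, lineterminator="\n")  # plain newlines, no "\r"
    writer.writerow(COLUMNS)
    for team in teams:
        writer.writerow(make_row(team, rng))


# Hypothetical team names; avoid commas so no CSV quoting is needed.
buf = io.StringIO()
write_standings(["Team A", "Team B", "Team C"], buf)
standings_csv = buf.getvalue()
```

Writing `standings_csv` to a file under `data/` would make it ready for the upload command above. Plain `\n` line endings and comma-free team names are a cautious choice for a simple comma-delimited table definition.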

Data Source

2️⃣ Athena SQL Querying

Created External Tables

Athena’s power lies in querying data directly from S3 without moving it. Here’s how I created the tables:

-- Athena reads every file under a table's LOCATION prefix, so each table
-- needs its own folder. This assumes the standings and match-results CSVs
-- were uploaded to separate sub-prefixes of raw-data/.

-- Standings table
CREATE EXTERNAL TABLE IF NOT EXISTS standings (
    team_name        STRING,
    matches_played   INT,
    wins             INT,
    draws            INT,
    losses           INT,
    goals_for        INT,
    goals_against    INT,
    goal_difference  INT,
    points           INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://your-bucket/raw-data/standings/'
TBLPROPERTIES ('skip.header.line.count'='1');

-- Match results table
CREATE EXTERNAL TABLE IF NOT EXISTS match_results (
    team_name   STRING,
    result_type STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://your-bucket/raw-data/match_results/'
TBLPROPERTIES ('skip.header.line.count'='1');

Athena Tables

Sample Analytics Queries

-- Top 6 teams analysis
SELECT team_name, points, goal_difference
FROM   standings
ORDER BY points DESC
LIMIT 6;

Top 6 teams

-- Win‑percentage calculation
SELECT team_name,
       ROUND((wins * 100.0 / matches_played), 2) AS win_percentage
FROM   standings
ORDER BY win_percentage DESC;
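
The `100.0` literal in the query above is doing real work. When both operands are integers, Athena's engine performs integer division, so the fraction would be truncated before `ROUND` ever sees it. The sketch below illustrates the same effect locally with Python's built-in `sqlite3` module rather than Athena (an assumption made so the example runs anywhere); the win and match counts are made-up values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

wins, played = 20, 38  # hypothetical values for one team

# Integer operands only: the division truncates to a whole number.
(int_pct,) = conn.execute("SELECT ? * 100 / ?", (wins, played)).fetchone()

# A decimal literal promotes the arithmetic, so two decimal places survive.
(dec_pct,) = conn.execute(
    "SELECT ROUND(? * 100.0 / ?, 2)", (wins, played)
).fetchone()

conn.close()
```

Here `int_pct` comes back as a whole number while `dec_pct` keeps its two decimal places, which is the behaviour the win-percentage query relies on.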

Win Percentage

-- Verify data integrity
SELECT * FROM standings;
SELECT * FROM match_results;
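
`SELECT *` only confirms that rows load. A stricter check is to test the arithmetic relationships between columns. The sketch below runs such a check locally against an in-memory SQLite table instead of Athena (an assumption, so it can run without an AWS account). The sample rows and the `inconsistent_teams` helper are hypothetical, and the three-points-per-win rule is the standard league scoring rather than something stated in the data. The `WHERE` clause uses only basic SQL operators, so it should carry over to Athena, though that has not been verified here.

```python
import sqlite3

# Flags rows whose derived columns disagree with the raw counts.
CHECK_SQL = """
SELECT team_name
FROM   standings
WHERE  matches_played  <> wins + draws + losses
   OR  goal_difference <> goals_for - goals_against
   OR  points          <> 3 * wins + draws
"""


def inconsistent_teams(rows):
    """Load rows into an in-memory table and return the teams that fail the checks."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE standings ("
        "team_name TEXT, matches_played INT, wins INT, draws INT, losses INT, "
        "goals_for INT, goals_against INT, goal_difference INT, points INT)"
    )
    conn.executemany("INSERT INTO standings VALUES (?,?,?,?,?,?,?,?,?)", rows)
    bad = [row[0] for row in conn.execute(CHECK_SQL)]
    conn.close()
    return bad


# Hypothetical sample: the second row has 45 points where 3*10 + 10 = 40 is expected.
sample = [
    ("Team A", 38, 20, 10, 8, 60, 40, 20, 70),
    ("Team B", 38, 10, 10, 18, 35, 55, -20, 45),
]
bad_teams = inconsistent_teams(sample)
```

A real league table can legitimately break the points rule (for example after a points deduction), so a flagged row is a prompt to look closer rather than proof of bad data.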

Standings‑results

3️⃣ QuickSight Dashboard (Brief Overview)

  1. Connect QuickSight to the Athena data source.
  2. Import the standings and match_results tables.
  3. Build visualisations such as:
    • Bar chart of points per team.
    • Heat‑map of win percentages.
    • Line chart showing points progression over the season.

Because Athena reads the files under a table's S3 prefix at query time, new CSV files show up in the dashboard on the next query when the dataset uses direct query; a dataset imported into SPICE needs a refresh first.

Takeaways

  • Serverless services (S3 + Athena + QuickSight) let you focus on data, not infrastructure.
  • Athena provides pay-per-query analytics on data stored in S3, with results typically back in seconds.
  • QuickSight turns query results into interactive visualisations with virtually no operational overhead.

Give it a try: swap the football data for any CSV-based dataset, and you'll have a fully managed analytics pipeline ready in minutes!

Key Athena Benefits

  • No data movement required
  • Standard SQL interface
  • Pay only for data scanned ($5/TB)
  • Results in seconds
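
To make the pay-per-scan model concrete, here is a rough Python estimator. The $5/TB rate is the figure quoted above. The 10 MB minimum charge per query and the binary definition of a terabyte are assumptions based on my reading of AWS pricing, and rates vary by region, so check the current Athena pricing page before relying on the numbers. The function names are hypothetical.

```python
PRICE_PER_TB_USD = 5.00                # rate quoted in this post
BYTES_PER_TB = 1024 ** 4               # assumption: binary terabyte
MIN_BYTES_PER_QUERY = 10 * 1024 ** 2   # assumption: 10 MB minimum billed per query


def query_cost_usd(bytes_scanned):
    """Estimated cost of one query, applying the assumed per-query minimum."""
    billable = max(bytes_scanned, MIN_BYTES_PER_QUERY)
    return billable / BYTES_PER_TB * PRICE_PER_TB_USD


def monthly_cost_usd(bytes_per_query, queries_per_month):
    """Estimated monthly cost for a steady stream of similar queries."""
    return query_cost_usd(bytes_per_query) * queries_per_month


# A tiny CSV is billed at the assumed minimum, not at its actual size.
small_query = query_cost_usd(2 * 1024)
full_tb_query = query_cost_usd(BYTES_PER_TB)
```

Under these assumptions, small CSV files like the ones in this project are always billed at the per-query minimum, so the monthly Athena bill depends on how many queries run rather than on file size.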

QuickSight Dashboards

I built interactive visualizations, including:

  • League standings table
  • Points comparison charts
  • Goal‑difference analysis
  • Team performance metrics

QuickSight Dashboard

Business Value for Management

QuickSight delivers immediate ROI through:

  • Decision Speed – Real‑time dashboards eliminate waiting for IT reports
  • Cost Savings – $9/user vs $70+ for traditional BI tools (e.g., Tableau)
  • Self‑Service Analytics – Business users create their own insights without technical dependencies
  • Mobile Access – Executive dashboards available anywhere, anytime
  • Scalability – Handles 10 users or 10 000 users with the same architecture
  • Security – Enterprise‑grade AWS security and compliance built‑in

What QuickSight can do
Image source: amazon.com QuickSight page

Management Benefits

  • Reduce reporting cycle from weeks to minutes
  • Democratize data access across all departments
  • Lower total cost of ownership by 60‑80 % vs traditional solutions
  • Eliminate server maintenance and upgrade costs

Results & Insights

Cost Breakdown

  • S3 Storage: ~$0.05/month
  • Athena Queries: ~$0.25/month
  • QuickSight: $9/user/month

Total: ~$9.30/month for enterprise‑grade analytics!
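
The total is simple addition, but QuickSight is the only per-user line item, so it helps to see how the bill moves with the number of dashboard users. A small sketch using the approximate figures above; the function name and the assumption that S3 and Athena costs stay flat as users are added are mine.

```python
# Approximate monthly figures from the cost breakdown above (USD).
S3_STORAGE_USD = 0.05
ATHENA_QUERIES_USD = 0.25
QUICKSIGHT_PER_USER_USD = 9.00


def total_monthly_cost_usd(users=1):
    """Sum the fixed items and scale only the QuickSight line by user count."""
    return S3_STORAGE_USD + ATHENA_QUERIES_USD + QUICKSIGHT_PER_USER_USD * users


single_user_total = round(total_monthly_cost_usd(1), 2)
five_user_total = round(total_monthly_cost_usd(5), 2)
```

With one user this reproduces the ~$9.30 total; with more users the QuickSight line dominates almost entirely.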

Key Learnings

  • ✅ Setup completed in under 2 hours
  • ✅ Serverless = zero infrastructure management
  • ✅ SQL familiarity accelerated development
  • ⚠️ QuickSight permissions required some initial troubleshooting

Next Steps

This foundation opens doors to:

  • Real‑time data integration
  • Machine‑learning predictions
  • Advanced ETL pipelines with AWS Glue

Final Reflections

Starting with data fundamentals before diving into AI/ML proved invaluable. This serverless analytics pipeline demonstrates that powerful data solutions don’t require complex infrastructure—just the right AWS services working together.

The S3 + Athena + QuickSight combination delivers enterprise-grade analytics at a startup-friendly cost, making it a good fit for both learning and production use cases.

Resources

Building your own data pipeline? Connect with me on LinkedIn – I’d love to hear about your experience!
