Building Clusterflick: A London Cinema Aggregator

Published: February 6, 2026 at 01:20 PM EST
2 min read
Source: Dev.to

Overview

I’ve been working on a personal project called Clusterflick — a single source for every movie showing across London. It currently tracks 240 venues across 5 event platforms, pulling in 1,398 events and over 30,000 showings. What began as a simple desire to have cinema times on my calendar quickly evolved into a full data pipeline running on GitHub Actions, a statically generated Next.js site, and a cluster of Raspberry Pis in my living room.
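To make the venues, events and showings distinction concrete, here is a minimal sketch of how scraped showings might be grouped into one listing per film across every venue. This is an illustration only: the `Showing` fields, the `group_by_film` helper and the sample data are hypothetical and not Clusterflick's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Showing:
    """One screening at one venue (hypothetical schema for illustration)."""

    film_id: str  # canonical film ID, assumed to be assigned after movie matching
    venue: str  # cinema name
    platform: str  # event platform the listing was scraped from
    starts_at: datetime


def group_by_film(showings: list[Showing]) -> dict[str, list[Showing]]:
    """Collect showings from every venue under one film, earliest first."""
    grouped: dict[str, list[Showing]] = defaultdict(list)
    for showing in showings:
        grouped[showing.film_id].append(showing)
    return {film: sorted(items, key=lambda s: s.starts_at) for film, items in grouped.items()}


# Hypothetical sample data: two venues on two platforms showing the same film.
showings = [
    Showing("film-1", "Venue A", "platform-x", datetime(2026, 2, 7, 20, 30)),
    Showing("film-1", "Venue B", "platform-y", datetime(2026, 2, 7, 18, 0)),
    Showing("film-2", "Venue A", "platform-x", datetime(2026, 2, 8, 21, 0)),
]
listing = group_by_film(showings)
print([s.venue for s in listing["film-1"]])  # ['Venue B', 'Venue A']
```

Grouping only works once every showing carries the same canonical film ID, which is why movie matching ends up being the hard part.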

Challenges

  • Movie matching is deceptively hard – title + year, or even title + director, often isn’t enough to identify a film uniquely, and some cinema listings give so little information that even a human couldn’t reliably tell which film is meant.
  • Scraping at scale without a budget – GitHub runner IPs get blocked, so a Raspberry Pi cluster now handles the trickier sources.
  • Using LLMs for data quality – When fuzzy matching falls short, large language models have proven surprisingly useful for resolving ambiguous movie lookups against The Movie DB.
  • Keeping it cheap – The entire system runs at near‑zero infrastructure cost: GitHub Actions for orchestration, GitHub Releases as storage, and static site generation to avoid hosting fees.
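The matching and LLM challenges fit together naturally: score candidates cheaply, accept only a clear winner, and hand the ambiguous remainder to an LLM. Below is a minimal sketch of that idea, assuming candidates have already been fetched from a TMDB search. `Candidate`, `normalise`, `score`, `resolve`, the thresholds, the one-year tolerance and the `ask_llm` callback are all hypothetical, and `difflib.SequenceMatcher` stands in for whatever fuzzy matcher the project actually uses.

```python
import re
import unicodedata
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Optional


@dataclass(frozen=True)
class Candidate:
    """A possible match from a movie database search (hypothetical shape)."""

    tmdb_id: int
    title: str
    year: Optional[int]


def normalise(title: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def score(listing_title: str, listing_year: Optional[int], cand: Candidate) -> float:
    """Title similarity in [0, 1], halved when known years differ by more than one."""
    sim = SequenceMatcher(None, normalise(listing_title), normalise(cand.title)).ratio()
    if listing_year is not None and cand.year is not None and abs(listing_year - cand.year) > 1:
        sim *= 0.5
    return sim


def resolve(
    listing_title: str,
    listing_year: Optional[int],
    candidates: list[Candidate],
    ask_llm: Callable[[str, list[Candidate]], Optional[int]],  # hypothetical LLM hook
    accept: float = 0.9,  # hypothetical threshold
    margin: float = 0.1,  # required lead over the runner-up
) -> Optional[int]:
    """Accept a clear fuzzy winner; otherwise defer to the LLM."""
    if not candidates:
        return None
    scored = sorted(
        ((score(listing_title, listing_year, c), c) for c in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    best_score, best = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else 0.0
    if best_score >= accept and best_score - runner_up >= margin:
        return best.tmdb_id
    # Ambiguous: let the LLM choose, but only trust an ID that was actually offered.
    ranked = [c for _, c in scored]
    choice = ask_llm(listing_title, ranked)
    return choice if choice in {c.tmdb_id for c in ranked} else None


# Hypothetical IDs; both films are really called "Nosferatu".
films = [Candidate(1, "Nosferatu", 1922), Candidate(2, "Nosferatu", 2024)]
print(resolve("Nosferatu", 2024, films, ask_llm=lambda t, c: None))  # 2: the year settles it
print(resolve("Nosferatu", None, films, ask_llm=lambda t, c: 1))  # 1: ambiguous, a fake LLM decides
```

Checking that the LLM's answer is one of the offered IDs keeps a hallucinated match from slipping into the data. Anything still unresolved can be left unmatched rather than guessed.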
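For the scraping challenge, one simple way to split the work is to keep most sources on GitHub-hosted runners and send only those known to block runner IPs to the Raspberry Pi cluster, spread round-robin across its nodes. This is a sketch under assumptions: the `Source` type, the `blocks_cloud_ips` flag, the `plan` function and the node names are hypothetical.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class Source:
    name: str
    blocks_cloud_ips: bool  # hypothetical flag, set once a site starts rejecting runner IPs


def plan(sources: list[Source], pi_nodes: list[str]) -> dict[str, list[str]]:
    """Map each worker to the source names it should scrape."""
    assignments: dict[str, list[str]] = {"github-actions": []}
    assignments.update({node: [] for node in pi_nodes})
    nodes = cycle(pi_nodes)
    for source in sources:
        if source.blocks_cloud_ips and pi_nodes:
            # Round-robin blocked sources across the Pi nodes.
            assignments[next(nodes)].append(source.name)
        else:
            # Unblocked sources, or no Pi nodes available: use the hosted runners.
            assignments["github-actions"].append(source.name)
    return assignments


sources = [Source("a", False), Source("b", True), Source("c", True), Source("d", True)]
print(plan(sources, ["pi-1", "pi-2"]))
# {'github-actions': ['a'], 'pi-1': ['b', 'd'], 'pi-2': ['c']}
```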
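Using Releases as storage means a static build needs to find and download a data file attached to a release. The sketch below shows one way to do that with GitHub's REST endpoint for a repository's latest release, whose `assets` entries carry `name` and `browser_download_url` fields. The owner, repository and asset names are placeholders, and the helper functions are hypothetical rather than the project's actual build step.

```python
import json
from typing import Any
from urllib.request import Request, urlopen


def latest_release_api(owner: str, repo: str) -> str:
    """GitHub REST endpoint for a repository's latest published release."""
    return f"https://api.github.com/repos/{owner}/{repo}/releases/latest"


def find_asset_url(release: dict, asset_name: str) -> str:
    """Pick the download URL of a named asset out of a release payload."""
    for asset in release.get("assets", []):
        if asset.get("name") == asset_name:
            return asset["browser_download_url"]
    raise KeyError(f"release has no asset named {asset_name!r}")


def load_snapshot(owner: str, repo: str, asset_name: str) -> Any:
    """Fetch the latest release, then download and parse one JSON asset (needs network)."""
    headers = {"Accept": "application/vnd.github+json"}
    with urlopen(Request(latest_release_api(owner, repo), headers=headers)) as resp:
        release = json.load(resp)
    with urlopen(find_asset_url(release, asset_name)) as resp:
        return json.load(resp)
```

Keeping the payload lookup separate from the network calls makes the asset selection easy to test without hitting GitHub.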

Open Source

The whole project is open source on GitHub. If any of this sounds interesting, I’d love to hear from others working on similar scraping, aggregation, or data‑pipeline projects.
