Building Clusterflick: A London Cinema Aggregator
Source: Dev.to
Overview
I’ve been working on a personal project called Clusterflick — a single source for every movie showing across London. It currently tracks 240 venues across 5 event platforms, pulling in 1,398 events and over 30,000 showings. What began as a simple desire to have cinema times on my calendar quickly evolved into a full data pipeline running on GitHub Actions, a statically generated Next.js site, and a cluster of Raspberry Pis in my living room.
Challenges
- Movie matching is deceptively hard – title + year or title + director often isn’t enough to uniquely identify a film, and some cinema listings provide too little information for even a human to identify the film reliably.
- Scraping at scale without a budget – GitHub runner IPs get blocked, so a Raspberry Pi cluster now handles the trickier sources.
- Using LLMs for data quality – When fuzzy matching falls short, large language models have proven surprisingly useful for resolving ambiguous movie lookups against The Movie Database (TMDB).
- Keeping it cheap – The entire system runs on near‑zero infrastructure costs: GitHub Actions for orchestration, Releases as storage, and static site generation to avoid hosting fees.
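To make the matching problem concrete, here is a minimal sketch of title + year fuzzy matching that refuses to pick a winner when the result is ambiguous. It is not Clusterflick's actual code: the `Candidate` record, the `match` function, the scoring bonuses, and the `threshold` and `margin` values are all assumptions chosen for illustration, and the similarity measure is Python's standard-library `difflib` rather than whatever the project uses.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical film record, e.g. one search result from a film database."""
    id: int
    title: str
    year: Optional[int] = None


def normalise(title: str) -> str:
    """Lowercase, replace non-alphanumerics with spaces, collapse whitespace.

    Deliberately crude: accented letters are dropped too, which is tolerable
    here only because both sides of the comparison are normalised the same way.
    """
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", title.lower())
    return " ".join(cleaned.split())


def score(listing_title: str, listing_year: Optional[int], candidate: Candidate) -> float:
    """Title similarity in [0, 1], adjusted when both years are known."""
    similarity = SequenceMatcher(
        None, normalise(listing_title), normalise(candidate.title)
    ).ratio()
    if listing_year is not None and candidate.year is not None:
        # Assumed tolerance: allow one year of slack between listing and record.
        if abs(listing_year - candidate.year) <= 1:
            similarity += 0.1
        else:
            similarity -= 0.2
    return similarity


def match(
    listing_title: str,
    listing_year: Optional[int],
    candidates: List[Candidate],
    threshold: float = 0.85,  # assumed minimum score for a confident match
    margin: float = 0.05,     # assumed minimum lead over the runner-up
) -> Optional[Candidate]:
    """Return the best candidate, or None when the match is weak or ambiguous."""
    scored = sorted(
        ((score(listing_title, listing_year, c), c) for c in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    if not scored:
        return None
    best_score, best = scored[0]
    if best_score < threshold:
        return None  # nothing is close enough
    if len(scored) > 1 and best_score - scored[1][0] < margin:
        return None  # two candidates are too close to call
    return best
```

With candidates for "Nosferatu" from 1922 and 2024, a listing that includes the year resolves cleanly, while a listing with only the title returns `None`. In a pipeline like the one described above, that `None` is the kind of case that might be handed to an LLM or a manual review step.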
Open Source
The whole project is open source on GitHub. If any of this sounds interesting, I’d love to hear from others working on similar scraping, aggregation, or data‑pipeline projects.