Fixing the Ghost Bus Problem: How Weighted Averages Can Improve Real-Time Public Transit Predictions
Source: Dev.to
The Data Problem
Many users (myself included) have noticed a recurring issue with the Belfast bus system: timing discrepancies and “ghost buses” that never show up. Buses often arrive too early, too late, or not at all. The problem has persisted since late December 2023, with over 10,000 metro services cancelled or missing in 12 months, causing frustration among commuters.
The Minister for Transport has even commented that “ghost buses” are “simply not acceptable.”
Static timetables don’t align with reality. According to reports, 2,500 complaints have been made to Translink, with passengers frustrated by buses that “disappear” from digital displays or app schedules that don’t reflect cancellations. These discrepancies lead to wasted time, missed connections, and decreased trust in public transport.
The solution? A real‑time arrival prediction model that uses historical data and user‑reported events, rather than relying solely on static schedules.
Enter Weighted Averages
In statistics, a weighted average lets some data points count more than others. That suits the Belfast bus system, where the most recent data is likely the best predictor of what happens next. Each data point is assigned a weight (or “confidence score”), and the prediction is the sum of each data point multiplied by its weight, divided by the sum of the weights. In this case, more recent bus arrival times receive higher weights because they reflect the latest, most relevant information.
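Written out, the definition looks like this. The symbols are introduced here for illustration and do not come from the original project: t_i is the i-th observed arrival time, w_i is its weight, and n is the number of observations.

```latex
\bar{t}_w = \frac{\sum_{i=1}^{n} w_i \, t_i}{\sum_{i=1}^{n} w_i}
```

When every weight is equal, this reduces to the ordinary mean; raising the weights of recent observations pulls the result toward them.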
How It Works in Practice
Example Data
We have the following observed bus arrival times, oldest first:
- 08:02
- 08:05
- 08:07
- 08:10
- 08:12
We assign weights (newer data gets higher weight):
- 08:12 = 5
- 08:10 = 4
- 08:07 = 3
- 08:05 = 2
- 08:02 = 1
Calculation
# Convert times to minutes since midnight
08:12 = 492 minutes
08:10 = 490 minutes
08:07 = 487 minutes
08:05 = 485 minutes
08:02 = 482 minutes
# Multiply each time by its weight
08:12 (492) × 5 = 2460
08:10 (490) × 4 = 1960
08:07 (487) × 3 = 1461
08:05 (485) × 2 = 970
08:02 (482) × 1 = 482
# Sum of weighted times
Total = 2460 + 1960 + 1461 + 970 + 482 = 7333
# Sum of weights
Weight total = 5 + 4 + 3 + 2 + 1 = 15
# Weighted average (minutes past midnight)
7333 ÷ 15 = 488.87 minutes ≈ 08:08:52 (rounded to 08:09)
The result, 08:09, sits later than the plain average of the same five times (about 08:07) because the most recent arrivals count for more. The prediction therefore follows recent real-world performance rather than an outdated static schedule.
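The calculation above can be reproduced in a few lines of Python. This is a minimal sketch rather than the project's actual code: the function names (`to_minutes`, `weighted_average_minutes`, `to_clock`) are hypothetical, and it assumes the five times are observations of the same service, listed oldest first.

```python
from datetime import time


def to_minutes(t: time) -> int:
    """Convert a clock time to minutes since midnight."""
    return t.hour * 60 + t.minute


def weighted_average_minutes(arrivals, weights):
    """Return the weighted average of the arrival times, in minutes since midnight."""
    if len(arrivals) != len(weights):
        raise ValueError("arrivals and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(to_minutes(a) * w for a, w in zip(arrivals, weights))
    return weighted_sum / total_weight


def to_clock(minutes: float) -> str:
    """Format minutes since midnight as HH:MM:SS, rounded to the nearest second."""
    total_seconds = round(minutes * 60)
    hours, remainder = divmod(total_seconds, 3600)
    mins, secs = divmod(remainder, 60)
    return f"{hours:02d}:{mins:02d}:{secs:02d}"


# Example data from the article, oldest first; the newest arrival gets the largest weight.
arrivals = [time(8, 2), time(8, 5), time(8, 7), time(8, 10), time(8, 12)]
weights = [1, 2, 3, 4, 5]

predicted = weighted_average_minutes(arrivals, weights)
print(to_clock(predicted))  # prints 08:08:52
```

Working in minutes since midnight keeps the arithmetic simple, but it would need extra care for services that run past midnight.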
How This Solves the Ghost Bus Problem
By using weighted averages to calculate real‑time predictions, we move away from static timetables and create dynamic predictions that react to actual bus performance. Every time a bus arrives, that data point is added to the system, which then recalculates the expected time for the next bus. Recent arrivals have higher weight, so the predicted arrival time always reflects the latest performance.
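The recalculate-on-every-arrival loop described above could be sketched as follows. Several details are assumptions made for illustration and are not taken from the original project: the class name `RollingArrivalPredictor`, a fixed window of the five most recent arrivals, linear weights (1 for the oldest up to n for the newest), and the extra 08:15 arrival used to show an update.

```python
from collections import deque
from typing import Optional


class RollingArrivalPredictor:
    """Keeps the most recent arrivals and predicts with a linearly weighted average."""

    def __init__(self, window: int = 5):
        if window < 1:
            raise ValueError("window must be at least 1")
        # A bounded deque silently drops the oldest arrival once the window is full.
        self._arrivals = deque(maxlen=window)

    def record_arrival(self, minutes_since_midnight: float) -> None:
        """Add a newly observed arrival time (minutes since midnight)."""
        self._arrivals.append(minutes_since_midnight)

    def predict(self) -> Optional[float]:
        """Return the weighted-average arrival time, or None if nothing is recorded."""
        if not self._arrivals:
            return None
        # Oldest arrival gets weight 1, newest gets weight len(self._arrivals).
        weights = range(1, len(self._arrivals) + 1)
        weighted_sum = sum(w * t for w, t in zip(weights, self._arrivals))
        return weighted_sum / sum(weights)


predictor = RollingArrivalPredictor(window=5)
for observed in (482, 485, 487, 490, 492):  # 08:02 to 08:12, oldest first
    predictor.record_arrival(observed)
first = predictor.predict()

predictor.record_arrival(495)  # hypothetical new arrival at 08:15
second = predictor.predict()
print(first, second)
```

After the new arrival is recorded, the 08:02 observation falls out of the window and the prediction shifts later, which is the behaviour the paragraph describes.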
For passengers, this means less time waiting in the rain unsure when (or if) their bus will arrive. Instead, they receive a live ETA based on actual performance, not just scheduled times.
Why This Matters for Developers
The concept of weighted averages is straightforward, but applying it to a transport system shows how far a simple data-driven prediction can go. In a public-transit scenario, basing ETAs on observed performance rather than on static timetables alone can noticeably improve the user experience.
The model can be further enhanced with machine learning to predict delays, cancellations, and bus frequency, factoring in variables such as traffic patterns, weather conditions, or driver availability.
Developers working on similar projects can apply this dynamic prediction approach to any system that relies on real‑time data (e.g., ride‑sharing, delivery services, public‑transit apps).
TL;DR
To fix the ghost bus problem in Belfast, we use weighted averages to predict bus arrival times dynamically. By giving more weight to recent bus data, we move away from static timetables and create more accurate, real‑time predictions, ultimately improving the passenger experience.
Closing Thoughts
This solution illustrates a simple but effective way of using historical performance data to improve real‑time predictions. It also opens up opportunities to build smarter systems that adapt to the real world, helping developers address similar challenges across various industries.
Check the project out here.