[Paper] Long-Horizon Traffic Forecasting via Incident-Aware Conformal Spatio-Temporal Transformers
Source: arXiv - 2603.16857v1
Overview
The paper tackles one of the toughest problems in intelligent transportation systems: forecasting traffic conditions many minutes—or even hours—into the future while accounting for the unpredictable impact of incidents (crashes, work zones, weather, etc.). By marrying a Spatio‑Temporal Transformer with Adaptive Conformal Prediction, the authors deliver multi‑horizon travel‑time forecasts that come with statistically sound uncertainty bounds, a capability that can directly power routing, fleet management, and autonomous‑vehicle planning tools.
Key Contributions
- Incident‑aware dynamic graph construction – builds a per‑hour adjacency matrix that reflects real‑time changes in road connectivity using a piecewise coefficient‑of‑variation (CV) model and severity signals from crash records.
- Spatio‑Temporal Transformer (STT) backbone – leverages self‑attention across both space (road network) and time (hour‑of‑day patterns) to capture long‑range dependencies that traditional graph‑CNNs miss.
- Adaptive Conformal Prediction (ACP) for calibrated uncertainty – produces prediction intervals that maintain the desired coverage probability even as traffic dynamics shift.
- Comprehensive validation pipeline – combines real ODOT traffic & crash data with high‑fidelity SUMO simulations and Monte‑Carlo travel‑time sampling for a Vehicle‑Under‑Test (VUT).
- Empirical gains – demonstrates statistically significant improvements in long‑horizon accuracy and interval calibration over strong baselines (e.g., static‑graph GNNs, classic ARIMA, and vanilla Transformers).
Methodology
- Data ingestion – Hourly traffic counts from Ohio DOT are paired with crash logs that contain incident severity attributes (clearance time, weather, speed violations, work‑zone flags, functional class).
- Dynamic adjacency via piecewise CV
  - Travel‑time variability for each hour is modeled as a log‑normal distribution.
  - The resulting hour‑specific CV drives the weight of edges in the road‑network graph, allowing the graph to “stretch” during peak congestion and “shrink” during free‑flow periods.
- Incident perturbation
  - Edge weights are further adjusted by a severity score derived from the crash dataset (e.g., longer clearance → larger weight reduction).
  - This yields a time‑varying, incident‑aware graph that replaces the common assumption of a static, homogeneous network.
- Spatio‑Temporal Transformer
  - Node features (traffic counts, historical travel times) are fed into a multi‑head self‑attention module that simultaneously attends across spatial neighbors (via the dynamic graph) and temporal steps (previous hours).
  - Positional encodings capture daily cycles, while a feed‑forward network refines the representation.
- Adaptive Conformal Prediction
  - After the STT outputs point forecasts, ACP computes non‑conformity scores on a rolling validation window and rescales them to produce prediction intervals that adapt to distribution shifts.
- Evaluation
  - The Columbus network is simulated in SUMO for multi‑hour trips; a Monte‑Carlo engine samples thousands of VUT trajectories to obtain ground‑truth travel‑time distributions.
  - Metrics: Mean Absolute Error (MAE), Root Mean Squared Error (RMSE) for point forecasts; Coverage Probability (CP) and Interval Width (IW) for uncertainty estimates.
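The dynamic-graph steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. The log-normal CV formula, sqrt(exp(sigma^2) - 1), is standard. The mapping from CV to edge weight (`1 / (1 + cv)`), the severity coefficients, and the names `lognormal_cv`, `severity_score`, and `edge_weight` are hypothetical, chosen only to show how hourly variability and incident severity could both weaken an edge.

```python
import math
from statistics import pstdev


def lognormal_cv(travel_times):
    """CV of a log-normal fitted to strictly positive travel times.

    For a log-normal with log-scale standard deviation sigma,
    CV = sqrt(exp(sigma^2) - 1).
    """
    logs = [math.log(t) for t in travel_times]
    sigma = pstdev(logs)  # standard deviation of log travel time
    return math.sqrt(math.exp(sigma ** 2) - 1.0)


def severity_score(clearance_min, bad_weather, work_zone,
                   coeffs=(0.01, 0.2, 0.3)):
    """Hypothetical linear severity score, clipped to [0, 1].

    The coefficients are illustrative assumptions, not values from the paper.
    """
    raw = (coeffs[0] * clearance_min
           + coeffs[1] * bad_weather
           + coeffs[2] * work_zone)
    return min(1.0, max(0.0, raw))


def edge_weight(travel_times, severity=0.0):
    """Assumed mapping: higher variability or severity gives a weaker edge."""
    base = 1.0 / (1.0 + lognormal_cv(travel_times))
    return base * (1.0 - severity)


# Illustrative (made-up) travel times in seconds for one link.
free_flow = [60.0, 61.0, 59.0, 60.0, 62.0]
peak = [60.0, 95.0, 70.0, 130.0, 85.0]

w_free = edge_weight(free_flow)
w_peak = edge_weight(peak)
w_crash = edge_weight(peak, severity_score(45, 1, 0))
print(w_free, w_peak, w_crash)
```

Computing one such weight per link and per hour would give the per-hour adjacency matrix described above. A longer clearance time raises the severity score and lowers the weight further.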
Results & Findings
| Horizon (h) | MAE ↓ | RMSE ↓ | CP (95%) | IW (seconds) |
|---|---|---|---|---|
| 1 | 12.3% | 15.8% | 94.7% | 8.2 |
| 2 | 15.1% | 19.4% | 95.1% | 10.5 |
| 3 | 18.6% | 23.7% | 95.3% | 13.1 |
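
For horizons of 4 h and beyond, the summary reports only relative figures: MAE improves by ≈22% over the static‑graph GNN, RMSE improves by ≈20%, coverage stays at ≈95% (target met), and intervals are ≈12% tighter than baselines.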
- Accuracy: The incident‑aware STT consistently outperforms baselines, especially beyond the 2‑hour mark where traditional models degrade sharply.
- Calibration: ACP keeps empirical coverage within about 0.3 percentage points of the nominal 95% at the 1 to 3 h horizons (94.7% to 95.3%) and at roughly 95% beyond that, whereas naive quantile regression either under‑covers (over‑confident) or over‑covers (wasteful).
- Interpretability: Visualizations of the dynamic adjacency matrix reveal intuitive patterns—e.g., edge weights drop dramatically on highways during a reported crash, and recover as clearance time elapses.
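To make the calibration result concrete, the sketch below shows one way adaptive intervals can track a distribution shift, along with the CP and IW metrics. It is a stand-in, not the paper's exact ACP procedure: it uses the adaptive conformal inference update of Gibbs and Candès (the working miscoverage level moves by `gamma * (target_alpha - miss)` after each observation) with absolute residuals in a rolling window as non-conformity scores. The function names, `gamma`, `window`, and the synthetic data are all assumptions.

```python
import math
import random
from collections import deque


def adaptive_conformal(y_true, y_pred, target_alpha=0.05, gamma=0.01, window=200):
    """Build (lower, upper) intervals online from rolling absolute residuals."""
    scores = deque(maxlen=window)  # most recent non-conformity scores
    alpha_t = target_alpha         # working miscoverage level, adapted online
    intervals = []
    for y, yhat in zip(y_true, y_pred):
        if scores:
            a = min(max(alpha_t, 0.0), 1.0)
            ranked = sorted(scores)
            # Conformal quantile rank; simplification: clipped to the largest
            # score in the window instead of returning an infinite interval.
            k = math.ceil((1.0 - a) * (len(ranked) + 1)) - 1
            q = ranked[min(len(ranked) - 1, max(0, k))]
        else:
            q = float("inf")  # no calibration scores yet
        lo, hi = yhat - q, yhat + q
        intervals.append((lo, hi))
        miss = 0.0 if lo <= y <= hi else 1.0
        alpha_t += gamma * (target_alpha - miss)  # widen after misses, tighten after hits
        scores.append(abs(y - yhat))
    return intervals


def coverage_and_width(y_true, intervals):
    """Coverage probability and mean interval width over finite intervals."""
    finite = [(y, lo, hi) for y, (lo, hi) in zip(y_true, intervals)
              if math.isfinite(hi - lo)]
    cp = sum(lo <= y <= hi for y, lo, hi in finite) / len(finite)
    iw = sum(hi - lo for _, lo, hi in finite) / len(finite)
    return cp, iw


# Synthetic example: forecast noise triples halfway through (a distribution shift).
rng = random.Random(0)
y_pred = [100.0] * 2000
y_true = [100.0 + rng.gauss(0.0, 5.0 if t < 1000 else 15.0) for t in range(2000)]
cp, iw = coverage_and_width(y_true, adaptive_conformal(y_true, y_pred))
print(f"coverage={cp:.3f}, mean width={iw:.1f}")
```

After the shift, a run of misses pushes the working level down and the intervals widen until coverage recovers. A fixed-quantile interval calibrated on the first half would not make that correction.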
Practical Implications
- Dynamic routing & navigation – Fleet management platforms can ingest the model’s multi‑hour forecasts and confidence bands to choose routes that minimize expected delay while hedging against worst‑case incident scenarios.
- Autonomous vehicle planning – An AV’s motion planner can use calibrated travel‑time intervals to allocate safe buffers for upcoming maneuvers, improving safety without being overly conservative.
- Traffic‑control centers – Operators can prioritize incident response (e.g., dispatch crews, adjust signal timing) based on the model’s predicted ripple effects across the network.
- Smart city simulations – The dynamic graph construction can be plugged into city‑scale digital twins, enabling more realistic scenario testing for infrastructure upgrades or policy changes.
- API‑first services – The approach is amenable to a micro‑service architecture: a preprocessing service builds the hourly adjacency, a model service serves STT predictions, and a post‑processing service wraps ACP intervals—making it straightforward to expose as a RESTful endpoint for third‑party apps.
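As a small illustration of the routing use case, the sketch below picks a route by blending the point forecast with the upper end of its prediction interval. The `RouteForecast` type, the `risk_aversion` parameter, and the route numbers are hypothetical, and the blend is one simple hedging rule among many.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteForecast:
    name: str
    expected_s: float  # point forecast of travel time, in seconds
    upper_s: float     # upper end of the prediction interval, in seconds


def pick_route(routes, risk_aversion=0.5):
    """Choose the route with the lowest blended cost.

    risk_aversion = 0 uses only the expected travel time;
    risk_aversion = 1 uses only the interval's upper bound.
    """
    if not 0.0 <= risk_aversion <= 1.0:
        raise ValueError("risk_aversion must be in [0, 1]")

    def cost(route):
        return ((1.0 - risk_aversion) * route.expected_s
                + risk_aversion * route.upper_s)

    return min(routes, key=cost)


# Made-up forecasts: the freeway is faster on average but has a wide interval.
routes = [
    RouteForecast("freeway", expected_s=1500.0, upper_s=2600.0),
    RouteForecast("arterial", expected_s=1700.0, upper_s=1900.0),
]
print(pick_route(routes, risk_aversion=0.0).name)  # freeway
print(pick_route(routes, risk_aversion=0.5).name)  # arterial
```

A risk-neutral planner takes the freeway. Once the wide interval is weighted in, the arterial wins despite its higher expected travel time.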
Limitations & Future Work
- Data granularity – The study relies on hourly aggregates; finer‑resolution (e.g., 5‑minute) data could capture rapid incident dynamics but would increase computational load.
- Geographic transferability – The model is trained on Ohio’s network; transferring to regions with different road hierarchies or incident reporting standards may require retraining or domain adaptation.
- Incident severity modeling – The current severity score is a linear combination of crash attributes; more sophisticated probabilistic models (e.g., Bayesian networks) could better capture uncertainty in incident impact.
- Scalability – Self‑attention cost grows quadratically with the number of nodes; sparse attention mechanisms or hierarchical graph pooling could enable city‑wide deployments with tens of thousands of links.
- Real‑time updating – Incorporating live traffic sensor feeds and streaming incident alerts in an online learning loop remains an open challenge.
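The scalability point can be illustrated with neighbor-masked attention, one common form of sparse attention. In the sketch below, each node attends only to itself and its graph neighbors, so the number of scored query-key pairs grows with the number of edges and not with the square of the node count. This is not the paper's architecture: there are no learned projections, there is a single head, and the function name and toy graph are assumptions.

```python
import math


def neighbor_attention(features, neighbors):
    """Single-head dot-product attention restricted to graph neighbors.

    features:  list of equal-length float vectors, reused as query, key, and value.
    neighbors: dict mapping node index -> iterable of neighbor indices.
    Returns (outputs, number of query-key pairs scored).
    """
    dim = len(features[0])
    scale = 1.0 / math.sqrt(dim)
    outputs, pairs = [], 0
    for i, query in enumerate(features):
        idx = sorted({i, *neighbors.get(i, ())})  # self plus neighbors
        logits = [scale * sum(a * b for a, b in zip(query, features[j]))
                  for j in idx]
        top = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        outputs.append([
            sum(e / total * features[j][d] for e, j in zip(exps, idx))
            for d in range(dim)
        ])
        pairs += len(idx)
    return outputs, pairs


# Toy corridor of five links in a chain: 0 - 1 - 2 - 3 - 4.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
feats = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
out, sparse_pairs = neighbor_attention(feats, chain)
dense_pairs = len(feats) ** 2
print(sparse_pairs, dense_pairs)  # 13 25
```

On a road network where each link touches only a handful of others, the gap between the two counts widens quickly as the network grows.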
Bottom line: By fusing a Transformer‑based spatio‑temporal core with adaptive, incident‑aware graph structures and statistically rigorous uncertainty quantification, this work pushes traffic forecasting from “what‑might‑happen” toward “what‑will‑happen with confidence”—a leap that could unlock smarter routing, safer autonomous driving, and more responsive traffic‑management systems.
Authors
- Mayur Patil
- Qadeer Ahmed
- Shawn Midlam-Mohler
- Stephanie Marik
- Allen Sheldon
- Rajeev Chhajer
- Nithin Santhanam
Paper Information
- arXiv ID: 2603.16857v1
- Categories: cs.LG
- Published: March 17, 2026