[Paper] A Machine Learning Framework for Off Ball Defensive Role and Performance Evaluation in Football
Source: arXiv - 2601.00748v1
Overview
The paper presents a machine‑learning pipeline for quantifying how football defenders perform off the ball, with a focus on corner‑kick situations. By fitting a covariate‑dependent Hidden Markov Model (CDHMM) to player‑tracking data, the authors automatically infer each defender's role (man‑marking a specific opponent vs. defending a zone) and then assign defensive credit in a way that respects the tactical context of each play.
Key Contributions
- Covariate‑dependent HMM for defensive role detection – a label‑free model that learns time‑varying man‑marking and zonal assignments directly from tracking data.
- Role‑conditioned ghosting framework – a counterfactual simulation that replaces a defender’s actual movement with a “ghost” that follows the average behavior of the same defensive role, enabling fair performance comparison.
- Defensive credit attribution metric – a novel way to quantify off‑ball defensive impact by measuring how much a defender reduces the opponent’s expected possession value (EPV) relative to the role‑conditioned baseline.
- Focused application to corner kicks – leveraging the highly structured nature of set‑pieces to validate the approach and demonstrate interpretability.
- Open‑source implementation – the authors release code and processed datasets to encourage reproducibility and further research.
Methodology
- Data collection – high‑frequency (10 Hz) player‑tracking and event data from professional matches, with a focus on corner‑kick episodes.
- Feature engineering – each defender’s observable covariates (position, velocity, distance to opponent, angle to goal, etc.) are fed into the model.
- Covariate‑dependent HMM (CDHMM) – unlike a standard HMM with fixed transition probabilities, the CDHMM conditions its state transitions on the current covariates, allowing the model to capture tactical shifts (e.g., switching from zonal to man‑marking as the ball is delivered). The hidden states correspond to defensive roles.
- Inference – the Viterbi algorithm (adapted for covariate dependence) yields the most likely sequence of defensive roles for every player throughout the corner‑kick.
- Ghosting simulation – for each defender, a “ghost” trajectory is generated by sampling from the learned role‑specific movement distribution, effectively showing what would have happened if the defender behaved like a typical player in that role.
- Credit attribution – the difference in EPV (computed via a standard possession‑value model) between the real and ghosted scenarios quantifies the defender’s off‑ball contribution.
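To make the CDHMM and inference steps above more concrete, here is a minimal sketch, not the authors' implementation. It assumes a multinomial-logit form for the covariate-dependent transitions, a two-role state space, and a single toy covariate (whether the ball has been delivered). The names `log_transitions`, `weights`, `bias`, and all numbers are hypothetical.

```python
import math

ROLES = ("zonal", "man_mark")  # hypothetical two-role state space


def log_softmax(scores):
    """Numerically stable log-softmax over a list of floats."""
    m = max(scores)
    log_total = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_total for s in scores]


def log_transitions(covariates, weights, bias):
    """Log transition matrix whose rows depend on the current covariates.

    bias[i][j] is an intercept and weights[i][j] a coefficient vector for the
    move from role i to role j (an assumed multinomial-logit form).
    """
    n = len(bias)
    return [
        log_softmax([
            bias[i][j] + sum(w * x for w, x in zip(weights[i][j], covariates))
            for j in range(n)
        ])
        for i in range(n)
    ]


def viterbi(log_emissions, covariate_seq, weights, bias, log_prior):
    """Most likely role path when transitions vary frame by frame.

    log_emissions[t][k]: log-likelihood of frame t's observation under role k.
    covariate_seq[t]: covariates governing the transition into frame t
    (the entry for t = 0 is unused).
    """
    n = len(log_prior)
    score = [log_prior[k] + log_emissions[0][k] for k in range(n)]
    backpointers = []
    for t in range(1, len(log_emissions)):
        trans = log_transitions(covariate_seq[t], weights, bias)
        new_score, pointers = [], []
        for j in range(n):
            candidates = [score[i] + trans[i][j] for i in range(n)]
            best = max(range(n), key=lambda i: candidates[i])
            pointers.append(best)
            new_score.append(candidates[best] + log_emissions[t][j])
        score = new_score
        backpointers.append(pointers)
    # Backtrack from the best final state.
    state = max(range(n), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy corner: the covariate flips to 1.0 once the ball is delivered, which
# raises the zonal -> man_mark transition score (illustrative numbers only).
half = math.log(0.5)
log_emissions = [[math.log(0.9), math.log(0.1)], [half, half], [half, half], [half, half]]
covariate_seq = [[0.0], [0.0], [1.0], [1.0]]
bias = [[2.0, -2.0], [-2.0, 2.0]]            # sticky roles by default
weights = [[[0.0], [6.0]], [[0.0], [0.0]]]   # delivery pushes zonal -> man_mark
path = viterbi(log_emissions, covariate_seq, weights, bias, [half, half])
decoded = [ROLES[k] for k in path]
# decoded is ['zonal', 'zonal', 'man_mark', 'man_mark'] for this toy input
```

In this toy setup the emissions are uninformative after the first frame, so the switch to man‑marking is driven entirely by the covariate‑dependent transition. A standard HMM with fixed transition probabilities cannot express that behavior.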
Results & Findings
- Role detection accuracy – the CDHMM matches manually annotated defensive assignments with over 85% agreement, despite being trained without any labels.
- Interpretability – visualizations of inferred role sequences align with known tactical patterns (e.g., a defender staying zonal until the ball is near, then switching to man‑mark).
- Defensive credit distribution – the EPV‑based credit metric highlights defenders who consistently shrink the opponent’s scoring probability, even when they never touch the ball.
- Counterfactual validation – ghosted scenarios produce EPV curves that are statistically indistinguishable from those of average defenders in the same role, confirming the baseline’s fairness.
- Case studies – the authors showcase specific corners where a defender’s off‑ball positioning reduced the opponent’s EPV by up to 0.12, a tangible impact in tight matches.
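The credit figures above come from comparing EPV in the real and ghosted scenarios. The sketch below shows one way that comparison could be wired up. The possession‑value model here (`toy_epv`) is a hypothetical stand‑in invented for illustration, not the model used in the paper. The sign convention (ghost EPV minus actual EPV, so positive means the defender did better than the role baseline) is also an assumption.

```python
import math


def toy_epv(attacker, defenders):
    """Hypothetical stand-in for a possession-value model (NOT the paper's).

    Threat shrinks as the nearest defender gets closer to the attacker.
    """
    nearest = min(math.dist(attacker, d) for d in defenders)
    return 0.2 * (1.0 - math.exp(-nearest / 2.0))


def defensive_credit(attacker, defenders, idx, ghost_position, epv=toy_epv):
    """EPV with defender `idx` replaced by the ghost, minus the actual EPV.

    Positive credit means the real defender left the opponent with less
    value than the role-conditioned ghost would have (assumed convention).
    """
    ghosted = list(defenders)
    ghosted[idx] = ghost_position
    return epv(attacker, ghosted) - epv(attacker, defenders)


# Toy frame, coordinates in metres (illustrative values only).
attacker = (0.0, 0.0)
defenders = [(1.0, 0.0), (5.0, 5.0)]
ghost_position = (3.0, 0.0)  # where a typical player in the same role would stand
credit = defensive_credit(attacker, defenders, 0, ghost_position)
```

Here the real defender is tighter to the attacker than the ghost, so `credit` is positive. Because the EPV function is passed in as a parameter, any other possession‑value model with the same call shape could be substituted.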
Practical Implications
- Coaching & scouting – teams can now evaluate defenders on metrics that reflect their off‑ball discipline, informing recruitment and training focused on positioning rather than just tackles or interceptions.
- Performance dashboards – the framework can be integrated into existing analytics platforms to surface defensive credit alongside traditional on‑ball stats, giving a fuller picture of player value.
- Live‑match insights – with real‑time tracking, the model could flag when a defender deviates from the expected role, allowing coaches to make tactical adjustments on the fly.
- Betting & fantasy sports – more granular defensive metrics enable better player valuation models for markets that currently undervalue off‑ball contributions.
- Generalizable to other set‑pieces – while the study focuses on corners, the CDHMM architecture can be adapted to free‑kicks, throw‑ins, or even open‑play phases where defensive structures matter.
Limitations & Future Work
- Scope limited to corners – the highly structured nature of set‑pieces simplifies role inference; extending to open play will require handling greater variability and noise.
- Dependence on high‑quality tracking – the approach assumes access to precise, high‑frequency positional data, which may not be available for all leagues or lower‑tier competitions.
- Simplified ghosting distribution – the current ghost model samples from a Gaussian approximation of role behavior; richer generative models (e.g., conditional VAEs) could capture subtler tactical nuances.
- Potential bias in EPV model – the defensive credit metric inherits any biases present in the underlying possession‑value model; future work should explore joint learning of EPV and defensive roles.
- User‑friendly tooling – turning the research code into a plug‑and‑play analytics widget for clubs remains an open engineering challenge.
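As a rough illustration of the Gaussian ghost model mentioned in the limitations above, the sketch below draws a ghost trajectory from per‑role, per‑frame Gaussians. The `ROLE_PROFILES` table, its values, and the independent per‑frame sampling are all assumptions made for the example; the paper's actual parameterization may differ.

```python
import random

# Hypothetical per-role Gaussian parameters for a defender's (x, y) position in
# metres at each frame: (mean_x, mean_y, std_x, std_y). Illustrative only.
ROLE_PROFILES = {
    "zonal": [(5.0, 0.0, 0.5, 0.5), (5.0, 0.5, 0.5, 0.5), (5.5, 1.0, 0.6, 0.6)],
    "man_mark": [(8.0, 3.0, 0.8, 0.8), (7.0, 2.0, 0.8, 0.8), (6.0, 1.0, 0.9, 0.9)],
}


def sample_ghost(role_sequence, rng):
    """Draw one ghost trajectory from the Gaussian of the inferred role per frame."""
    trajectory = []
    for t, role in enumerate(role_sequence):
        mean_x, mean_y, std_x, std_y = ROLE_PROFILES[role][t]
        trajectory.append((rng.gauss(mean_x, std_x), rng.gauss(mean_y, std_y)))
    return trajectory


def mean_ghost(role_sequence):
    """Deterministic ghost that follows the role-average position at each frame."""
    return [ROLE_PROFILES[role][t][:2] for t, role in enumerate(role_sequence)]


rng = random.Random(0)  # seeded so the sampled ghost is reproducible
roles = ["zonal", "zonal", "man_mark"]
ghost = sample_ghost(roles, rng)
average = mean_ghost(roles)
```

Sampling each frame independently, as done here, produces jittery paths and ignores correlations across time. That gap is part of why richer generative models are suggested as future work.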
Overall, the paper pushes the frontier of defensive analytics by turning the “invisible” off‑ball work into quantifiable, actionable insights, an advancement that could reshape how teams assess and develop defensive talent.
Authors
- Sean Groom
- Shuo Wang
- Francisco Belo
- Axl Rice
- Liam Anderson
Paper Information
- arXiv ID: 2601.00748v1
- Categories: cs.LG
- Published: January 2, 2026