[Paper] Bradley-Terry Rankings for Recommender Systems Across Dataset Taxonomies
Source: arXiv - 2606.07492v1
Overview
Ranking recommendation algorithms is challenging because model performance is sensitive to dataset characteristics such as sparsity, sequential structure, and scale. This creates a need for a principled methodology for fair comparison between algorithms. Naive aggregation of performance metrics (e.g., averaging NDCG over benchmarks) can yield misleading rankings and undermine practical model selection. To address this problem, we introduce a novel, data-driven ranking methodology based on the Bradley-Terry (BT) model. We demonstrate that the resulting ranking depends on key dataset statistics. Additionally, we propose a novel metric for evaluating ranking consistency and demonstrate the robustness of our ranking to incomplete data. Finally, we introduce a dataset-specific methodology for ranking algorithms on unseen datasets without running the models, relying on extensions of the Bradley-Terry framework, including BT trees and BT models with covariates.
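For reference, the standard Bradley-Terry model (textbook form, not necessarily the paper's exact parameterization) gives each algorithm $i$ a latent score $\theta_i$. With $w_{ij}$ denoting the number of comparisons (here, datasets) in which $i$ outperforms $j$, the scores are fitted by maximizing the log-likelihood $\ell$. The last line shows one common way to add covariates, assuming dataset features $x$ (e.g. sparsity or scale) enter linearly; the paper's own covariate and tree formulations may differ.

```latex
P(i \succ j) = \frac{p_i}{p_i + p_j} = \sigma(\theta_i - \theta_j),
\qquad \theta_i = \log p_i, \qquad \sigma(z) = \frac{1}{1 + e^{-z}}

\ell(\theta) = \sum_{i \neq j} w_{ij} \, \log \sigma(\theta_i - \theta_j)

P(i \succ j \mid x) = \sigma\big(\theta_i(x) - \theta_j(x)\big),
\qquad \theta_i(x) = \alpha_i + \beta_i^{\top} x
```

The scores are identifiable only up to an additive constant (and, in the covariate form, up to a common shift of all $\alpha_i$ and $\beta_i$), so one score is usually fixed or the scores are centered.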
Key Contributions
This paper presents research in the following arXiv subject areas:
- cs.IR
- cs.LG
- stat.ML
Methodology
Please refer to the full paper for the detailed methodology.
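As a rough illustration only, the sketch below shows a basic, covariate-free BT step under our own assumptions: each dataset acts as a "match" in which one algorithm beats another if it has the higher metric value (exact ties split 0.5/0.5), and strengths are fitted by Hunter's MM algorithm for the BT likelihood. The function names `pairwise_wins` and `fit_bradley_terry`, the tie rule, and the NDCG numbers are hypothetical and not taken from the paper; BT trees and covariates are not covered.

```python
from itertools import combinations
import math


def pairwise_wins(scores):
    """Turn a per-dataset metric table into pairwise win counts (hypothetical helper).

    scores: dict mapping algorithm name -> list of metric values (higher is better),
    one value per dataset, in the same dataset order for every algorithm.
    Returns (algos, wins) where wins[i][j] is the number of datasets on which
    algos[i] beat algos[j]; an exact tie adds 0.5 to both sides (an assumption).
    """
    algos = sorted(scores)
    k = len(algos)
    n_datasets = len(scores[algos[0]])
    wins = [[0.0] * k for _ in range(k)]
    for d in range(n_datasets):
        for i, j in combinations(range(k), 2):
            a, b = scores[algos[i]][d], scores[algos[j]][d]
            if a > b:
                wins[i][j] += 1.0
            elif b > a:
                wins[j][i] += 1.0
            else:
                wins[i][j] += 0.5
                wins[j][i] += 0.5
    return algos, wins


def fit_bradley_terry(wins, n_iter=1000, tol=1e-10):
    """Maximum-likelihood BT strengths via Hunter's MM updates (hypothetical helper).

    Model: P(i beats j) = p[i] / (p[i] + p[j]). Strengths are identifiable only
    up to a common factor, so they are rescaled to geometric mean 1 each step.
    """
    k = len(wins)
    total_wins = [sum(row) for row in wins]
    if any(w == 0 for w in total_wins):
        # An algorithm that never wins has no finite MLE strength.
        raise ValueError("every algorithm needs at least one win")
    p = [1.0] * k
    for _ in range(n_iter):
        new_p = []
        for i in range(k):
            # (wins[i][j] + wins[j][i]) is the number of comparisons between i and j.
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(k) if j != i)
            new_p.append(total_wins[i] / denom)
        # Rescale so the geometric mean of the strengths is 1.
        g = math.exp(sum(math.log(x) for x in new_p) / k)
        new_p = [x / g for x in new_p]
        converged = max(abs(a - b) for a, b in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    return p


# Hypothetical NDCG values on three datasets (illustrative numbers, not from the paper).
scores = {
    "A": [0.90, 0.10, 0.12],
    "B": [0.30, 0.20, 0.22],
    "C": [0.25, 0.15, 0.30],
}
algos, wins = pairwise_wins(scores)
strengths = fit_bradley_terry(wins)
bt_ranking = [a for _, a in sorted(zip(strengths, algos), reverse=True)]
mean_ranking = sorted(scores, key=lambda a: sum(scores[a]) / len(scores[a]), reverse=True)
print(bt_ranking)    # ['B', 'C', 'A']
print(mean_ranking)  # ['A', 'B', 'C']
```

In this made-up table, A wins only one dataset but by a large margin, so it tops the mean-NDCG ranking, whereas the pairwise BT fit ranks it last. This mirrors the concern about naive averaging raised in the overview. Note that the BT maximum-likelihood estimate exists only when every algorithm both wins and loses at least one comparison; the sketch checks just the first condition.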
Practical Implications
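By replacing naive metric averaging with a Bradley-Terry-based ranking, this work aims to make the choice of recommendation algorithm more reliable. Its dataset-specific extensions (BT trees and BT models with covariates) allow algorithms to be ranked on an unseen dataset without running them.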
Authors
- Ekaterina Grishina
- Stepan Kuznetsov
- Askar Tsyganov
- Ilya Ivanov
- Daria Korovaitceva
- Margarita Rusanova
- Uliana Parkina
- Alexander Derevyagin
- Evgeny Frolov
- Sergey Samsonov
- Anton Lysenko
Paper Information
- arXiv ID: 2606.07492v1
- Categories: cs.IR, cs.LG, stat.ML
- Published: June 5, 2026