[Paper] PARWiS: Winner determination under shoestring budgets using active pairwise comparisons
Source: arXiv - 2603.01171v1
Overview
The paper introduces PARWiS, an algorithm that picks the best item from a set by asking only a handful of pairwise comparison questions. By cleverly choosing which pairs to compare and using a spectral‑ranking step, PARWiS can recover the true winner even when the budget is as low as 40–80 comparisons for 20 items. The authors also extend the core idea with two variants—Contextual PARWiS (using side information) and RL PARWiS (reinforcement‑learning‑driven pair selection)—and benchmark them against popular baselines on synthetic data and two real‑world recommendation datasets (Jester and MovieLens).
Key Contributions
- PARWiS algorithm: combines disruptive pair selection with a spectral ranking technique to maximize information gain per comparison.
- Contextual PARWiS: integrates item‑level contextual features (e.g., user demographics, item metadata) into the pair‑selection policy.
- RL PARWiS: frames pair selection as a reinforcement‑learning problem, learning a policy that adapts to the observed comparison outcomes.
- Comprehensive empirical evaluation: compares all three variants against Double Thompson Sampling and a random‑selection baseline on both synthetic and real datasets, using multiple budget levels (40, 60, 80).
- New performance metrics for low‑budget winner determination: recovery fraction, true rank of reported winner, reported rank of true winner, cumulative regret, and the separation metric Δ₁,₂ (gap between the top two items).
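Three of these metrics can be computed directly from the true utilities and the winners reported across trials. The sketch below is illustrative only (function and variable names are my own, not from the paper); "reported rank of true winner" and cumulative regret additionally require the algorithm's estimated ranking and query history, so they are omitted here.

```python
import numpy as np

def winner_metrics(true_utils, reported_winners):
    """Low-budget winner-determination metrics (illustrative sketch).

    true_utils: latent utilities of the n items (higher = better).
    reported_winners: item index reported in each Monte-Carlo trial.
    """
    true_winner = int(np.argmax(true_utils))
    # Rank of each item under the true utilities (1 = best).
    order = np.argsort(-true_utils)
    true_rank = {int(item): r + 1 for r, item in enumerate(order)}

    # Fraction of trials that recovered the true winner.
    recovery_fraction = np.mean([w == true_winner for w in reported_winners])
    # Average true rank of the item each trial reported.
    true_rank_of_reported = np.mean([true_rank[w] for w in reported_winners])
    # Separation between the top two items (the paper's Δ₁,₂).
    sorted_utils = np.sort(true_utils)[::-1]
    delta_12 = sorted_utils[0] - sorted_utils[1]
    return recovery_fraction, true_rank_of_reported, delta_12
```

For example, with utilities `[0.1, 0.9, 0.5]` and reported winners `[1, 1, 2]`, two of three trials recover the true winner (item 1), and Δ₁,₂ = 0.4.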
Methodology
- Problem setting – There are n items (n = 20 in the experiments). The learner may issue a limited budget B of pairwise comparison queries (e.g., "Is item i preferred to item j?") and must output the item with the highest latent utility.
- Disruptive pair selection – Instead of random or round-robin queries, PARWiS selects pairs that are most likely to change the current ranking. It does this by maintaining a provisional spectral ranking (the eigenvector of a comparison matrix) and then picking the pair whose comparison would cause the largest shift in that eigenvector.
- Spectral ranking – After each new comparison, the algorithm updates a weighted adjacency matrix whose entry (i, j) stores the number of times i beat j. The leading left eigenvector of the normalized matrix serves as a global ranking estimate (similar to PageRank).
- Contextual extension – When side information xᵢ is available for each item, Contextual PARWiS augments the pair-selection score with a linear model that predicts the utility gap from the features, biasing the algorithm toward pairs that are informative given the context.
- RL extension – RL PARWiS treats each step as a Markov decision process:
  - State: current comparison matrix plus context
  - Action: which pair to query
  - Reward: reduction in uncertainty (or negative regret)

  A policy network (e.g., a small feed-forward DNN) is trained online with policy-gradient updates.
- Baselines:
  - Double Thompson Sampling (DTS): a Bayesian dueling-bandit approach that samples two items from posterior utilities and compares them.
  - Random: uniformly pick a pair at each step.
- Evaluation protocol – For each dataset and budget, each algorithm is run over multiple Monte-Carlo trials. Metrics are computed after the budget is exhausted, focusing on how close the reported winner is to the true top-ranked item and how much cumulative regret was incurred.
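The spectral-ranking and disruptive-pair-selection steps above can be sketched as follows. This is a minimal illustration under my own assumptions (a Rank-Centrality-style random walk for the spectral score, and total eigenvector shift summed over both possible outcomes as the disruption score); it is not the authors' code, and the paper's exact update rules may differ.

```python
import numpy as np

def spectral_scores(wins):
    """Ranking scores from the win matrix, PageRank-style.

    wins[i, j] = number of times item i beat item j. We build a Markov
    chain that moves from i toward items that beat i; its stationary
    distribution concentrates on strong items.
    """
    n = wins.shape[0]
    losses = wins.T + 1e-6                      # smoothing keeps the chain irreducible
    P = losses / losses.sum(axis=1, keepdims=True)
    pi = np.full(n, 1.0 / n)
    for _ in range(500):                        # power iteration on the left eigenvector
        pi = pi @ P
    return pi / pi.sum()

def most_disruptive_pair(wins):
    """Pick the pair whose next comparison would shift the ranking estimate most."""
    n = wins.shape[0]
    base = spectral_scores(wins)
    best_pair, best_shift = (0, 1), -1.0
    for i in range(n):
        for j in range(i + 1, n):
            shift = 0.0
            for winner, loser in ((i, j), (j, i)):   # try both possible outcomes
                w = wins.copy()
                w[winner, loser] += 1
                shift += np.abs(spectral_scores(w) - base).sum()
            if shift > best_shift:
                best_shift, best_pair = shift, (i, j)
    return best_pair
```

In a query loop, one would call `most_disruptive_pair`, ask the oracle to compare that pair, increment the corresponding `wins` entry, and repeat until the budget B is exhausted; the final winner is `np.argmax(spectral_scores(wins))`.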
Results & Findings
| Dataset | Budget | Metric | PARWiS | RL PARWiS | Contextual PARWiS | DTS | Random |
|---|---|---|---|---|---|---|---|
| Jester (Δ₁,₂ large) | 40 | Recovery fraction (higher = better) | 0.84 | 0.81 | 0.80 | 0.62 | 0.45 |
| Jester (Δ₁,₂ large) | 80 | Cumulative regret (lower = better) | 0.31 | 0.33 | 0.34 | 0.48 | 0.71 |
| MovieLens (Δ₁,₂ small) | 40 | True rank of reported winner (lower = better) | 3.2 | 3.4 | 3.5 | 5.1 | 7.8 |
| MovieLens (Δ₁,₂ small) | 80 | Reported rank of true winner (lower = better) | 1.9 | 2.0 | 2.1 | 3.4 | 5.6 |
Take‑aways
- PARWiS and RL PARWiS consistently beat the baselines across all budgets.
- The advantage is most pronounced when the top‑two items are well separated (large Δ₁,₂), as in Jester.
- In the more ambiguous MovieLens setting (small Δ₁,₂), the gap narrows but PARWiS still leads.
- Contextual PARWiS performs on par with vanilla PARWiS; the added features did not translate into a clear win, suggesting the need for better feature engineering or richer models.
Practical Implications
- Recommendation engines can use PARWiS to quickly surface the most promising item (e.g., a new product, a trending article) by asking a few “A‑vs‑B” questions to users or crowd‑workers, saving costly labeling effort.
- A/B testing pipelines can replace exhaustive pairwise testing with PARWiS‑driven sampling, achieving statistically comparable conclusions with far fewer experiments.
- Human‑in‑the‑loop ML: when annotators are expensive (e.g., medical image ranking), PARWiS tells you which image pairs to present next to maximize diagnostic insight per annotation.
- RL PARWiS opens the door to adaptive, data‑driven query policies that improve over time, useful for platforms that continuously collect preference data (e.g., music streaming services).
- The spectral ranking core is lightweight (matrix eigen‑decomposition on a 20 × 20 matrix is trivial) and can be embedded directly into micro‑services without heavy GPU requirements.
Limitations & Future Work
- Scalability: The current experiments fix n = 20. Extending PARWiS to hundreds or thousands of items will require sparse matrix tricks or hierarchical grouping.
- Contextual feature utilization: The contextual variant did not yield a measurable boost, indicating that either the chosen features were not informative enough or the linear model was too simple. More expressive models (e.g., graph neural nets) could be explored.
- Reinforcement learning overhead: RL PARWiS adds policy‑network training, which may be overkill for very low‑budget scenarios; a hybrid that falls back to the deterministic pair selector when data is scarce could be more efficient.
- Assumption of static utilities: The paper assumes item utilities do not drift during the budget. Real‑world systems often face non‑stationary preferences; incorporating change‑point detection or online adaptation would be valuable.
Bottom line: PARWiS demonstrates that with smart pair selection and a simple spectral ranking step, you can reliably pick the best item even when you can only afford a few dozen comparisons. For developers building preference‑driven products, it offers a practical, low‑cost alternative to exhaustive ranking or heavyweight Bayesian bandits.
Authors
- Shailendra Bhandari
Paper Information
- arXiv ID: 2603.01171v1
- Categories: cs.LG, cs.CC, cs.NE
- Published: March 1, 2026