[Paper] Routing the Lottery: Adaptive Subnetworks for Heterogeneous Data
Source: arXiv - 2601.22141v1
Overview
The paper introduces Routing the Lottery (RTL), a new pruning framework that moves beyond the classic “one‑size‑fits‑all” lottery ticket hypothesis. Instead of searching for a single sparse subnetwork that works for every input, RTL learns a portfolio of adaptive tickets—each specialized for a particular class, semantic cluster, or environmental condition. The result is a modular, context‑aware model that delivers higher accuracy with dramatically fewer parameters.
Key Contributions
- Adaptive tickets: A method to discover multiple, data‑dependent sparse subnetworks rather than a universal one.
- Routing mechanism: A lightweight selector that routes each input to its most suitable ticket at inference time.
- Subnetwork collapse analysis: Identification of a failure mode where aggressive pruning causes tickets to lose discriminative power.
- Subnetwork similarity score: A label‑free metric that flags oversparsification before performance degrades.
- Empirical gains: Across image classification, semantic segmentation, and domain‑shift benchmarks, RTL achieves up to 10× parameter reduction compared with training separate models, while improving balanced accuracy and recall.
Methodology
- Base network & initial pruning: Start with a dense backbone (e.g., ResNet‑50) and apply magnitude‑based pruning to obtain an initial sparse mask.
- Ticket diversification: A small clustering step over class labels or learned feature embeddings splits the data into K groups (e.g., per class or per domain). For each group, RTL fine‑tunes a separate mask while keeping the shared backbone weights frozen, yielding K adaptive tickets that differ mainly in which connections are kept.
- Routing module: A shallow gating network (often a single linear layer followed by softmax) takes the same input as the backbone and predicts which ticket should process it. The routing decision is trained jointly with the tickets using a cross‑entropy loss plus a sparsity regularizer.
- Training loop (a minimal code sketch follows this list):
- Forward pass → routing → selected ticket → loss.
- Back‑prop updates both the routing parameters and the mask scores for the active ticket.
- Periodically, masks are binarized (0/1) based on a global sparsity budget.
- Diagnosis tools: The subnetwork similarity score computes pairwise overlap of binary masks; a sudden drop signals subnetwork collapse, prompting a relaxation of the sparsity target (a minimal version of this check is sketched below).
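The steps above can be pictured with a short PyTorch‑style sketch. This is a minimal illustration under assumptions (a frozen linear task head standing in for the backbone's pruned layers, group labels available as routing targets, straight‑through gradients for the binary masks); the names, sizes, and loss weights here are invented for illustration and are not the authors' reference implementation.

```python
# Minimal RTL-style sketch (illustrative, not the paper's code): a shared frozen
# weight tensor, K learnable mask-score tensors ("tickets"), and a single-layer
# router that picks a ticket per input.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RoutedTickets(nn.Module):
    def __init__(self, feat_dim=512, num_classes=100, num_tickets=5, sparsity=0.9):
        super().__init__()
        self.sparsity = sparsity
        # Shared backbone/head weights stay frozen; only the masks differ per ticket.
        self.weight = nn.Parameter(0.01 * torch.randn(num_classes, feat_dim),
                                   requires_grad=False)
        # Real-valued mask scores, one tensor per ticket (ticket diversification).
        self.mask_scores = nn.Parameter(torch.randn(num_tickets, num_classes, feat_dim))
        # Routing module: one linear layer over input features, softmax over tickets.
        self.router = nn.Linear(feat_dim, num_tickets)

    def binary_masks(self):
        # Binarize scores against a global sparsity budget: keep the top
        # (1 - sparsity) fraction of weights in each ticket.
        keep = max(int((1.0 - self.sparsity) * self.mask_scores[0].numel()), 1)
        flat = self.mask_scores.flatten(1)
        thresh = flat.topk(keep, dim=1).values[:, -1:]
        return (flat >= thresh).float().view_as(self.mask_scores)

    def forward(self, feats):
        # feats: backbone features of shape (batch, feat_dim).
        route_logits = self.router(feats)          # ticket scores per input
        ticket = route_logits.argmax(dim=1)        # hard ticket selection
        masks = self.binary_masks()
        # Straight-through estimator so the mask scores still receive gradients.
        masks = masks + self.mask_scores - self.mask_scores.detach()
        w = masks[ticket] * self.weight            # per-input masked weights
        logits = torch.einsum("bcd,bd->bc", w, feats)
        return logits, route_logits, ticket


def train_step(model, optimizer, feats, labels, group_ids,
               route_weight=0.1, l1_weight=1e-4):
    # Forward pass -> routing -> selected ticket -> loss; the router is trained
    # with a cross-entropy loss against the data-group assignment, plus an L1
    # sparsity regularizer on the mask scores.
    logits, route_logits, _ = model(feats)
    loss = (F.cross_entropy(logits, labels)
            + route_weight * F.cross_entropy(route_logits, group_ids)
            + l1_weight * model.mask_scores.abs().mean())
    optimizer.zero_grad()
    loss.backward()        # updates both the router and the active tickets' mask scores
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    model = RoutedTickets()
    opt = torch.optim.Adam([model.mask_scores, *model.router.parameters()], lr=1e-3)
    feats = torch.randn(8, 512)                    # stand-in for backbone features
    labels = torch.randint(0, 100, (8,))
    groups = torch.randint(0, 5, (8,))             # cluster / domain assignments
    print(train_step(model, opt, feats, labels, groups))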
The whole pipeline is compatible with standard deep‑learning libraries and adds only a modest overhead (the routing net is <1 % of total FLOPs).
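The similarity diagnostic from the last bullet can be sketched, for example, as the mean pairwise Jaccard (IoU) overlap of the binary ticket masks; the exact overlap measure and threshold are assumptions here, and `maybe_relax_sparsity` simply reuses the `RoutedTickets` model from the sketch above.

```python
# Hypothetical subnetwork similarity score: mean pairwise Jaccard overlap of the
# binary ticket masks. A sudden drop is treated as a warning sign of subnetwork
# collapse and triggers a relaxation of the sparsity target.
import torch


def subnetwork_similarity(masks: torch.Tensor) -> float:
    """masks: (num_tickets, ...) binary tensor of kept (1) vs. pruned (0) weights."""
    flat = masks.flatten(1).bool()
    overlaps = []
    for i in range(flat.shape[0]):
        for j in range(i + 1, flat.shape[0]):
            inter = (flat[i] & flat[j]).sum().item()
            union = (flat[i] | flat[j]).sum().item()
            overlaps.append(inter / max(union, 1))
    return sum(overlaps) / max(len(overlaps), 1)


def maybe_relax_sparsity(model, threshold=0.2, step=0.05):
    # Label-free check: if the tickets barely overlap any more, back off the budget.
    score = subnetwork_similarity(model.binary_masks())
    if score < threshold:
        model.sparsity = max(model.sparsity - step, 0.0)
    return score
```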
Results & Findings
| Dataset / Task | Baseline (single ticket) | RTL (K=5) | Parameter Savings (vs. separate models) |
|---|---|---|---|
| CIFAR‑100 (classification) | 73.2 % acc | 77.8 % acc | 9.3× fewer params |
| Cityscapes (semantic seg.) | 71.5 % mIoU | 74.2 % mIoU | 7.8× fewer params |
| DomainNet (multi‑domain) | 62.1 % avg acc | 66.4 % avg acc | 10.2× fewer params |
- Balanced accuracy improves especially on under‑represented classes, indicating that tickets specialize to capture minority patterns.
- Recall gains are consistent across tasks, showing that RTL reduces false negatives caused by over‑pruning.
- The subnetwork similarity score successfully predicts collapse: when the score falls below a learned threshold, early‑stopping or sparsity relaxation restores performance.
Practical Implications
- Edge & mobile deployment: Developers can ship a single compact model that dynamically activates the appropriate ticket, avoiding the storage and maintenance cost of multiple specialized models.
- Continual learning & domain adaptation: New tickets can be added for emerging data clusters without retraining the entire network, facilitating modular updates.
- Interpretability: Since tickets align with semantic groups, engineers can inspect which parts of the network are responsible for specific classes or conditions, aiding debugging and fairness audits.
- Resource‑aware inference: The routing decision can be conditioned on device constraints (e.g., low‑power mode) to select a lighter ticket, offering graceful degradation (a small sketch follows this list).
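As an illustration of the last point, the routing decision could be restricted to a whitelist of lighter tickets whenever the device reports a low‑power mode; the logit‑masking scheme below is a hypothetical sketch, not something specified in the paper.

```python
# Hypothetical resource-aware routing: in low-power mode, mask out the logits of
# all tickets except a designated set of lighter ones before taking the argmax.
import torch


def select_ticket(route_logits: torch.Tensor, low_power: bool,
                  light_tickets=(0, 1)) -> torch.Tensor:
    if low_power:
        allowed = torch.full_like(route_logits, float("-inf"))
        allowed[:, list(light_tickets)] = 0.0     # only these tickets stay selectable
        route_logits = route_logits + allowed
    return route_logits.argmax(dim=1)


# Example: logits for a batch of 4 inputs over 5 tickets, constrained routing.
# tickets = select_ticket(torch.randn(4, 5), low_power=True)
```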
Limitations & Future Work
- Routing overhead: Although small, the routing network adds latency; scaling to thousands of tickets may require more efficient selectors.
- Cluster definition: RTL relies on a reasonable grouping of data; poor clustering can lead to redundant tickets or suboptimal specialization.
- Training stability: Joint optimization of masks and routing can be sensitive to hyper‑parameters, especially the sparsity schedule.
- Future directions: The authors suggest exploring hierarchical routing (coarse‑to‑fine ticket selection), integrating RTL with neural architecture search, and extending the similarity diagnostics to unsupervised settings.
Routing the Lottery reframes pruning from a static compression technique into a dynamic, data‑aware strategy—opening the door for more modular, efficient, and adaptable deep‑learning systems in production environments.
Authors
- Grzegorz Stefanski
- Alberto Presta
- Michal Byra
Paper Information
- arXiv ID: 2601.22141v1
- Categories: cs.AI, cs.CV, cs.LG
- Published: January 29, 2026