[Paper] The Optimal Sample Complexity of Multiclass and List Learning

Published: April 27, 2026 at 01:54 PM EDT
4 min read
Source: arXiv - 2604.24749v1

Overview

The paper settles a decades‑old question about how many labeled examples are truly needed to learn multiclass classifiers (and the related “list learning” setting). By linking the DS dimension—the right complexity measure for multiclass problems—to a concrete combinatorial quantity (maximum hypergraph density), the authors finally close the lingering √DS gap between known upper and lower bounds. In practical terms, this tells us, up to constant factors, how data‑hungry a multiclass model must be, given its expressive power.

Key Contributions

  • Proof of the Daniely‑Shalev‑Shwartz conjecture (2014): Shows that the maximum density of the one‑inclusion hypergraph of any multiclass hypothesis class never exceeds its DS dimension.
  • Tight sample‑complexity bounds: Derives optimal (up to constant factors) upper and lower bounds on the number of training examples needed for both multiclass classification and list learning, expressed solely in terms of the DS dimension.
  • Algebraic characterization extension: Builds on the recent Hanneke et al. (2026) algebraic view of multiclass hypothesis classes, turning it into a usable combinatorial tool.
  • Unified treatment of multiclass and list learning: Demonstrates that the same DS‑dimension‑based bound governs both settings, simplifying theory and practice.

Methodology

  1. Background on DS dimension: The DS (Daniely-Shalev-Shwartz) dimension generalizes the VC dimension to problems where the label set has size > 2. A set of d points is DS‑shattered if the class's label patterns on those points contain a d‑dimensional “pseudo‑cube”: a non‑empty set of patterns in which every pattern has, for each of the d coordinates, a neighbor in the set differing from it at exactly that coordinate. The DS dimension is the largest such d; a brute‑force check is sketched below.
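
To make the definition concrete, here is a minimal brute‑force sketch in Python. The helper names (`differs_exactly_at`, `contains_pseudo_cube`, `ds_dimension`) are our own, not from the paper, and the search is exponential, so this is illustrative only:

```python
from itertools import combinations

def differs_exactly_at(f, g, i):
    """True if label patterns f and g differ at coordinate i and nowhere else."""
    return all((f[j] != g[j]) == (j == i) for j in range(len(f)))

def contains_pseudo_cube(patterns):
    """Check whether a set of label patterns on d points contains a
    d-pseudo-cube: a non-empty subset in which every pattern has, for
    every coordinate i, an i-neighbor (a pattern differing exactly at i).
    Greedy peeling: discard patterns missing a neighbor in some direction;
    any non-empty fixpoint is a pseudo-cube."""
    live = set(patterns)
    changed = True
    while changed and live:
        changed = False
        for f in list(live):
            if any(not any(differs_exactly_at(f, g, i) for g in live)
                   for i in range(len(f))):
                live.discard(f)
                changed = True
    return bool(live)

def ds_dimension(table):
    """Brute-force DS dimension of a finite class, given as a set of full
    label patterns (one tuple per hypothesis, one coordinate per point)."""
    n = len(next(iter(table)))
    best = 0
    for d in range(1, n + 1):
        for coords in combinations(range(n), d):
            proj = {tuple(h[i] for i in coords) for h in table}
            if contains_pseudo_cube(proj):
                best = d
    return best

# Example: three hypotheses on two points; only single points are DS-shattered.
print(ds_dimension({(0, 0), (0, 1), (1, 1)}))  # -> 1
```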

  2. Hypergraph representation: The relevant object is the class's one‑inclusion hypergraph. For a finite set of input points, its vertices are the label patterns on those points that the class can realize, and each hyperedge groups the patterns that agree everywhere except at a single point; a construction sketch follows.
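
A small construction sketch, assuming patterns are given as tuples of labels (the function name `one_inclusion_hypergraph` is ours; the paper may use a different but equivalent formulation):

```python
from collections import defaultdict

def one_inclusion_hypergraph(patterns):
    """Vertices are the realizable label patterns; for each coordinate i,
    all patterns that agree on every coordinate except i form one
    hyperedge. Singleton edges are dropped."""
    d = len(next(iter(patterns)))
    groups = defaultdict(set)
    for f in patterns:
        for i in range(d):
            # key = direction i plus the labels outside coordinate i
            groups[(i, f[:i] + f[i + 1:])].add(f)
    return [edge for edge in groups.values() if len(edge) >= 2]

edges = one_inclusion_hypergraph({(0, 0), (0, 1), (1, 1)})
# -> two hyperedges: {(0,0),(0,1)} (differ at the second point)
#    and {(0,1),(1,1)} (differ at the first point)
```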

  3. Density analysis: They prove that the maximum density of this hypergraph (hyperedges per vertex, maximized over all finite sub‑hypergraphs) is bounded by the DS dimension. This involves combinatorial arguments that translate the algebraic characterization of Hanneke et al. into a density bound; the quantity itself is spelled out below.
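
A brute‑force sketch of the bounded quantity, assuming the standard definition of maximum subhypergraph density (edges fully inside a vertex subset V', divided by |V'|); `max_density` is a hypothetical helper, exponential in the number of vertices:

```python
from itertools import combinations

def max_density(patterns, edges):
    """Maximum density of a hypergraph: max over non-empty vertex subsets
    V' of the number of hyperedges fully contained in V', divided by |V'|."""
    verts = list(patterns)
    best = 0.0
    for r in range(1, len(verts) + 1):
        for sub in combinations(verts, r):
            s = set(sub)
            best = max(best, sum(1 for e in edges if e <= s) / r)
    return best

# For the toy class above: density 2/3, safely below its DS dimension of 1,
# consistent with the paper's main theorem.
```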

  4. Sample‑complexity derivation: Feeding the new density bound into one‑inclusion‑style learning arguments and standard PAC reductions, they obtain matching upper and lower sample‑complexity formulas:

    m(\epsilon,\delta) = \Theta\!\left(\frac{\text{DS}}{\epsilon}\log\frac{1}{\epsilon} + \frac{1}{\epsilon}\log\frac{1}{\delta}\right)
    

    for both multiclass and list learning.
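
As a back‑of‑the‑envelope aid, the Θ(·) expression can be turned into a plug‑in estimate. The constant c is not pinned down by the theory, so c = 1.0 below is an arbitrary placeholder:

```python
import math

def sample_size(ds_dim, eps, delta, c=1.0):
    """Plug-in estimate of m(eps, delta) = c * (DS/eps * log(1/eps)
    + 1/eps * log(1/delta)); c is an unspecified constant."""
    return math.ceil(c * ((ds_dim / eps) * math.log(1 / eps)
                          + (1 / eps) * math.log(1 / delta)))

print(sample_size(10, 0.05, 0.01))  # -> 692 (with the placeholder constant)
```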

  5. List learning extension: They adapt the hypergraph analysis to the list‑learning scenario (where the learner may output a short list of candidate labels), showing the same DS‑dimension dependence holds.
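
For concreteness, the list‑learning loss only penalizes the learner when the true label is missing from its k candidates. A minimal sketch (`list_error` and the predictor interface are our own illustration, not the paper's API):

```python
def list_error(predictor, data, k):
    """Empirical k-list loss: a prediction errs on (x, y) only if y is
    absent from the (at most k) candidate labels returned for x."""
    mistakes = 0
    for x, y in data:
        candidates = predictor(x)
        assert len(candidates) <= k, "predictor must return at most k labels"
        mistakes += y not in candidates
    return mistakes / len(data)
```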

Results & Findings

  • Maximum hypergraph density ≤ DS dimension. This resolves the conjecture that the combinatorial richness of a multiclass class cannot outgrow its DS dimension.
  • Optimal sample‑complexity formula (up to constant factors) for any multiclass hypothesis class, eliminating the previous √DS slack.
  • Equivalence of multiclass and list learning in terms of sample complexity: the same DS‑dimension term governs both, meaning that allowing a list of predictions does not fundamentally change the data requirement.

In short, the paper pins down a clean, tight relationship: sample complexity grows linearly in the DS dimension, with no hidden extra factors.

Practical Implications

  • Model selection: When choosing a multiclass architecture (e.g., a neural‑net head, decision tree, or custom rule‑based system), developers can estimate the minimum amount of labeled data required by bounding the DS dimension of the hypothesis class.
  • Data budgeting: Teams can allocate labeling resources more accurately. If a model’s DS dimension is high, the formula tells you, up to constant factors, how many examples you’ll need to hit a target error ε with confidence 1‑δ.
  • Algorithm design: The hypergraph‑density viewpoint suggests new regularization strategies: limiting the effective DS dimension (e.g., via weight sharing, label‑embedding constraints) directly reduces sample complexity.
  • List‑learning applications: In settings like recommendation or retrieval where returning a short list of candidates is acceptable, developers can adopt list‑learning algorithms without fearing extra data costs—thanks to the same DS‑dimension bound.
  • Benchmarking and diagnostics: If a model underperforms despite ample data, the DS dimension can help diagnose whether the hypothesis class is too expressive (high DS) relative to the problem, prompting a simplification.

Limitations & Future Work

  • Computing DS dimension: While the theory is tight, estimating the DS dimension for modern deep architectures remains non‑trivial; the paper does not provide scalable algorithms for this.
  • Constant factors: The bounds are optimal up to constants, but practical sample sizes can be sensitive to those hidden factors, especially for very small ε.
  • Extension to structured outputs: The work focuses on flat multiclass and list learning; applying the same techniques to sequence labeling, parsing, or other structured prediction tasks is an open direction.
  • Empirical validation: The paper is primarily theoretical; future work could test the predicted sample requirements on real datasets and compare against standard deep‑learning training curves.

Overall, this breakthrough gives developers a precise, theory‑backed compass for navigating the data‑vs‑model trade‑off in multiclass learning.

Authors

  • Chirag Pabbaraju

Paper Information

  • arXiv ID: 2604.24749v1
  • Categories: cs.LG, stat.ML
  • Published: April 27, 2026