[Paper] Neural Scaling Laws for Boosted Jet Tagging
Source: arXiv - 2602.15781v1
Overview
The paper Neural Scaling Laws for Boosted Jet Tagging investigates how the performance of machine‑learning models for particle‑physics tasks improves as we throw more compute at them—mirroring the scaling trends that have driven breakthroughs in large language models. By systematically training models on the public JetClass dataset, the authors uncover predictable “scaling laws” that tell us how much accuracy we can expect when we increase model size, training data, or compute budget, and they show how these laws differ when using low‑level (raw particle) versus high‑level (engineered) features.
Key Contributions
- Derivation of compute‑optimal scaling laws for boosted‑jet classification, quantifying the relationship between FLOPs, model capacity, and test accuracy.
- Identification of an asymptotic performance ceiling that can be approached by scaling compute, providing a practical target for future HEP models.
- Analysis of data repetition effects, showing how re‑using expensive simulated events effectively increases the “usable” dataset size and alters scaling exponents.
- Feature‑level comparison, demonstrating that low‑level, per‑particle inputs raise the asymptotic limit and yield better performance at any fixed compute budget compared with high‑level engineered features.
- Public release of training scripts and scaling‑law fits, enabling the community to reproduce results and apply the methodology to other HEP or scientific datasets.
Methodology
- Dataset & Task – The authors use the JetClass benchmark, a publicly available collection of simulated particle‑collision events labeled as either “boosted W boson” or generic QCD jets.
- Model Families – Two families are explored: (a) a transformer‑style architecture ingesting raw particle four‑vectors (low‑level), and (b) a dense network using high‑level jet observables (e.g., mass, N‑subjettiness).
- Scaling Experiments – For each family, they train many models while systematically varying:
  - model size (number of parameters),
  - training compute (FLOPs, approximated by epochs × batch size × model ops),
  - effective dataset size (including repetitions of the same simulated events).
- Fit to Power‑Law Forms – Test accuracy $A$ is modeled as
  $$A(N, C) = A_{\infty} - \alpha N^{-\beta} - \gamma C^{-\delta},$$
  where $N$ is the (effective) number of training examples, $C$ the compute, and $A_{\infty}$ the asymptotic limit. Non‑linear regression yields the scaling exponents $\beta$, $\delta$ and the ceiling $A_{\infty}$.
- Cross‑validation – Results are validated on held‑out test splits and repeated with different random seeds to ensure robustness.
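The power‑law fit above can be sketched with `scipy.optimize.curve_fit`. This is a minimal illustration, not the paper's code: the grid of dataset sizes, compute values, noise level, and the "true" constants used to generate the synthetic measurements are all assumptions chosen only to echo the paper's reported low‑level exponents.

```python
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(X, a_inf, alpha, beta, gamma, delta):
    """A(N, C) = A_inf - alpha * N**-beta - gamma * C**-delta."""
    n, c = X
    return a_inf - alpha * n ** (-beta) - gamma * c ** (-delta)

# Synthetic grid of (effective dataset size, compute) points -- illustrative
# only; the paper fits real training runs, not generated data like this.
rng = np.random.default_rng(0)
n = np.logspace(4, 7, 12)    # effective training examples
c = np.logspace(15, 19, 12)  # training FLOPs
N, C = np.meshgrid(n, c)
true_params = (0.985, 0.1, 0.25, 0.6, 0.12)  # constants echo the low-level fit
A = scaling_law((N.ravel(), C.ravel()), *true_params)
A += rng.normal(scale=1e-4, size=A.shape)    # small measurement noise

# Non-linear least squares recovers the exponents and the ceiling A_inf.
popt, _ = curve_fit(
    scaling_law,
    (N.ravel(), C.ravel()),
    A,
    p0=(0.98, 0.1, 0.2, 0.5, 0.1),
    bounds=(0.0, [1.0, 10.0, 1.0, 10.0, 1.0]),
    maxfev=20000,
)
a_inf_hat, _, beta_hat, _, delta_hat = popt
print(f"A_inf ~ {a_inf_hat:.4f}, beta ~ {beta_hat:.3f}, delta ~ {delta_hat:.3f}")
```

Note that $A_{\infty}$ is only ever approached by extrapolation, so a wide logarithmic spread in both $N$ and $C$ is what makes the ceiling identifiable at all.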
Results & Findings
| Aspect | What the authors observed |
|---|---|
| Compute scaling | Test accuracy improves as a power law of compute, with diminishing returns. For low‑level features, the exponent $\delta \approx 0.12$; for high‑level features, $\delta \approx 0.08$. |
| Dataset scaling | Accuracy also follows a power law in effective dataset size, but the exponent $\beta$ is larger for low‑level inputs (≈ 0.25) than for high‑level (≈ 0.15), indicating greater data efficiency when using raw particles. |
| Asymptotic limit | The low‑level model caps at $A_{\infty} \approx 0.985$ (AUC), whereas the high‑level model caps near 0.970. This 1.5‑point gap persists even with infinite compute. |
| Data repetition | Re‑using simulated events (i.e., training on the same event multiple times) effectively multiplies the dataset size by a factor ≈ 1.6, shifting the scaling curve upward without changing the asymptotic limit. |
| Compute‑optimal regime | For a given compute budget, the best performance is achieved by balancing model size and number of training steps according to the derived scaling law, rather than simply “bigger is better.” |
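The diminishing returns in the first row can be made concrete: under $A(C) = A_{\infty} - \gamma C^{-\delta}$, doubling compute shrinks the remaining accuracy deficit by a factor $2^{-\delta}$. The sketch below plugs in the table's exponents; the function name and the choice of a 2× step are illustrative, not from the paper.

```python
def deficit_remaining(delta: float, k: float = 2.0) -> float:
    """Fraction of the compute-deficit term A_inf - A(C) that remains
    after a k-fold increase in compute, under A(C) = A_inf - gamma * C**-delta."""
    return k ** (-delta)

low_level = deficit_remaining(0.12)   # delta from the low-level fit
high_level = deficit_remaining(0.08)  # delta from the high-level fit
print(f"doubling compute leaves {low_level:.3f} of the gap (low-level) "
      f"vs {high_level:.3f} (high-level)")
```

The larger low‑level exponent means each compute doubling closes a bigger slice of the gap, which is one way to read the paper's claim that raw‑particle inputs win at any fixed budget.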
Practical Implications
- Roadmap for HEP ML projects – Teams can estimate how much additional GPU time will actually move the needle on jet‑tagging performance, avoiding wasteful over‑training.
- Feature engineering decisions – Investing in pipelines that expose raw particle information (e.g., graph‑based or transformer models) yields higher ultimate accuracy than spending resources on handcrafted high‑level observables.
- Simulation budget planning – Since data repetition offers a predictable boost, experiments can substitute a modest increase in training epochs for costly additional Monte‑Carlo generation, optimizing the overall compute‑to‑accuracy trade‑off.
- Benchmarking foundation‑model style scaling – The derived scaling laws provide a baseline for future “foundation models” in HEP; developers can compare new architectures against the compute‑optimal curve to gauge novelty.
- Transfer to other domains – The methodology (fit‑to‑power‑law, compute‑optimal balancing) is directly applicable to any scientific ML problem where data generation is expensive (e.g., climate modeling, astrophysics).
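The simulation‑budget point above lends itself to a back‑of‑envelope check. The sketch below applies the paper's observed repetition factor of ≈ 1.6; the function name and the example event count are illustrative assumptions.

```python
def effective_dataset_size(n_unique: int, repetition_boost: float = 1.6) -> float:
    """Effective number of training examples when re-using simulated events,
    using the paper's observed boost factor of about 1.6."""
    return n_unique * repetition_boost

# Illustrative planning question: what do 10M re-used events buy you?
n_reused = 10_000_000
print(f"{n_reused:,} re-used events are worth roughly "
      f"{effective_dataset_size(n_reused):,.0f} fresh ones")
```

Since fresh Monte‑Carlo events are typically far more expensive than extra epochs over existing ones, even this modest multiplier can shift the cost calculus, though the asymptotic ceiling stays put.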
Limitations & Future Work
- Simulation fidelity – The study relies on a single public dataset; real‑world detector effects and pile‑up may shift scaling exponents.
- Hardware‑specific scaling – FLOP counts abstract away memory bandwidth and parallelism constraints; scaling on specialized accelerators (TPUs, ASICs) could differ.
- Model diversity – Only transformer and dense‑net baselines were examined; convolutional, graph‑neural, or hybrid architectures might exhibit distinct scaling behavior.
- Beyond binary tagging – Extending the analysis to multi‑class or regression tasks (e.g., jet energy regression) remains an open question.
- Theoretical grounding – While empirical power laws fit well, a deeper theoretical explanation linking physics symmetries to scaling exponents would strengthen the findings.
Bottom line: By quantifying how compute, data, and feature choice interact in boosted‑jet tagging, this work gives developers a practical “scaling calculator” to plan experiments, allocate resources, and push HEP machine learning toward its next performance frontier.
Authors
- Matthias Vigl
- Nicole Hartman
- Michael Kagan
- Lukas Heinrich
Paper Information
- arXiv ID: 2602.15781v1
- Categories: hep-ex, cs.LG, hep-ph, physics.data-an
- Published: February 17, 2026