[Paper] Neural Scaling Laws for Boosted Jet Tagging
Source: arXiv - 2602.15781v1
Overview
The paper Neural Scaling Laws for Boosted Jet Tagging investigates how the performance of machine‑learning models for particle‑physics tasks improves as we throw more compute at them—mirroring the scaling trends that have driven breakthroughs in large language models. By systematically training models on the public JetClass dataset, the authors uncover predictable “scaling laws” that tell us how much accuracy we can expect when we increase model size, training data, or compute budget, and they show how these laws differ when using low‑level (raw particle) versus high‑level (engineered) features.
Key Contributions
- Derivation of compute‑optimal scaling laws for boosted‑jet classification, quantifying the relationship between FLOPs, model capacity, and test accuracy.
- Identification of an asymptotic performance ceiling that can be approached by scaling compute, providing a practical target for future HEP models.
- Analysis of data repetition effects, showing how re‑using expensive simulated events effectively increases the “usable” dataset size and alters scaling exponents.
- Feature‑level comparison, demonstrating that low‑level, per‑particle inputs raise the asymptotic limit and yield better performance at any fixed compute budget compared with high‑level engineered features.
- Public release of training scripts and scaling‑law fits, enabling the community to reproduce results and apply the methodology to other HEP or scientific datasets.
Methodology
- Dataset & Task – The authors use the JetClass benchmark, a publicly available collection of simulated particle‑collision events labeled as either “boosted W boson” or generic QCD jets.
- Model Families – Two families are explored: (a) a transformer‑style architecture ingesting raw particle four‑vectors (low‑level), and (b) a dense network using high‑level jet observables (e.g., mass, N‑subjettiness).
- Scaling Experiments – For each family, they train many models while systematically varying:
  - model size (number of parameters),
  - training compute (FLOPs, approximated by epochs × batch size × model ops),
  - effective dataset size (including repetitions of the same simulated events).
- Fit to Power‑Law Forms – Test accuracy $A$ is modeled as
  $$A(N, C) = A_{\infty} - \alpha N^{-\beta} - \gamma C^{-\delta},$$
  where $N$ is the (effective) number of training examples, $C$ the compute, and $A_{\infty}$ the asymptotic limit. Non‑linear regression yields the scaling exponents $\beta$, $\delta$ and the ceiling $A_{\infty}$.
- Cross‑validation – Results are validated on held‑out test splits and repeated with different random seeds to ensure robustness.
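The power‑law fit above can be sketched with `scipy.optimize.curve_fit`. This is a minimal illustration, not the paper's code: the grid of dataset sizes, compute values, noise level, and the "true" constants used to generate the synthetic measurements are all assumptions chosen only to echo the paper's reported low‑level exponents.

```python
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(X, a_inf, alpha, beta, gamma, delta):
    """A(N, C) = A_inf - alpha * N**-beta - gamma * C**-delta."""
    n, c = X
    return a_inf - alpha * n ** (-beta) - gamma * c ** (-delta)

# Synthetic grid of (effective dataset size, compute) points -- illustrative
# only; the paper fits real training runs, not generated data like this.
rng = np.random.default_rng(0)
n = np.logspace(4, 7, 12)    # effective training examples
c = np.logspace(15, 19, 12)  # training FLOPs
N, C = np.meshgrid(n, c)
true_params = (0.985, 0.1, 0.25, 0.6, 0.12)  # constants echo the low-level fit
A = scaling_law((N.ravel(), C.ravel()), *true_params)
A += rng.normal(scale=1e-4, size=A.shape)    # small measurement noise

# Non-linear least squares recovers the exponents and the ceiling A_inf.
popt, _ = curve_fit(
    scaling_law,
    (N.ravel(), C.ravel()),
    A,
    p0=(0.98, 0.1, 0.2, 0.5, 0.1),
    bounds=(0.0, [1.0, 10.0, 1.0, 10.0, 1.0]),
    maxfev=20000,
)
a_inf_hat, _, beta_hat, _, delta_hat = popt
print(f"A_inf ~ {a_inf_hat:.4f}, beta ~ {beta_hat:.3f}, delta ~ {delta_hat:.3f}")
```

Note that $A_{\infty}$ is only ever approached by extrapolation, so a wide logarithmic spread in both $N$ and $C$ is what makes the ceiling identifiable at all.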
Results & Findings
| Aspect | What the authors observed |
|---|---|
| Compute scaling | Test accuracy improves as a power law of compute, with diminishing returns. For low‑level features, the exponent $\delta \approx 0.12$; for high‑level features, $\delta \approx 0.08$. |
| Dataset scaling | Accuracy also follows a power law in effective dataset size, but the exponent $\beta$ is larger for low‑level inputs (≈ 0.25) than for high‑level (≈ 0.15), indicating greater data efficiency when using raw particles. |
| Asymptotic limit | The low‑level model caps at $A_{\infty} \approx 0.985$ (AUC), whereas the high‑level model caps near 0.970. This 1.5‑point gap persists even with infinite compute. |
| Data repetition | Re‑using simulated events (i.e., training on the same event multiple times) effectively multiplies the dataset size by a factor ≈ 1.6, shifting the scaling curve upward without changing the asymptotic limit. |
| Compute‑optimal regime | For a given compute budget, the best performance is achieved by balancing model size and number of training steps according to the derived scaling law, rather than simply “bigger is better.” |
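The diminishing returns in the first row can be made concrete: under $A(C) = A_{\infty} - \gamma C^{-\delta}$, doubling compute shrinks the remaining accuracy deficit by a factor $2^{-\delta}$. The sketch below plugs in the table's exponents; the function name and the choice of a 2× step are illustrative, not from the paper.

```python
def deficit_remaining(delta: float, k: float = 2.0) -> float:
    """Fraction of the compute-deficit term A_inf - A(C) that remains
    after a k-fold increase in compute, under A(C) = A_inf - gamma * C**-delta."""
    return k ** (-delta)

low_level = deficit_remaining(0.12)   # delta from the low-level fit
high_level = deficit_remaining(0.08)  # delta from the high-level fit
print(f"doubling compute leaves {low_level:.3f} of the gap (low-level) "
      f"vs {high_level:.3f} (high-level)")
```

The larger low‑level exponent means each compute doubling closes a bigger slice of the gap, which is one way to read the paper's claim that raw‑particle inputs win at any fixed budget.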
Practical Implications
- Roadmap for HEP ML projects – Teams can estimate how much additional GPU time will actually move the needle on jet‑tagging performance, avoiding wasteful over‑training.
- Feature engineering decisions – Investing in pipelines that expose raw particle information (e.g., graph‑based or transformer models) yields higher ultimate accuracy than spending resources on handcrafted high‑level observables.
- Simulation budget planning – Since data repetition offers a predictable boost, experiments can substitute a modest increase in training epochs for costly additional Monte‑Carlo generation, optimizing the overall compute‑to‑accuracy trade‑off.
- Benchmarking foundation‑model style scaling – The derived scaling laws provide a baseline for future “foundation models” in HEP; developers can compare new architectures against the compute‑optimal curve to gauge novelty.
- Transfer to other domains – The methodology (fit‑to‑power‑law, compute‑optimal balancing) is directly applicable to any scientific ML problem where data generation is expensive (e.g., climate modeling, astrophysics).
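The simulation‑budget point above lends itself to a back‑of‑envelope check. The sketch below applies the paper's observed repetition factor of ≈ 1.6; the function name and the example event count are illustrative assumptions.

```python
def effective_dataset_size(n_unique: int, repetition_boost: float = 1.6) -> float:
    """Effective number of training examples when re-using simulated events,
    using the paper's observed boost factor of about 1.6."""
    return n_unique * repetition_boost

# Illustrative planning question: what do 10M re-used events buy you?
n_reused = 10_000_000
print(f"{n_reused:,} re-used events are worth roughly "
      f"{effective_dataset_size(n_reused):,.0f} fresh ones")
```

Since fresh Monte‑Carlo events are typically far more expensive than extra epochs over existing ones, even this modest multiplier can shift the cost calculus, though the asymptotic ceiling stays put.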
Limitations & Future Work
- Simulation fidelity – The study relies on a single public dataset; real‑world detector effects and pile‑up may shift scaling exponents.
- Hardware‑specific scaling – FLOP counts abstract away memory bandwidth and parallelism constraints; scaling on specialized accelerators (TPUs, ASICs) could differ.
- Model diversity – Only transformer and dense‑net baselines were examined; convolutional, graph‑neural, or hybrid architectures might exhibit distinct scaling behavior.
- Beyond binary tagging – Extending the analysis to multi‑class or regression tasks (e.g., jet energy regression) remains an open question.
- Theoretical grounding – While empirical power laws fit well, a deeper theoretical explanation linking physics symmetries to scaling exponents would strengthen the findings.
Bottom line: By quantifying how compute, data, and feature choice interact in boosted‑jet tagging, this work gives developers a practical “scaling calculator” to plan experiments, allocate resources, and push HEP machine learning toward its next performance frontier.
Authors
- Matthias Vigl
- Nicole Hartman
- Michael Kagan
- Lukas Heinrich
Paper Information
- arXiv ID: 2602.15781v1
- Categories: hep-ex, cs.LG, hep-ph, physics.data-an
- Published: February 17, 2026