[Paper] Through the telecom lens: Are all training samples important?

Published: November 26, 2025 at 01:44 PM EST
3 min read

Source: arXiv - 2511.21668v1

Overview

The paper Through the telecom lens: Are all training samples important? examines a hidden assumption in most AI pipelines for telecom: that every data point in a training set contributes equally to model performance. By dissecting the influence of individual samples, the authors show how telecom operators can cut storage, compute, and energy costs while keeping accuracy intact—an important step toward more sustainable, production‑ready AI.

Key Contributions

  • Sample‑level gradient analysis across training epochs to expose which telecom records are truly driving learning.
  • Importance‑based data selection framework that automatically prioritizes high‑impact samples and discards redundant or noisy ones.
  • Empirical validation on three real‑world telecom datasets (RAN KPI prediction, QoE rating, and network fault detection), showing roughly 30 % less training data and about 25 % lower compute/energy use with accuracy drops below 0.3 %.
  • Open‑source tooling for gradient‑based importance scoring, ready to plug into existing PyTorch/TensorFlow pipelines.

Methodology

  1. Gradient‑based Influence Scoring – For each training sample, the authors compute the norm of its gradient contribution to the loss at every epoch. Larger norms indicate a stronger “push” on model parameters (see the sketch after this list).
  2. Temporal Pattern Mining – By tracking these scores over time, they identify three archetypes:
    • Consistently influential (core learning signals)
    • Transiently influential (useful early, then redundant)
    • Never influential (noisy or mislabeled).
  3. Dynamic Sub‑sampling – Using a simple threshold on the aggregated scores, the training set is pruned before each epoch, keeping only the top‑k% most influential samples.
  4. Sustainability Metrics – They measure FLOPs, GPU power draw, and wall‑clock time to quantify the computational savings.
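
As a concrete illustration of step 1, here is a minimal PyTorch sketch of per‑sample gradient‑norm scoring. It is a sketch under assumptions: the function name, the explicit per‑sample backward loop, and the L2 aggregation are illustrative choices, not the authors' released tooling, which reuses quantities already available during back‑propagation.

```python
# Per-sample gradient-norm influence scoring (step 1); names are illustrative.
import torch

def influence_scores(model, loss_fn, xs, ys):
    """Return one gradient-norm score per training sample."""
    scores = []
    for x, y in zip(xs, ys):
        model.zero_grad()
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        # L2 norm of the loss gradient, aggregated over all model parameters.
        sq_norm = sum((p.grad.detach() ** 2).sum()
                      for p in model.parameters() if p.grad is not None)
        scores.append(float(sq_norm.sqrt()))
    return scores

# Tracking scores across epochs (step 2) exposes the three archetypes,
# e.g. history[i].append(score) for each sample i at the end of every epoch.
```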

The pipeline is lightweight (gradient norms are already computed during back‑prop) and can be toggled on/off without redesigning the model architecture.
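
Building on those scores, the following sketch shows how the dynamic sub‑sampling step (step 3) could be wired into a PyTorch data pipeline. The running‑mean aggregation over epochs and the `keep_fraction` value are assumptions for illustration; the paper specifies only a threshold that keeps the top‑k% of samples by aggregated score.

```python
# Dynamic sub-sampling (step 3): keep only the top-k% most influential
# samples before each epoch. Aggregation and keep_fraction are assumptions.
import numpy as np
from torch.utils.data import DataLoader, Subset

def prune_dataset(dataset, history, keep_fraction=0.7):
    """Return a Subset containing the most influential samples."""
    # Aggregate each sample's score history with a simple mean over epochs.
    mean_scores = np.array([np.mean(history[i]) for i in range(len(dataset))])
    k = max(1, int(keep_fraction * len(dataset)))
    keep_idx = np.argsort(mean_scores)[-k:]  # indices of the k highest scores
    return Subset(dataset, keep_idx.tolist())

# Usage: pruning is opt-in; training on `train_set` directly toggles it off.
# loader = DataLoader(prune_dataset(train_set, history, 0.7),
#                     batch_size=64, shuffle=True)
```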

Results & Findings

Dataset             | Baseline Accuracy | Pruned Accuracy | Data Reduction | Compute/Energy Savings
RAN KPI prediction  | 92.1 %            | 91.9 %          | 28 %           | 24 %
QoE rating          | 88.4 %            | 88.2 %          | 32 %           | 27 %
Fault detection     | 95.6 %            | 95.5 %          | 30 %           | 25 %
  • Performance parity: Accuracy drops are <0.3 % across the board.
  • Training speedup: Epoch times shrink by roughly a quarter, directly translating to lower electricity bills and faster model iteration cycles.
  • Robustness to noise: The framework automatically filters mislabeled or outlier records, improving model stability on noisy telecom logs.

Practical Implications

  • Cost‑effective model updates: Operators can retrain models more frequently (e.g., nightly) without exploding compute budgets, enabling near‑real‑time adaptation to network changes.
  • Edge deployment: Smaller training footprints make it practical to fine‑tune models on edge servers or even on‑device, opening the door to localized AI (e.g., on 5G base stations).
  • Sustainable AI compliance: Reducing FLOPs aligns with emerging ESG (Environmental, Social, Governance) reporting standards for telecom firms.
  • Simplified data pipelines: By automatically flagging low‑impact samples, data engineers spend less time on manual cleaning and can focus on gathering truly novel measurements (new antenna types, spectrum bands, etc.).

Limitations & Future Work

  • Threshold sensitivity: The current heuristic for selecting the top‑k% may need tuning per dataset; an adaptive, learning‑based threshold could be more robust.
  • Architecture coverage: Experiments were limited to feed‑forward and LSTM architectures; extending the analysis to transformer‑based telecom models (e.g., for traffic forecasting) remains open.
  • Real‑time streaming: The study assumes a static training set; integrating the importance scoring into continuous learning pipelines (online updates) is a promising next step.

Overall, the paper provides a practical, low‑overhead recipe for telecom AI teams to make their models leaner, greener, and faster—without sacrificing the performance that modern networks demand.

Authors

  • Shruti Bothe
  • Illyyne Saffar
  • Aurelie Boisbunon
  • Hasan Farooq
  • Julien Forgeat
  • Md Moin Uddin Chowdhury

Paper Information

  • arXiv ID: 2511.21668v1
  • Categories: cs.LG, cs.AI
  • Published: November 26, 2025