[Paper] Through the telecom lens: Are all training samples important?

Published: November 26, 2025 at 01:44 PM EST
3 min read

Source: arXiv - 2511.21668v1

Overview

The paper Through the telecom lens: Are all training samples important? examines a hidden assumption in most AI pipelines for telecom: that every data point in a training set contributes equally to model performance. By dissecting the influence of individual samples, the authors show how telecom operators can cut storage, compute, and energy costs while keeping accuracy intact—an important step toward more sustainable, production‑ready AI.

Key Contributions

  • Sample‑level gradient analysis across training epochs to expose which telecom records are truly driving learning.
  • Importance‑based data selection framework that automatically prioritizes high‑impact samples and discards redundant or noisy ones.
  • Empirical validation on three real‑world telecom datasets (RAN KPI prediction, QoE rating, and network fault detection), showing roughly 30 % less training data and about 25 % lower compute/energy use with accuracy drops below 0.3 %.
  • Open‑source tooling for gradient‑based importance scoring, ready to plug into existing PyTorch/TensorFlow pipelines.

Methodology

  1. Gradient‑based Influence Scoring – For each training sample, the authors compute the norm of its gradient contribution to the loss at every epoch. Larger norms indicate a stronger “push” on model parameters (see the sketch after this list).
  2. Temporal Pattern Mining – By tracking these scores over time, they identify three archetypes:
    • Consistently influential (core learning signals)
    • Transiently influential (useful early, then redundant)
    • Never influential (noisy or mislabeled).
  3. Dynamic Sub‑sampling – Using a simple threshold on the aggregated scores, the training set is pruned before each epoch, keeping only the top‑k% most influential samples.
  4. Sustainability Metrics – They measure FLOPs, GPU power draw, and wall‑clock time to quantify the computational savings.
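
As a concrete illustration of step 1, here is a minimal PyTorch sketch of per‑sample gradient‑norm scoring. It is a sketch under assumptions: the function name, the explicit per‑sample backward loop, and the L2 aggregation are illustrative choices, not the authors' released tooling, which reuses quantities already available during back‑propagation.

```python
# Per-sample gradient-norm influence scoring (step 1); names are illustrative.
import torch

def influence_scores(model, loss_fn, xs, ys):
    """Return one gradient-norm score per training sample."""
    scores = []
    for x, y in zip(xs, ys):
        model.zero_grad()
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        # L2 norm of the loss gradient, aggregated over all model parameters.
        sq_norm = sum((p.grad.detach() ** 2).sum()
                      for p in model.parameters() if p.grad is not None)
        scores.append(float(sq_norm.sqrt()))
    return scores

# Tracking scores across epochs (step 2) exposes the three archetypes,
# e.g. history[i].append(score) for each sample i at the end of every epoch.
```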

The pipeline is lightweight (gradient norms are already computed during back‑prop) and can be toggled on/off without redesigning the model architecture.
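
Building on those scores, the following sketch shows how the dynamic sub‑sampling step (step 3) could be wired into a PyTorch data pipeline. The running‑mean aggregation over epochs and the `keep_fraction` value are assumptions for illustration; the paper specifies only a threshold that keeps the top‑k% of samples by aggregated score.

```python
# Dynamic sub-sampling (step 3): keep only the top-k% most influential
# samples before each epoch. Aggregation and keep_fraction are assumptions.
import numpy as np
from torch.utils.data import DataLoader, Subset

def prune_dataset(dataset, history, keep_fraction=0.7):
    """Return a Subset containing the most influential samples."""
    # Aggregate each sample's score history with a simple mean over epochs.
    mean_scores = np.array([np.mean(history[i]) for i in range(len(dataset))])
    k = max(1, int(keep_fraction * len(dataset)))
    keep_idx = np.argsort(mean_scores)[-k:]  # indices of the k highest scores
    return Subset(dataset, keep_idx.tolist())

# Usage: pruning is opt-in; training on `train_set` directly toggles it off.
# loader = DataLoader(prune_dataset(train_set, history, 0.7),
#                     batch_size=64, shuffle=True)
```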

Results & Findings

Dataset             | Baseline Accuracy | Pruned Accuracy | Data Reduction | Compute/Energy Savings
RAN KPI prediction  | 92.1 %            | 91.9 %          | 28 %           | 24 %
QoE rating          | 88.4 %            | 88.2 %          | 32 %           | 27 %
Fault detection     | 95.6 %            | 95.5 %          | 30 %           | 25 %
  • Performance parity: Accuracy drops are <0.3 % across the board.
  • Training speedup: Epoch times shrink by roughly a quarter, directly translating to lower electricity bills and faster model iteration cycles.
  • Robustness to noise: The framework automatically filters mislabeled or outlier records, improving model stability on noisy telecom logs.

Practical Implications

  • Cost‑effective model updates: Operators can retrain models more frequently (e.g., nightly) without exploding compute budgets, enabling near‑real‑time adaptation to network changes.
  • Edge deployment: Smaller training footprints make it practical to fine‑tune models on edge servers or even on‑device, opening the door to localized AI (e.g., on 5G base stations).
  • Sustainable AI compliance: Reducing FLOPs aligns with emerging ESG (Environmental, Social, Governance) reporting standards for telecom firms.
  • Simplified data pipelines: By automatically flagging low‑impact samples, data engineers spend less time on manual cleaning and can focus on gathering truly novel measurements (new antenna types, spectrum bands, etc.).

Limitations & Future Work

  • Threshold sensitivity: The current heuristic for selecting the top‑k% may need tuning per dataset; an adaptive, learning‑based threshold could be more robust.
  • Architecture coverage: Experiments were limited to feed‑forward and LSTM architectures; extending the analysis to transformer‑based telecom models (e.g., for traffic forecasting) remains open.
  • Real‑time streaming: The study assumes a static training set; integrating the importance scoring into continuous learning pipelines (online updates) is a promising next step.

Overall, the paper provides a practical, low‑overhead recipe for telecom AI teams to make their models leaner, greener, and faster—without sacrificing the performance that modern networks demand.

Authors

  • Shruti Bothe
  • Illyyne Saffar
  • Aurelie Boisbunon
  • Hasan Farooq
  • Julien Forgeat
  • Md Moin Uddin Chowdhury

Paper Information

  • arXiv ID: 2511.21668v1
  • Categories: cs.LG, cs.AI
  • Published: November 26, 2025