[Paper] An assessment of data-centric methods for label noise identification in remote sensing data sets

Published: March 17, 2026 at 01:40 PM EDT
4 min read

Source: arXiv - 2603.16835v1

Overview

This paper investigates how well three data‑centric label‑noise detection methods work on remote‑sensing image datasets. By deliberately corrupting the ground‑truth labels at varying intensities (10‑70 %), the authors show that these techniques can both spot noisy annotations and boost downstream model performance, offering a practical roadmap for developers dealing with imperfect satellite or aerial imagery data.

Key Contributions

  • Systematic benchmark of three label‑noise identification algorithms on two widely used remote‑sensing datasets.
  • Comprehensive noise injection study covering symmetric, asymmetric, and class‑dependent noise types across a broad range of corruption levels.
  • Quantitative analysis of how well each method isolates noisy samples and how that filtering translates into higher classification accuracy.
  • Guidelines for selecting the most suitable method based on noise characteristics and project goals.
  • Identification of research gaps in adapting data‑centric noise‑handling to the unique challenges of remote‑sensing imagery (e.g., high intra‑class variability, multi‑spectral data).

Methodology

  1. Datasets & Baselines – The authors use two benchmark remote‑sensing collections (e.g., a land‑cover scene classification set and an aerial object detection set). A standard convolutional neural network (CNN) serves as the baseline classifier.
  2. Synthetic Label Noise – They corrupt the true labels with three noise models:
    • Symmetric: any label can flip to any other with equal probability.
    • Asymmetric: flips follow a predefined confusion matrix (e.g., “forest” ↔ “grassland”).
    • Class‑dependent: certain classes are more prone to errors.
    Noise levels are varied from 10 % up to 70 %.
  3. Data‑Centric Methods Evaluated
    • Loss‑Based Filtering (e.g., small‑loss trick): assumes clean samples have lower training loss.
    • Agreement‑Based Ensemble: trains multiple models and flags samples with low consensus.
    • Feature‑Space Outlier Detection: extracts deep features and applies clustering/outlier scoring to spot mislabeled points.
  4. Evaluation Pipeline – For each noise setting, the methods first identify a subset of suspected noisy labels. Those samples are either removed or relabeled, after which the classifier is retrained. Performance is measured by:
    • Noise‑identification accuracy (precision/recall of flagged samples).
    • Downstream task performance (overall classification accuracy, F1 score, or IoU).
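The core of this pipeline (symmetric noise injection, small-loss filtering, precision/recall scoring of the flagged samples) can be sketched in a few lines. This is a minimal illustration, not the paper's code: the per-sample "losses" here are synthetic stand-ins drawn so that mislabeled samples tend to have higher loss, which is exactly the assumption the small-loss trick relies on.

```python
import numpy as np

rng = np.random.default_rng(0)

def inject_symmetric_noise(labels, num_classes, noise_rate, rng):
    """Flip each label to a uniformly chosen *different* class with prob. noise_rate."""
    noisy = labels.copy()
    flip = rng.random(len(labels)) < noise_rate
    for i in np.where(flip)[0]:
        choices = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(choices)
    return noisy, flip

def small_loss_filter(losses, noise_rate):
    """Flag the highest-loss fraction of samples as suspected noisy labels."""
    k = int(round(noise_rate * len(losses)))
    flagged = np.zeros(len(losses), dtype=bool)
    flagged[np.argsort(losses)[-k:]] = True
    return flagged

# Toy run: 1000 samples, 5 classes, 30 % symmetric noise.
n, c, rate = 1000, 5, 0.3
clean_labels = rng.integers(0, c, size=n)
noisy_labels, is_noisy = inject_symmetric_noise(clean_labels, c, rate, rng)

# Synthetic per-sample training losses: mislabeled samples get higher loss on average.
losses = rng.normal(loc=np.where(is_noisy, 2.0, 0.5), scale=0.5)

flagged = small_loss_filter(losses, rate)
tp = np.sum(flagged & is_noisy)
precision = tp / flagged.sum()
recall = tp / is_noisy.sum()
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In a real pipeline, `losses` would come from the baseline CNN's per-sample training loss, and the flagged samples would then be removed or re-labeled before retraining, as in the paper's evaluation step 4.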

Results & Findings

  • Noise Identification – All three methods outperform random guessing, but their strengths differ:
    • Loss‑Based Filtering excels at low‑to‑moderate symmetric noise (≤30 %).
    • Agreement‑Based Ensemble is most robust to asymmetric and class‑dependent noise, maintaining >70 % precision even at 50 % corruption.
    • Feature‑Space Outlier Detection shines when the data have strong visual separability (e.g., distinct spectral signatures).
  • Impact on Model Performance – Removing the identified noisy samples yields 5‑12 % absolute gains in classification accuracy compared to training on the corrupted set, with the biggest jumps observed at higher noise levels (≥50 %).
  • Trade‑off – Aggressive filtering can discard too many clean samples, slightly hurting performance when noise is low; a calibrated threshold is essential.
  • Best‑Practice Recommendation – For most remote‑sensing pipelines, a hybrid approach (combine loss‑based and agreement‑based signals) provides the most consistent improvements across noise types.
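One simple way to combine the two signals in the recommended hybrid approach is to flag a sample only when both the loss-based and the agreement-based criteria fire. The sketch below assumes hypothetical inputs (per-sample losses, a stack of ensemble predictions, and the possibly-noisy labels); the specific threshold names `loss_quantile` and `min_dissent` are illustrative choices, not parameters from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def hybrid_flag(losses, ensemble_preds, labels, loss_quantile=0.7, min_dissent=0.5):
    """Flag a sample as suspect only when BOTH signals agree:
    - its loss lies above the given quantile (loss-based signal), and
    - fewer than (1 - min_dissent) of the ensemble members predict its label
      (agreement-based signal)."""
    high_loss = losses > np.quantile(losses, loss_quantile)
    agree_frac = (ensemble_preds == labels).mean(axis=0)
    low_consensus = agree_frac < (1.0 - min_dissent)
    return high_loss & low_consensus

# Toy data: 200 samples, 4 classes, ~30 % mislabeled, 3-model ensemble.
n = 200
labels = rng.integers(0, 4, size=n)
is_noisy = rng.random(n) < 0.3
losses = rng.normal(np.where(is_noisy, 2.0, 0.5), 0.4)

# Each model recovers the (possibly wrong) label ~90 % of the time for clean
# samples but only ~10 % of the time for mislabeled ones.
p_match = np.where(is_noisy, 0.1, 0.9)
ensemble_preds = np.where(rng.random((3, n)) < p_match, labels, (labels + 1) % 4)

flagged = hybrid_flag(losses, ensemble_preds, labels)
precision = (flagged & is_noisy).sum() / max(flagged.sum(), 1)
print(f"flagged={flagged.sum()} precision={precision:.2f}")
```

Requiring both signals trades some recall for precision, which matches the paper's caveat that aggressive filtering at low noise levels can discard too many clean samples.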

Practical Implications

  • Data‑Cleaning Pipelines – Developers can integrate these lightweight detection modules into existing training loops to automatically prune or flag suspect annotations before model deployment.
  • Cost Savings – By pinpointing noisy labels, teams can focus human annotation effort on a small subset of problematic samples, reducing costly re‑labeling campaigns.
  • Robust Model Deployment – In operational remote‑sensing applications (e.g., disaster mapping, agricultural monitoring), the ability to maintain high accuracy despite noisy crowdsourced or legacy labels translates to more reliable decision‑support tools.
  • Tooling Compatibility – The evaluated methods rely on standard deep‑learning libraries (PyTorch/TensorFlow) and require only the model’s loss values, predictions, or feature embeddings—no specialized hardware or external datasets.

Limitations & Future Work

  • Synthetic Noise Only – The study uses artificially injected label errors; real‑world noise patterns (e.g., systematic labeling bias) may behave differently.
  • Scalability – Ensemble‑based agreement methods increase training time linearly with the number of models, which could be prohibitive for very large satellite datasets.
  • Multi‑Modal Data – The experiments focus on RGB or multispectral imagery; extending to SAR, LiDAR, or fused modalities remains an open challenge.
  • Adaptive Thresholding – Future research should explore self‑tuning mechanisms that adjust filtering aggressiveness based on observed noise levels, possibly via meta‑learning.

Bottom line: This work demonstrates that data‑centric label‑noise detection is not just an academic curiosity—it’s a practical lever for improving the reliability of remote‑sensing AI systems, and the provided guidelines give developers a clear starting point for integrating these techniques into production pipelines.

Authors

  • Felix Kröber
  • Genc Hoxha
  • Ribana Roscher

Paper Information

  • arXiv ID: 2603.16835v1
  • Categories: cs.CV
  • Published: March 17, 2026
