[Paper] An assessment of data-centric methods for label noise identification in remote sensing data sets
Source: arXiv - 2603.16835v1
Overview
This paper investigates how well three data‑centric label‑noise detection methods work on remote‑sensing image datasets. By deliberately corrupting the ground‑truth labels at varying intensities (10‑70 %), the authors show that these techniques can both spot noisy annotations and boost downstream model performance, offering a practical roadmap for developers dealing with imperfect satellite or aerial imagery data.
Key Contributions
- Systematic benchmark of three label‑noise identification algorithms on two widely used remote‑sensing datasets.
- Comprehensive noise injection study covering symmetric, asymmetric, and class‑dependent noise types across a broad range of corruption levels.
- Quantitative analysis of how well each method isolates noisy samples and how that filtering translates into higher classification accuracy.
- Guidelines for selecting the most suitable method based on noise characteristics and project goals.
- Identification of research gaps in adapting data‑centric noise‑handling to the unique challenges of remote‑sensing imagery (e.g., high intra‑class variability, multi‑spectral data).
Methodology
- Datasets & Baselines – The authors use two benchmark remote‑sensing collections (e.g., a land‑cover scene classification set and an aerial object detection set). A standard convolutional neural network (CNN) serves as the baseline classifier.
- Synthetic Label Noise – They corrupt the true labels with three noise models, varying the noise level from 10 % up to 70 %:
  - Symmetric: any label can flip to any other with equal probability.
  - Asymmetric: flips follow a predefined confusion matrix (e.g., “forest” ↔ “grassland”).
  - Class‑dependent: certain classes are more prone to errors.
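As a concrete illustration, symmetric noise injection reduces to flipping a random subset of labels to a uniformly chosen different class. A minimal NumPy sketch of the general technique (function name and sampling details are ours, not taken from the paper):

```python
import numpy as np

def inject_symmetric_noise(labels, noise_rate, num_classes, seed=0):
    """Flip a `noise_rate` fraction of labels to a uniformly chosen
    *different* class; return the noisy labels and the flipped indices."""
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    n_flip = int(noise_rate * len(labels))
    flip_idx = rng.choice(len(labels), size=n_flip, replace=False)
    for i in flip_idx:
        # Sample uniformly from all classes except the current one.
        candidates = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(candidates)
    return noisy, flip_idx

clean = np.zeros(1000, dtype=int)  # toy ground truth: all class 0
noisy, flipped = inject_symmetric_noise(clean, noise_rate=0.3, num_classes=5)
```

Asymmetric noise would replace the uniform draw with a row of a predefined confusion matrix, and class‑dependent noise would vary `noise_rate` per class.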
- Data‑Centric Methods Evaluated – three approaches are compared:
  - Loss‑Based Filtering (e.g., the small‑loss trick): assumes clean samples incur lower training loss.
  - Agreement‑Based Ensemble: trains multiple models and flags samples with low consensus.
  - Feature‑Space Outlier Detection: extracts deep features and applies clustering/outlier scoring to spot mislabeled points.
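The small‑loss trick in the first method amounts to a sort over per‑sample losses. A minimal sketch, assuming per‑sample cross‑entropy losses have already been recorded during training (names are illustrative, not the authors' code):

```python
import numpy as np

def small_loss_filter(per_sample_losses, keep_ratio):
    """Keep the `keep_ratio` fraction of samples with the smallest loss
    (presumed clean); flag the remainder as suspected noisy labels."""
    order = np.argsort(per_sample_losses)   # ascending: small loss first
    n_keep = int(keep_ratio * len(per_sample_losses))
    return order[:n_keep], order[n_keep:]   # (clean_idx, flagged_idx)

losses = np.array([0.10, 2.30, 0.20, 1.80, 0.05, 3.10])
clean_idx, flagged_idx = small_loss_filter(losses, keep_ratio=0.5)
```

In practice `keep_ratio` has to track an estimate of the true noise rate: set too high, noisy samples leak through; set too low, clean samples are discarded.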
- Evaluation Pipeline – For each noise setting, a method first flags a subset of suspected noisy labels; those samples are either removed or relabeled, and the classifier is retrained. Performance is measured by:
  - Noise‑identification quality (precision/recall of the flagged samples).
  - Task performance (overall classification accuracy, IoU, or F1 score).
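Because the noise is injected synthetically, the flipped indices are known, and identification precision/recall can be computed directly against them. A small self‑contained sketch (hypothetical helper, not the authors' code):

```python
def noise_id_metrics(flagged_idx, true_noisy_idx):
    """Precision/recall of flagged samples against the known injected noise."""
    flagged, noisy = set(flagged_idx), set(true_noisy_idx)
    tp = len(flagged & noisy)                        # correctly flagged
    precision = tp / len(flagged) if flagged else 0.0
    recall = tp / len(noisy) if noisy else 0.0
    return precision, recall

# Two of four flags are real noise; one injected error goes undetected.
p, r = noise_id_metrics(flagged_idx=[1, 3, 5, 7], true_noisy_idx=[1, 3, 8])
```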
Results & Findings
- Noise Identification – All three methods outperform random guessing, but their strengths differ:
  - Loss‑Based Filtering excels at low‑to‑moderate symmetric noise (≤30 %).
  - Agreement‑Based Ensemble is most robust to asymmetric and class‑dependent noise, maintaining >70 % precision even at 50 % corruption.
  - Feature‑Space Outlier Detection shines when the data have strong visual separability (e.g., distinct spectral signatures).
- Impact on Model Performance – Removing the identified noisy samples yields 5‑12 % absolute gains in classification accuracy compared to training on the corrupted set, with the biggest jumps observed at higher noise levels (≥50 %).
- Trade‑off – Aggressive filtering can discard too many clean samples, slightly hurting performance when noise is low; a calibrated threshold is essential.
- Best‑Practice Recommendation – For most remote‑sensing pipelines, a hybrid approach (combine loss‑based and agreement‑based signals) provides the most consistent improvements across noise types.
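The hybrid recommendation amounts to intersecting the two signals: a sample is flagged only when it both incurs a high loss and draws low ensemble consensus on its given label. A sketch under those assumptions (the quantile and agreement thresholds are illustrative, not the paper's calibrated values):

```python
import numpy as np

def hybrid_flags(losses, ensemble_preds, given_labels,
                 loss_quantile=0.7, min_agreement=0.5):
    """Flag indices where the loss falls in the top tail AND fewer than
    `min_agreement` of the ensemble members vote for the given label."""
    high_loss = losses > np.quantile(losses, loss_quantile)
    # Fraction of ensemble members agreeing with the (possibly noisy) label.
    agreement = (ensemble_preds == given_labels).mean(axis=0)
    return np.where(high_loss & (agreement < min_agreement))[0]

losses = np.array([0.1, 2.5, 0.3, 3.0])
preds = np.array([[0, 1, 0, 2],    # each row: one ensemble member's predictions
                  [0, 2, 0, 2],
                  [0, 1, 0, 1]])
labels = np.array([0, 0, 0, 0])
suspects = hybrid_flags(losses, preds, labels)
```

Requiring both signals to agree is what makes the combination conservative, which directly addresses the over‑filtering trade‑off noted above.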
Practical Implications
- Data‑Cleaning Pipelines – Developers can integrate these lightweight detection modules into existing training loops to automatically prune or flag suspect annotations before model deployment.
- Cost Savings – By pinpointing noisy labels, teams can focus human annotation effort on a small subset of problematic samples, reducing costly re‑labeling campaigns.
- Robust Model Deployment – In operational remote‑sensing applications (e.g., disaster mapping, agricultural monitoring), the ability to maintain high accuracy despite noisy crowdsourced or legacy labels translates to more reliable decision‑support tools.
- Tooling Compatibility – The evaluated methods rely on standard deep‑learning libraries (PyTorch/TensorFlow) and require only the model’s loss values, predictions, or feature embeddings—no specialized hardware or external datasets.
Limitations & Future Work
- Synthetic Noise Only – The study uses artificially injected label errors; real‑world noise patterns (e.g., systematic labeling bias) may behave differently.
- Scalability – Ensemble‑based agreement methods increase training time linearly with the number of models, which could be prohibitive for very large satellite datasets.
- Multi‑Modal Data – The experiments focus on RGB or multispectral imagery; extending to SAR, LiDAR, or fused modalities remains an open challenge.
- Adaptive Thresholding – Future research should explore self‑tuning mechanisms that adjust filtering aggressiveness based on observed noise levels, possibly via meta‑learning.
Bottom line: This work demonstrates that data‑centric label‑noise detection is not just an academic curiosity—it’s a practical lever for improving the reliability of remote‑sensing AI systems, and the provided guidelines give developers a clear starting point for integrating these techniques into production pipelines.
Authors
- Felix Kröber
- Genc Hoxha
- Ribana Roscher
Paper Information
- arXiv ID: 2603.16835v1
- Categories: cs.CV
- Published: March 17, 2026