[Paper] Trust-SSL: Additive-Residual Selective Invariance for Robust Aerial Self-Supervised Learning
Source: arXiv - 2604.21349v1
Overview
Self‑supervised learning (SSL) has become the go‑to way to pre‑train vision models on massive aerial image collections, but most SSL methods assume that the augmentations they apply preserve the underlying scene semantics. In real‑world remote‑sensing data, atmospheric effects (haze, rain), motion blur, occlusions, and other degradations can wipe out crucial visual cues, making the usual “make two views look the same” objective harmful. The paper “Trust‑SSL: Additive‑Residual Selective Invariance for Robust Aerial Self‑Supervised Learning” proposes a new training recipe that lets the model trust only the clean parts of a corrupted view, dramatically improving robustness to these harsh conditions.
Key Contributions
- Trust‑weighted alignment: Introduces a per‑sample, per‑corruption “trust weight” that modulates the contrastive alignment loss, allowing the network to ignore unreliable regions.
- Additive‑residual formulation: Instead of gating the loss multiplicatively, the trust weight is added as a residual term, which the authors show preserves backbone quality while still providing robustness.
- Stop‑gradient on trust: The trust weight is detached from gradient flow, preventing it from hijacking the representation learning dynamics.
- Empirical superiority: Across six backbone architectures, Trust‑SSL achieves the highest linear‑probe accuracy on three major aerial benchmarks (EuroSAT, AID, NWPU‑RESISC45), e.g., 90.20 % vs. 88.46 % for SimCLR on EuroSAT with a ResNet‑50 backbone.
- Corruption‑specific gains: Shows up to +19.9 % accuracy improvement on heavily hazed EuroSAT images (severity = 5) compared to vanilla SimCLR.
- Zero‑shot cross‑domain stress test: Improves Mahalanobis AUROC by 1–3 % on BDD100K weather splits, indicating better uncertainty awareness.
- Evidential extension: Provides a Dempster‑Shafer‑based variant that outputs interpretable “conflict” and “ignorance” scores for each prediction.
- Open‑source release: Full code and pretrained models are available on GitHub.
Methodology
- Base SSL framework – The authors start from a standard contrastive (SimCLR‑style) or variance‑based (VICReg‑style) SSL objective that pulls together two augmented views of the same image.
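As a reference point, the contrastive base objective can be sketched as a minimal NT‑Xent loss in NumPy (illustrative only; the function name and batch layout here are our own, not from the paper's code):

```python
import numpy as np

def nt_xent(z1, z2, temperature=0.5):
    """Minimal SimCLR-style NT-Xent loss over a batch of paired views.
    Row i of z1 and row i of z2 are treated as the positive pair."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-normalize embeddings
    n = z1.shape[0]
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    # Mean negative log-softmax of each sample's positive pair
    return float(np.mean(logsumexp - sim[np.arange(2 * n), pos]))
```

Perfectly aligned view pairs should score lower than mismatched ones, which is the property the alignment term exploits.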
- Corruption‑aware view generation – For each image, a clean view and a corrupted view (e.g., haze, motion blur, rain) are produced.
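A toy stand‑in for one such corruption is haze modeled as a blend toward a uniform white "airlight" (the paper's actual corruption pipeline is likely more elaborate; this is only a sketch):

```python
import numpy as np

def add_haze(img, severity=0.5):
    """Blend an image with values in [0, 1] toward white 'airlight' to
    mimic haze. severity=0 leaves the image untouched; severity=1
    whites it out entirely."""
    severity = float(np.clip(severity, 0.0, 1.0))
    return (1.0 - severity) * img + severity * np.ones_like(img)
```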
- Trust weight computation – A lightweight head predicts a scalar trust value τ ∈ [0, 1] for the corrupted view, estimating how much semantic information remains. This prediction is detached (stop‑gradient) so it does not receive gradient updates.
- Additive‑residual loss – The total objective combines the base alignment term with a trust‑gated residual term:
$$\mathcal{L} = \mathcal{L}_{\text{base}} + (1 - \tau)\,\mathcal{L}_{\text{residual}}$$
  - 𝓛_base is the usual contrastive alignment between clean and corrupted embeddings.
  - 𝓛_residual is an additional term that encourages the clean view to stay close to its own representation, acting as a safety net when τ is low.
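The loss above can be mocked up in a few lines of NumPy. This is a minimal sketch, not the paper's implementation: the clean‑view reference `z_clean_ref` and the cosine form of each term are our assumptions, and τ simply enters as a constant weight to mimic the stop‑gradient:

```python
import numpy as np

def align_loss(za, zb):
    """Alignment as mean (1 - cosine similarity) between paired embeddings."""
    za = za / np.linalg.norm(za, axis=1, keepdims=True)
    zb = zb / np.linalg.norm(zb, axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(za * zb, axis=1)))

def trust_ssl_loss(z_clean, z_corrupt, z_clean_ref, tau):
    """Additive-residual objective: L = L_base + (1 - tau) * L_residual.
    tau in [0, 1] is the predicted trust; it is used as a plain constant
    here, standing in for the paper's stop-gradient on the trust head."""
    tau = float(np.clip(np.mean(tau), 0.0, 1.0))   # batch-average trust
    l_base = align_loss(z_clean, z_corrupt)        # clean vs. corrupted view
    l_res = align_loss(z_clean, z_clean_ref)       # clean view vs. its own reference
    return l_base + (1.0 - tau) * l_res
```

With full trust (τ = 1) the residual term vanishes and the objective reduces to the base alignment; as trust drops, the residual safety net contributes more.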
- Training protocol – Models are trained for 200 epochs on a 210 k‑image aerial corpus, using standard augmentations plus the corruption pipeline.
- Evidential variant – Instead of a single τ, the model predicts a Dirichlet distribution over trust, enabling Dempster‑Shafer fusion to separate conflict (disagreement) from ignorance (lack of evidence).
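One simple way to read such a Dirichlet output is sketched below. The decomposition follows common evidential‑deep‑learning conventions and is our assumption; the paper's exact Dempster‑Shafer fusion may differ:

```python
import numpy as np

def dirichlet_scores(alpha):
    """Split a Dirichlet over K classes into per-class belief masses,
    an ignorance (vacuity) score, and a simple conflict score."""
    alpha = np.asarray(alpha, dtype=float)
    K, S = alpha.size, alpha.sum()
    belief = (alpha - 1.0) / S        # evidence-backed belief per class
    ignorance = K / S                 # high when total evidence is low
    p = alpha / S                     # expected class probabilities
    conflict = 1.0 - p.max()          # high when evidence is split across classes
    return belief, ignorance, conflict
```

A concentrated Dirichlet (lots of evidence for one class) yields low ignorance and low conflict, while a flat one signals ignorance and split belief.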
Results & Findings
| Dataset | Backbone | SimCLR | VICReg | Trust‑SSL |
|---|---|---|---|---|
| EuroSAT | ResNet‑50 | 88.46 % | 89.82 % | 90.20 % |
| AID | ViT‑B/16 | 84.3 % | 85.7 % | 86.5 % |
| NWPU‑RESISC45 | Swin‑T | 81.9 % | 83.2 % | 84.1 % |
- Severe haze (severity = 5) on EuroSAT: Trust‑SSL outperforms SimCLR by +19.9 % accuracy.
- Mahalanobis AUROC on BDD100K weather splits (zero‑shot): +1–3 % over baselines, indicating better detection of out‑of‑distribution weather conditions.
- Ablation studies: Replacing the additive‑residual term with a multiplicative gate degrades performance, confirming the importance of the residual design.
- Evidential scores: The Dempster‑Shafer version provides per‑sample uncertainty metrics that correlate with actual corruption severity, useful for downstream risk‑aware pipelines.
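The Mahalanobis AUROC result relies on the standard recipe of scoring features by their distance to in‑distribution statistics; a minimal sketch (not the paper's code, and real pipelines typically use per‑class means and a shared covariance):

```python
import numpy as np

def mahalanobis_score(x, feats_in, eps=1e-6):
    """Mahalanobis distance of a feature vector x to the mean/covariance
    of in-distribution features; larger scores suggest the input is
    out-of-distribution."""
    mu = feats_in.mean(axis=0)
    cov = np.cov(feats_in, rowvar=False) + eps * np.eye(feats_in.shape[1])
    d = x - mu
    return float(d @ np.linalg.inv(cov) @ d)
```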
Practical Implications
- More reliable pre‑training for remote‑sensing pipelines – Satellite‑image classification, change detection, and object detection models can start from Trust‑SSL checkpoints that are less fooled by haze, rain, or motion blur.
- Uncertainty‑aware inference – The evidential variant supplies explicit “confidence” signals, enabling systems to flag low‑trust predictions for human review or to trigger alternative processing (e.g., request higher‑resolution data).
- Cost‑effective data collection – Operators can safely use cheaper, lower‑quality imagery (e.g., from small UAVs or low‑cost satellites) because the SSL backbone already knows how to discount corrupted information.
- Cross‑domain robustness – The improved Mahalanobis AUROC suggests that models trained with Trust‑SSL are better at detecting domain shifts, a common scenario when deploying a model trained on one geographic region to another.
- Plug‑and‑play upgrade – Since Trust‑SSL is built on top of existing SSL frameworks, teams can integrate the trust‑weight head and residual loss into their current training scripts with minimal code changes.
Limitations & Future Work
- Trust predictor simplicity – The current per‑sample scalar trust is learned without explicit supervision; more sophisticated, spatially varying trust maps could capture localized corruption better.
- Corruption taxonomy – Experiments focus on a handful of synthetic degradations (haze, rain, blur). Real‑world atmospheric effects can be more complex; extending the method to handle mixed or unknown corruptions is an open challenge.
- Scalability to massive backbones – While the paper evaluates six backbones, scaling to the largest vision transformers (e.g., ViT‑L/14) and multi‑modal satellite‑radar data remains to be demonstrated.
- Downstream task evaluation – The study primarily reports linear‑probe and zero‑shot AUROC results; assessing impact on fully fine‑tuned tasks such as semantic segmentation or object detection would solidify practical benefits.
Overall, Trust‑SSL offers a concrete, easy‑to‑adopt design pattern for making self‑supervised vision models more robust to the kinds of degradations that plague aerial imagery in the field.
Authors
- Wadii Boulila
- Adel Ammar
- Bilel Benjdira
- Maha Driss
Paper Information
- arXiv ID: 2604.21349v1
- Categories: cs.CV, cs.AI, cs.LG, cs.NE
- Published: April 23, 2026