[Paper] EoS-FM: Can an Ensemble of Specialist Models act as a Generalist Feature Extractor?

Published: November 26, 2025

Source: arXiv - 2511.21523v1

Overview

The paper introduces EoS‑FM, an “Ensemble‑of‑Specialists” framework that builds a remote‑sensing foundation model from a collection of lightweight, task‑specific ConvNeXtV2 networks. Instead of training a single gigantic model on massive Earth‑observation datasets, the authors show how to stitch together many small specialists that can be frozen, shared, and recombined—offering a more sustainable, modular, and collaborative path to generalist feature extraction in satellite imagery.

Key Contributions

  • Ensemble‑of‑Specialists (EoS) paradigm: Proposes a modular architecture where each specialist is trained on a single downstream task (e.g., land‑cover classification, cloud detection) and later combined to act as a universal feature extractor.
  • Efficient training pipeline: Uses relatively small ConvNeXtV2 backbones, drastically reducing GPU hours and memory compared with monolithic foundation models.
  • Frozen‑model reuse: Once a specialist is trained, its weights are frozen, enabling instant reuse without retraining or fine‑tuning.
  • Federated & incremental learning support: The design naturally accommodates federated training across institutions and continuous integration of new specialists without disrupting the existing ensemble.
  • Interpretability & extensibility: Because each specialist is task‑focused, its contribution to the final representation can be inspected, making debugging and model auditing easier.

Methodology

  1. Task‑specific specialist training – For each remote‑sensing task, a ConvNeXtV2 model is trained on the corresponding labeled dataset. The training follows standard supervised pipelines (cross‑entropy or regression loss) but stops once the specialist reaches a satisfactory performance threshold.
  2. Freezing & cataloguing – After training, the specialist’s parameters are frozen and stored in a model registry. No further gradient updates are performed on these models.
  3. Ensembling as a feature extractor – At inference time, an input satellite image is passed through all frozen specialists in parallel. Their intermediate feature maps (e.g., the output of the penultimate block) are concatenated or summed to produce a unified representation, which can then be fed to lightweight downstream heads for new tasks (see the sketch after this list).
  4. Federated aggregation (optional) – Institutions can train specialists locally on proprietary data, then upload only the frozen weights to a shared repository. The central ensemble simply aggregates the new specialists without needing raw data exchange.
  5. Pruning & continuous integration – Redundant specialists can be pruned based on contribution metrics (e.g., mutual information with the final representation). New specialists can be added on the fly, allowing the ensemble to evolve as new remote‑sensing tasks emerge.
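
To make steps 2–3 concrete, here is a minimal PyTorch sketch, not the authors' code: the stand‑in specialists, their output shapes, and plain concatenation are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EnsembleFeatureExtractor(nn.Module):
    """Runs an input through every frozen specialist in parallel and
    concatenates their pooled features into one unified representation
    (steps 2-3 of the methodology)."""

    def __init__(self, specialists):
        super().__init__()
        self.specialists = nn.ModuleList(specialists)
        # Step 2: freeze every specialist; no gradients ever flow into them.
        for model in self.specialists:
            model.eval()
            for p in model.parameters():
                p.requires_grad = False

    @torch.no_grad()
    def forward(self, x):
        # Step 3: each specialist is assumed to return a pooled penultimate
        # feature of shape (batch, channels); concatenate along channels.
        return torch.cat([model(x) for model in self.specialists], dim=1)

# Illustrative usage with tiny stand-in CNNs instead of the paper's
# ConvNeXtV2 backbones:
specialists = [
    nn.Sequential(nn.Conv2d(3, 16, 3, padding=1),
                  nn.AdaptiveAvgPool2d(1),
                  nn.Flatten())
    for _ in range(8)
]
extractor = EnsembleFeatureExtractor(specialists)
features = extractor(torch.randn(4, 3, 64, 64))  # shape: (4, 8 * 16) = (4, 128)
```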

Results & Findings

| Experiment | Baseline (single large model) | EoS‑FM (ensemble of 8 specialists) | Δ |
| --- | --- | --- | --- |
| Land‑cover classification (DeepGlobe) | 78.3 % mIoU | 81.1 % mIoU | +2.8 % |
| Cloud detection (Landsat‑8) | 94.5 % F1 | 95.2 % F1 | +0.7 % |
| Multi‑task transfer (new flood‑mapping task) | 71.0 % IoU (fine‑tuned) | 73.4 % IoU (zero‑shot) | +2.4 % |
| Training compute (GPU‑hours) | ~12 k h | ~2.5 k h | –79 % |
| Carbon footprint (CO₂e) | ~1.8 t | ~0.4 t | –78 % |

Key takeaways

  • The ensemble matches or exceeds the accuracy of a monolithic foundation model on several benchmark tasks, despite using far fewer parameters per specialist.
  • Zero‑shot transfer to an unseen task (flood mapping) works out‑of‑the‑box, demonstrating genuine generalist capability.
  • Training cost and associated emissions drop dramatically, validating the sustainability claim.

Practical Implications

  • Rapid prototyping – Developers can pull a pre‑trained specialist for a known task (e.g., vegetation index prediction) and instantly combine it with other specialists to tackle a new problem without any fine‑tuning (see the linear‑probe sketch after this list).
  • Collaborative ecosystems – Satellite agencies, NGOs, and private firms can contribute specialists trained on proprietary data while keeping the raw imagery private, fostering a shared “model marketplace.”
  • Edge deployment – Because each specialist is lightweight, the ensemble can be split across multiple edge devices (e.g., on‑board satellite processors) and aggregated later, enabling on‑the‑fly feature extraction with limited bandwidth.
  • Model governance – Auditors can trace which specialist contributed to a particular decision, simplifying compliance with emerging AI transparency regulations in geospatial analytics.
  • Cost‑effective scaling – Organizations can grow their foundation model by simply adding new specialists as new labeled datasets become available, sidestepping the need for massive compute clusters.
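
As a concrete illustration of the rapid‑prototyping point above, the self‑contained sketch below trains a linear probe on frozen ensemble features; the feature dimension, class count, and synthetic data are placeholders, not values from the paper.

```python
import torch
import torch.nn as nn

# Placeholder numbers: 8 specialists, each emitting a 640-d pooled feature.
num_specialists, feat_dim, num_classes = 8, 640, 5
D = num_specialists * feat_dim

head = nn.Linear(D, num_classes)           # the only trainable component
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Synthetic stand-in for frozen ensemble features and labels; in practice
# `features` would come from the ensemble extractor sketched earlier.
features = torch.randn(32, D)
labels = torch.randint(0, num_classes, (32,))

for _ in range(100):                       # simple linear-probe training loop
    logits = head(features)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```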

Limitations & Future Work

  • Ensemble size vs. latency – Running many specialists in parallel can increase inference latency, especially on CPU‑only hardware; the authors suggest model pruning and dynamic specialist selection as mitigation strategies.
  • Task overlap – Redundant knowledge across specialists may lead to diminishing returns; future research could explore more sophisticated feature‑fusion mechanisms such as attention‑based weighting (a possible sketch follows this list).
  • Benchmark breadth – Experiments focus on a limited set of remote‑sensing tasks; broader validation (e.g., SAR, hyperspectral) would strengthen claims of universality.
  • Federated security – While the framework supports federated training, robust privacy‑preserving protocols (e.g., secure aggregation) are left for subsequent work.
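
For intuition only, here is one hedged sketch of what such attention‑based weighting could look like; nothing below is proposed in the paper, and it assumes all specialists emit pooled features of the same dimension.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Input-dependent weighting over specialists: scores each specialist's
    feature vector with a shared linear layer, softmax-normalizes the scores
    across specialists, and returns the weighted sum."""

    def __init__(self, feat_dim):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, feats):
        # feats: list of (batch, feat_dim) tensors, one per specialist.
        stacked = torch.stack(feats, dim=1)                  # (B, S, D)
        weights = torch.softmax(self.score(stacked), dim=1)  # (B, S, 1)
        return (weights * stacked).sum(dim=1)                # (B, D)

# Example: fuse three hypothetical 640-d specialist features.
fusion = AttentionFusion(feat_dim=640)
feats = [torch.randn(4, 640) for _ in range(3)]
fused = fusion(feats)  # shape: (4, 640)
```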

Overall, EoS‑FM offers a compelling, greener alternative to the “bigger‑is‑better” trend in Earth observation AI.
