[Paper] Catastrophic Forgetting Resilient One-Shot Incremental Federated Learning
Source: arXiv - 2602.17625v1
Overview
The paper introduces One‑Shot Incremental Federated Learning (OSI‑FL), a novel framework that lets a federation of edge devices train a shared model in a single communication round while still handling new data that arrives over time. By leveraging frozen vision‑language embeddings and a server‑side diffusion model, OSI‑FL dramatically cuts communication costs and mitigates the dreaded “catastrophic forgetting” problem that plagues incremental learning.
Key Contributions
- One‑shot communication – Clients transmit only compact, category‑specific embeddings (instead of raw data or full model updates), allowing the entire federation to converge in a single round.
- Synthetic data generation – A pre‑trained diffusion model on the server expands those embeddings into realistic images that approximate each client’s data distribution.
- Selective Sample Retention (SSR) – An on‑the‑fly sampling strategy keeps the p most informative synthetic samples per class‑task pair, providing a lightweight replay buffer that curbs forgetting when new tasks arrive.
- Unified incremental setting – OSI‑FL works for both class‑incremental (new categories appear) and domain‑incremental (same categories but new visual styles) scenarios.
- Empirical superiority – Across three standard vision benchmarks, OSI‑FL outperforms traditional multi‑round FL and existing one‑shot FL baselines in both accuracy and forgetting metrics.
Methodology
1. Client‑side – frozen VLM embeddings
- Each participant runs a pre‑trained vision‑language model (e.g., CLIP) in inference mode.
- For every class present locally, the client extracts a category embedding (a high‑dimensional vector) and sends it to the server.
- No gradients, raw images, or model parameters leave the device.
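The summary does not specify how a single category embedding is formed from many local images; a common choice, assumed here, is the unit-normalized mean of the frozen per-image features. A minimal NumPy sketch (the function name and aggregation rule are illustrative, not the paper's API):

```python
import numpy as np

def category_embeddings(features, labels):
    """Aggregate frozen per-image embeddings into one compact vector
    per locally present class (here: the unit-normalized class mean)."""
    out = {}
    for c in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == c]
        emb = features[idx].mean(axis=0)
        out[c] = emb / np.linalg.norm(emb)  # CLIP-style features are usually unit-norm
    return out

# Example: six fake 512-dim "CLIP" features covering classes 0 and 1
rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 512)).astype(np.float32)
payload = category_embeddings(feats, [0, 0, 0, 1, 1, 1])
print(sorted(payload))   # [0, 1]
print(payload[0].shape)  # (512,)
```

Only this small `payload` dictionary leaves the device, which is what makes the one-shot round so cheap.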
2. Server‑side – diffusion‑based data synthesis
- The server hosts a pre‑trained diffusion model (e.g., Stable Diffusion).
- Using the received embeddings as conditioning signals, the diffusion model generates synthetic images that approximate the client’s data distribution for each class.
3. Training the global model
- The server aggregates all synthetic samples and trains a global vision model (e.g., ResNet) in the usual supervised manner.
- When a new task (new classes or new domains) arrives, the server repeats steps 1–2, adding fresh synthetic data to the training pool.
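The server-side loop above can be sketched as follows. The `synthesize` and `train` functions are stand-ins (not the paper's code) for the diffusion model and the supervised trainer; the point is how each incremental task adds one round of synthetic data to a growing pool:

```python
def synthesize(embeddings):
    # Stand-in for the diffusion model: one synthetic record per
    # received (class, embedding) pair.
    return [{"cls": c, "data": e} for c, e in embeddings.items()]

def train(model, pool):
    # Stand-in for supervised training of the global vision model.
    model["seen"] = sorted({s["cls"] for s in pool})
    return model

model, pool = {}, []
for task_embeddings in [{0: "e0", 1: "e1"},   # task 1: classes 0 and 1
                        {2: "e2"}]:           # task 2: class 2 arrives later
    pool += synthesize(task_embeddings)       # steps 1-2, one round per task
    model = train(model, pool)                # step 3, on old + new synthetic data

print(model["seen"])  # [0, 1, 2]
```

Because earlier synthetic samples stay in the pool (pruned by SSR, described next), the global model keeps seeing old classes while learning new ones.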
4. Selective Sample Retention (SSR)
- After each training epoch, the server computes the loss for every synthetic sample.
- For each (class, task) pair, it retains the top‑p highest‑loss samples (i.e., the hardest, most informative examples).
- These retained samples are re‑used in subsequent training cycles, acting as a compact replay buffer that preserves knowledge of earlier tasks without storing the full dataset.
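The SSR selection rule can be sketched in a few lines of Python (field names and the `ssr_retain` helper are illustrative, assuming per-sample losses have already been computed during the epoch):

```python
def ssr_retain(samples, p):
    """SSR sketch: keep the p highest-loss synthetic samples per
    (class, task) pair as a compact replay buffer.
    `samples` is a list of dicts with 'cls', 'task', and 'loss' keys."""
    buckets = {}
    for s in samples:
        buckets.setdefault((s["cls"], s["task"]), []).append(s)
    retained = []
    for group in buckets.values():
        group.sort(key=lambda s: s["loss"], reverse=True)  # hardest first
        retained += group[:p]
    return retained

samples = [
    {"cls": 0, "task": 1, "loss": 0.9},
    {"cls": 0, "task": 1, "loss": 0.1},
    {"cls": 0, "task": 1, "loss": 0.5},
    {"cls": 1, "task": 1, "loss": 0.3},
]
kept = ssr_retain(samples, p=2)
print(sorted(s["loss"] for s in kept))  # [0.3, 0.5, 0.9]
```

Sorting by descending loss implements the "keep the hard examples" heuristic: easy samples the model already fits contribute little to preserving old-task knowledge.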
Key advantage: The whole pipeline requires only one communication round per incremental update, making it suitable for bandwidth‑constrained or privacy‑sensitive environments.
Results & Findings
| Dataset | Setting | Baseline acc. (multi‑round FL) | OSI‑FL acc. (Ours) | Forgetting ↓ |
|---|---|---|---|---|
| CIFAR‑100 | Class‑incremental (10 tasks) | 68.2 % | 77.5 % | 12 % |
| ImageNet‑R | Domain‑incremental (5 domains) | 61.4 % | 70.1 % | 9 % |
| Tiny‑ImageNet | Mixed (new classes + domains) | 63.7 % | 71.8 % | 11 % |
- Accuracy gains: 7–10 % over the strongest baselines.
- Forgetting (the drop in performance on earlier tasks): reduced by roughly half.
- Communication overhead: decreased from hundreds of megabytes (full model/gradient exchange) to a few kilobytes (embedding vectors).
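A back-of-envelope calculation makes the "few kilobytes" claim concrete. Assuming 512-dimensional float32 embeddings (a CLIP ViT-B/32-sized vector; the exact encoder and dimension are assumptions, not stated in the summary) and ten locally present classes per client:

```python
# Payload per client per round under the stated assumptions.
dim, bytes_per_float, n_classes = 512, 4, 10
payload_kb = dim * bytes_per_float * n_classes / 1024
print(payload_kb)  # 20.0 (KB) -- versus hundreds of MB for full model exchange
```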
Ablation study highlights
- SSR alone contributes ~3 % accuracy improvement.
- Diffusion‑generated data accounts for the bulk of the gain.
Practical Implications
- Edge‑AI deployments – Smart cameras, mobile phones, or IoT sensors can take part in federated training without ever transmitting raw images, preserving user privacy and complying with data‑locality regulations.
- Rapid model updates – New product lines or visual styles (e.g., seasonal UI themes) can be incorporated with a single round of communication, dramatically reducing OTA‑update latency.
- Cost‑effective scaling – Service providers can support thousands of clients on low‑bandwidth links (cellular, satellite) because the payload consists of only a handful of embedding vectors per class.
- Replay‑free continual learning – SSR provides a lightweight alternative to large replay buffers, which is attractive for on‑device continual‑learning pipelines where storage is at a premium.
Tip for developers:
Integrate OSI‑FL by plugging in any off‑the‑shelf vision‑language model (e.g., CLIP, BLIP) and diffusion model (e.g., Stable Diffusion), then use the supplied SSR module to manage the synthetic replay set.
Limitations & Future Work
- Synthetic fidelity – The quality of generated data depends on the diffusion model’s ability to capture the client’s distribution; rare or highly domain‑specific visual features may be under‑represented.
- Embedding privacy – Although embeddings are far less sensitive than raw images, they can still leak information (e.g., via inversion attacks). Formal privacy guarantees such as differentially private federated learning (DP‑FL) were not explored.
- Scalability of SSR – The retention factor p must be tuned; a too‑small buffer may miss critical variations, while a too‑large buffer erodes the “one‑shot” communication advantage.
- Non‑vision modalities – The current design assumes visual data; extending OSI‑FL to text, audio, or multimodal streams remains an open challenge.
Future Research Directions
- Integrate differential privacy into the embedding transmission pipeline.
- Develop adaptive strategies for selecting the retention factor p in SSR.
- Evaluate OSI‑FL on real‑world federated deployments (e.g., autonomous‑vehicle fleets).
- Explore extensions to non‑visual modalities such as text, audio, and multimodal data streams.
Authors
- Obaidullah Zaland
- Zulfiqar Ahmad Khan
- Monowar Bhuyan
Paper Information
| Field | Details |
|---|---|
| arXiv ID | 2602.17625v1 |
| Categories | cs.LG, cs.DC |
| Published | February 19, 2026 |