[Paper] Task-Agnostic Continual Learning for Chest Radiograph Classification
Source: arXiv - 2602.15811v1
Overview
The paper introduces CARL‑XRay, a continual‑learning framework that lets chest‑radiograph classifiers evolve as new datasets arrive—without needing to retrain on all previously seen images or keep raw scans in storage. By treating each incoming dataset as a separate “task” and automatically routing inputs to the right task‑specific adapters, the method promises stable diagnostic performance while dramatically cutting down on training overhead.
Key Contributions
- Task‑agnostic continual learning for medical imaging: first work to handle sequential, heterogeneous chest‑X‑ray datasets where the task label is unknown at inference time.
- Adapter‑based routing architecture (CARL‑XRay): a frozen high‑capacity backbone plus lightweight, per‑task adapters and classifier heads that are added on‑the‑fly.
- Latent task selector that uses compact prototypes and feature‑level experience replay to identify the correct task without storing raw images.
- Parameter‑efficient updates: each new dataset adds only ~0.3 M trainable parameters (adapter plus classifier head), versus the ~10 M required for full model retraining.
- Empirical validation on several large public chest‑X‑ray collections, showing superior routing accuracy (75 % vs. 62.5 %) and AUROC comparable to joint training (≈0.75).
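The per-task parameter figure can be sanity-checked with a back-of-the-envelope count for a standard down-project/up-project bottleneck adapter plus a linear head. The dimensions below (1024-d backbone features, 128-d bottleneck, 14 labels) are illustrative assumptions, not values from the paper:

```python
def adapter_params(feature_dim: int, bottleneck_dim: int, num_classes: int) -> int:
    """Parameter count of a bottleneck adapter (down + up projection,
    each with bias) plus a linear classifier head."""
    down = feature_dim * bottleneck_dim + bottleneck_dim
    up = bottleneck_dim * feature_dim + feature_dim
    head = feature_dim * num_classes + num_classes
    return down + up + head

# Hypothetical example: 1024-d features, 128-d bottleneck, 14 disease labels.
print(adapter_params(1024, 128, 14))  # 277646, i.e. ~0.28 M
```

With these assumed dimensions the total lands near the ~0.3 M per-task figure reported in the results, two orders of magnitude below retraining a full ~10 M-parameter model.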
Methodology
- Fixed backbone – A large convolutional (or transformer‑based) encoder is pretrained once on a generic chest‑X‑ray corpus and then frozen.
- Task‑specific adapters – For each new dataset, a small bottleneck module (adapter) and a lightweight classifier head are appended to the backbone. These adapters learn the domain‑specific nuances (e.g., different hospital protocols, label sets).
- Prototype‑based task selector – The system keeps a compact set of class‑wise feature prototypes from past tasks. When a new image arrives, its backbone features are passed through every adapter, and the selector picks the task whose prototypes best match the resulting representation.
- Feature‑level experience replay – Instead of storing raw images, the method replays stored feature vectors (and their prototypes) during adapter training, preserving knowledge of earlier tasks while keeping privacy and storage costs low.
- Training loop – When a new dataset appears, only the new adapter, head, and the selector are updated; the backbone stays untouched. This “plug‑and‑play” approach enables rapid incremental updates.
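The adapter-and-routing idea above can be sketched in a few lines of NumPy. Everything here is illustrative — the shapes, the residual bottleneck form, and nearest-prototype distance matching are plausible reconstructions, not the paper's exact modules:

```python
import numpy as np

def adapter(features, W_down, W_up):
    """Bottleneck adapter: down-project, ReLU, up-project, residual add."""
    h = np.maximum(features @ W_down, 0.0)
    return features + h @ W_up

def select_task(features, adapters, prototypes):
    """Route an input to the task whose stored class prototypes lie
    closest to the adapted representation (nearest-prototype matching)."""
    best_task, best_dist = None, np.inf
    for task_id, (W_down, W_up) in adapters.items():
        z = adapter(features, W_down, W_up)
        d = min(np.linalg.norm(z - p) for p in prototypes[task_id])
        if d < best_dist:
            best_task, best_dist = task_id, d
    return best_task

# Toy demo: zero bottleneck weights make each adapter a pass-through,
# so routing is decided purely by prototype distance.
d = 4
identity = (np.zeros((d, 2)), np.zeros((2, d)))
adapters = {"A": identity, "B": identity}
prototypes = {"A": [np.ones(d)], "B": [np.zeros(d)]}
print(select_task(np.ones(d), adapters, prototypes))  # -> A
```

Note that routing requires one forward pass per adapter, so inference cost grows with the number of tasks; the compact prototypes keep the comparison itself cheap.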
Results & Findings
| Metric | Joint training (oracle) | CARL‑XRay (oracle) | CARL‑XRay (task‑unknown) |
|---|---|---|---|
| AUROC | 0.76 | 0.74 | 0.75 |
| Routing accuracy | – | 75 % | 75 % |
| Routing accuracy (baseline) | – | 62.5 % | 62.5 % |
| Additional trainable params per task | ~10 M | ~0.3 M | ~0.3 M |
- Performance retention: After up to 5 sequential dataset additions, AUROC drops by less than 2 % compared with a model trained jointly on all data.
- Task identification: The selector reliably distinguishes tasks even when the same disease labels appear across datasets with different visual distributions.
- Memory footprint: Only the adapters, heads, and prototype buffers are stored, eliminating the need for raw‑image archives.
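The storage saving is easy to quantify with rough, assumed numbers (1024-d float32 prototypes, one per class; 1024×1024 16-bit radiographs — none of these figures come from the paper):

```python
def prototype_bytes(num_tasks: int, classes_per_task: int, feature_dim: int = 1024) -> int:
    """Storage for class-wise float32 feature prototypes across all tasks."""
    return num_tasks * classes_per_task * feature_dim * 4

def raw_image_bytes(num_images: int, h: int = 1024, w: int = 1024, bytes_per_px: int = 2) -> int:
    """Storage for an archive of uncompressed 16-bit radiographs."""
    return num_images * h * w * bytes_per_px

print(prototype_bytes(num_tasks=5, classes_per_task=14))  # 286720 bytes, ~280 KB
print(raw_image_bytes(num_images=100_000))                # ~210 GB
```

Under these assumptions, prototype buffers for five 14-class tasks fit in a few hundred kilobytes, while an equivalent raw-image archive would run to hundreds of gigabytes.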
Practical Implications
- Continuous deployment in hospitals: Radiology AI systems can be updated with new local data (e.g., a new scanner vendor or a regional disease outbreak) without a costly full‑retraining pipeline.
- Regulatory friendliness: Since the backbone remains unchanged, the core “validated” model stays the same, simplifying compliance audits; only small, auditable adapters need version control.
- Edge‑friendly updates: The tiny adapter modules can be shipped over the air to on‑premise servers or even to edge devices, enabling rapid model refreshes.
- Data privacy: By never persisting raw images—only anonymized feature prototypes—organizations can stay within HIPAA/GDPR constraints while still benefiting from continual learning.
- Developer workflow: Integration is as simple as loading the frozen backbone, attaching the new adapter, and invoking the selector; no custom data pipelines or massive GPU clusters are required.
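The workflow described above might look like the following in code. `ContinualModel`, its method names, and the scalar "features" are entirely hypothetical — a sketch of the integration pattern, not the paper's API:

```python
class ContinualModel:
    """Frozen backbone plus a registry of per-task adapters and heads."""

    def __init__(self, backbone):
        self.backbone = backbone   # frozen; never updated after deployment
        self.adapters = {}         # task_id -> (adapter_fn, head_fn)
        self.prototypes = {}       # task_id -> list of stored prototypes

    def add_task(self, task_id, adapter, head, prototypes):
        """Plug in a newly trained adapter/head pair; the backbone is untouched."""
        self.adapters[task_id] = (adapter, head)
        self.prototypes[task_id] = prototypes

    def predict(self, image):
        feats = self.backbone(image)
        task = self._route(feats)            # prototype-based task selection
        adapter, head = self.adapters[task]
        return head(adapter(feats))

    def _route(self, feats):
        # Nearest-prototype matching; scalar distance for this toy sketch.
        return min(
            self.prototypes,
            key=lambda t: min(abs(feats - p) for p in self.prototypes[t]),
        )

# Toy usage with scalar "features": task A doubles, task B negates.
model = ContinualModel(backbone=lambda x: x)
model.add_task("A", adapter=lambda f: f, head=lambda f: 2 * f, prototypes=[1.0])
model.add_task("B", adapter=lambda f: f, head=lambda f: -f, prototypes=[10.0])
print(model.predict(1.2))  # routed to task A -> 2.4
```

The key property is that `add_task` is the only deployment-time operation: shipping an update means distributing a small adapter/head/prototype bundle, never a new backbone.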
Limitations & Future Work
- Task similarity assumption: The selector relies on distinguishable feature prototypes; highly overlapping datasets may cause routing confusion.
- Prototype storage growth: Although much smaller than raw images, the prototype buffer still grows linearly with the number of tasks; smarter summarization or pruning strategies are needed.
- Evaluation scope: Experiments focus on public chest‑X‑ray datasets; real‑world clinical settings with label drift, multi‑modal inputs, or extreme class imbalance remain to be tested.
- Extension to other modalities: Future work could explore whether the adapter‑routing paradigm transfers to CT, MRI, or non‑imaging time‑series data.
Authors
- Muthu Subash Kavitha
- Anas Zafar
- Amgad Muneer
- Jia Wu
Paper Information
- arXiv ID: 2602.15811v1
- Categories: cs.CV, cs.AI
- Published: February 17, 2026