[Paper] Shared LoRA Subspaces for almost Strict Continual Learning
Source: arXiv - 2602.06043v1
Overview
The paper introduces Share, a new way to fine‑tune massive pretrained models (like CLIP, Stable Diffusion, or large language models) for a stream of tasks without blowing up memory or forgetting what was learned before. By keeping a single, shared low‑rank subspace that evolves as new tasks arrive, Share delivers the parameter efficiency of LoRA while adding true continual‑learning capabilities—no data replay, no proliferation of adapters, and near‑perfect performance retention.
Key Contributions
- Shared Low‑Rank Subspace: A single, dynamically updated LoRA‑style subspace that stores knowledge from all previously seen tasks.
- Almost‑Strict Continual Learning: Virtually eliminates catastrophic forgetting without replay buffers or a growing collection of full task‑specific adapters.
- Massive Resource Savings: Up to 100× fewer trainable parameters and 281× less memory compared with naïve per‑task LoRA.
- Cross‑Modal Generality: Demonstrated on image classification, NLP, 3D pose estimation, and text‑to‑image generation.
- Scalable Deployment: One Share model can replace hundreds of individual LoRA adapters, enabling asynchronous, on‑device updates.
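To put the savings in perspective, here is a back‑of‑the‑envelope count for a single weight matrix. The dimensions below (d, r) are illustrative choices, not figures from the paper:

```python
# Illustrative parameter counts for one d x d weight matrix (hypothetical dims).
d, r = 4096, 16

full_ft = d * d          # full fine-tuning: update every weight
lora = 2 * d * r         # classic LoRA: A (d x r) and B (d x r)
share_task = r * r       # Share-style per-task cost: one r x r coefficient block

print(full_ft)             # 16777216
print(lora)                # 131072
print(lora / share_task)   # 512.0 -- per-task cost vs. a fresh LoRA adapter
```

The headline ~100× figure depends on how many layers are adapted and how the shared basis is amortised across tasks; this sketch only shows why per‑task cost collapses once the basis is shared.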
Methodology
- Base Model & LoRA Primer: Start from a frozen large pretrained network (e.g., a vision transformer). LoRA injects trainable low‑rank matrices ΔW = A·Bᵀ into selected layers, keeping the original weights untouched.
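The LoRA update itself fits in a few lines. A minimal NumPy sketch with hypothetical dimensions (initialising one factor to zero, so ΔW starts at zero, follows the usual LoRA convention):

```python
import numpy as np

d_out, d_in, r = 512, 512, 8               # hypothetical layer dims, r << d
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = np.zeros((d_out, r))                   # trainable low-rank factor (starts at 0)
B = rng.standard_normal((d_in, r)) * 0.01  # trainable low-rank factor

def lora_forward(x):
    # y = x (W + A B^T)^T : base path plus low-rank correction; W stays untouched
    return x @ (W + A @ B.T).T

x = rng.standard_normal((4, d_in))
y = lora_forward(x)
assert y.shape == (4, d_out)
assert np.allclose(y, x @ W.T)             # A = 0, so the update is inert at init
```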
- Constructing the Shared Subspace:
  - Initialize a global low‑rank basis U ∈ ℝ^{d×r} (r ≪ d).
  - For each incoming task t, compute a task‑specific projection ΔWₜ = U·Cₜ, where Cₜ is a small task‑specific coefficient matrix (r×r).
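Note that U·Cₜ with an r×r coefficient block yields a d×r matrix, so some mapping back to the weight shape is implied. One dimensionally consistent reading for square layers is the two‑sided form ΔWₜ = U·Cₜ·Uᵀ; the sketch below adopts that as an assumption, not as the paper's exact parameterisation:

```python
import numpy as np

d, r = 256, 8
rng = np.random.default_rng(0)

# Shared orthonormal basis U (d x r) and a tiny per-task coefficient block C_t.
U, _ = np.linalg.qr(rng.standard_normal((d, r)))
C_t = rng.standard_normal((r, r)) * 0.01

# Assumed two-sided projection so that dW_t has the full d x d weight shape.
dW_t = U @ C_t @ U.T
assert dW_t.shape == (d, d)

# The update lives entirely in span(U): projecting back recovers C_t.
assert np.allclose(U.T @ dW_t @ U, C_t)
```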
- Dynamic Subspace Update:
  - After training on task t, evaluate the gradient directions that contributed most to performance gain.
  - Expand or rotate U to absorb these directions using a subspace‑expansion step (e.g., QR decomposition + low‑rank truncation).
  - Old tasks keep using the updated U, so their knowledge is automatically merged into the shared representation.
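A minimal sketch of the QR‑plus‑truncation idea follows. How the paper scores and selects gradient directions is not reproduced here; the inputs below are illustrative stand‑ins:

```python
import numpy as np

def expand_subspace(U, G, r_max):
    """Absorb new directions G (d x k) into the orthonormal basis U (d x r),
    then truncate back to at most r_max columns."""
    resid = G - U @ (U.T @ G)            # components of G outside span(U)
    Q, R = np.linalg.qr(resid)
    keep = np.abs(np.diag(R)) > 1e-10    # drop numerically negligible directions
    U_big = np.hstack([U, Q[:, keep]])   # expanded orthonormal basis
    return U_big[:, :r_max]              # low-rank truncation to the budget

rng = np.random.default_rng(1)
d, r = 64, 4
U, _ = np.linalg.qr(rng.standard_normal((d, r)))
G = rng.standard_normal((d, 2))          # stand-in for top gradient directions
U_new = expand_subspace(U, G, r_max=6)

assert U_new.shape == (64, 6)
assert np.allclose(U_new.T @ U_new, np.eye(6), atol=1e-8)
```

Because the residual is orthogonalised against U before being appended, the expanded basis stays orthonormal, so existing coefficient blocks remain interpretable in the updated subspace.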
- Training Loop:
  - Freeze the backbone, train only Cₜ for the current task while U is kept fixed.
  - Periodically run the subspace‑update routine to integrate the newly learned directions.
- Inference:
  - At test time, the model simply uses the latest U (no per‑task adapters needed).
The whole pipeline requires only a handful of extra matrices per task (the Cₜ coefficients) and a single global basis that grows modestly over time.
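Putting the pieces together, a toy end‑to‑end loop. Everything here is illustrative: random "target updates" stand in for task training, a closed‑form projection stands in for gradient descent on Cₜ, and the two‑sided ΔW = U·C·Uᵀ parameterisation is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r_max = 32, 8
U, _ = np.linalg.qr(rng.standard_normal((d, 4)))   # initial shared basis
coeffs = {}                                        # tiny per-task C_t blocks

for t in range(3):                                 # stream of tasks
    target = rng.standard_normal((d, d))           # stand-in for a task's ideal dW
    # "Train" C_t with U frozen: under the assumed dW = U C U^T form,
    # the Frobenius-optimal C_t is the projection of the target onto U.
    C_t = U.T @ target @ U
    coeffs[t] = C_t
    # Subspace update: absorb one leading residual direction, within budget.
    resid = target - U @ C_t @ U.T
    g = resid @ rng.standard_normal(d)             # cheap proxy for a top direction
    g -= U @ (U.T @ g)                             # orthogonalise against U
    if np.linalg.norm(g) > 1e-10 and U.shape[1] < r_max:
        U = np.hstack([U, (g / np.linalg.norm(g))[:, None]])

assert U.shape == (32, 7)                          # basis grew from 4 to 7 columns
assert np.allclose(U.T @ U, np.eye(U.shape[1]), atol=1e-8)
```

The stored state after three tasks is exactly what the text describes: one shared basis U plus a dictionary of small coefficient blocks.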
Results & Findings
| Domain | Baseline (Joint) | Per‑Task LoRA | Share (ours) | Parameter Reduction | Memory Reduction |
|---|---|---|---|---|---|
| Image Classification (ImageNet‑100) | 78.3 % | 77.9 % | 77.6 % | ~100× | ~281× |
| NLP (GLUE benchmark) | 84.1 % | 83.8 % | 83.5 % | ~95× | ~260× |
| 3D Pose Estimation | 92.0 % | 91.7 % | 91.5 % | ~90× | ~250× |
| Text‑to‑Image (Stable Diffusion) | FID 12.4 | FID 12.7 | FID 12.9 | ~110× | ~300× |
- Performance Gap: Share stays within about 1 % of jointly trained models across all four domains, far better than naïve sequential fine‑tuning, which suffers a >5 % drop after only a few tasks.
- Forward Transfer: Later tasks often start from a better initialization because the shared subspace already encodes useful features from earlier domains.
- Ablation: Removing the subspace‑expansion step leads to rapid forgetting, confirming its role in preserving past knowledge.
Practical Implications
- Deploy‑once, update‑anywhere: Companies can ship a single large model to edge devices and later push tiny Cₜ updates (a few KB) for new features without re‑flashing the whole model.
- Cost‑Effective MLOps: Training budgets shrink dramatically—only the low‑rank coefficients need back‑propagation, cutting GPU hours and storage.
- Multi‑tenant SaaS platforms: A service provider can host one Share model that serves thousands of customers, each with its own task profile, eliminating the need to maintain a zoo of adapters.
- Regulatory & Privacy‑friendly: Since Share does not rely on replay buffers, it respects data‑privacy constraints while still learning from sequentially arriving proprietary datasets.
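The "few KB" claim above is easy to sanity‑check with hypothetical numbers; the rank, adapted‑layer count, and precision below are assumptions, not the paper's configuration:

```python
# Size of a full per-task update shipped to a device (hypothetical config).
r = 16                 # subspace rank
n_layers = 24          # number of adapted layers
bytes_per_weight = 2   # fp16

update_bytes = r * r * n_layers * bytes_per_weight
print(update_bytes / 1024)   # 12.0 -- a ~12 KB over-the-air update
```

Compare with pushing a fresh LoRA adapter (2·d·r parameters per layer) or a full checkpoint: both are orders of magnitude larger for realistic d.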
Limitations & Future Work
- Subspace Growth Control: Although the basis is kept low‑rank, continual expansion can eventually hit a ceiling; smarter pruning or budgeted subspace allocation is needed for very long task streams.
- Task Similarity Assumption: Share works best when tasks share underlying representations; highly divergent tasks may require multiple subspaces or hierarchical sharing.
- Theoretical Guarantees: The paper provides empirical evidence but lacks formal bounds on forgetting or subspace optimality—future work could bridge this gap.
- Real‑time Adaptation: Current updates are batch‑oriented; extending the method to truly online, per‑sample updates would broaden its applicability to streaming scenarios.
Authors
- Prakhar Kaushik
- Ankit Vaidya
- Shravan Chaudhari
- Rama Chellappa
- Alan Yuille
Paper Information
- arXiv ID: 2602.06043v1
- Categories: cs.LG, cs.AI, cs.CV
- Published: February 5, 2026