[Paper] PLATE: Plasticity-Tunable Efficient Adapters for Geometry-Aware Continual Learning

Published: February 3, 2026 at 01:59 PM EST
4 min read
Source: arXiv - 2602.03846v1

Overview

The paper introduces PLATE (Plasticity‑Tunable Efficient Adapters), a new continual‑learning technique for large pretrained models that doesn’t need any data from previous tasks. By exploiting the “geometric redundancy” that naturally occurs in deep networks, PLATE lets developers adapt foundation models to new domains while keeping the original knowledge intact—a common pain point when the original pre‑training data is proprietary or simply unavailable.

Key Contributions

  • Geometry‑aware plasticity control – Shows how redundant neurons encode dominant pre‑training feature directions and can be used to define safe update subspaces.
  • Low‑rank adapter design – Proposes a structured update ΔW = B A Qᵀ where B and Q are frozen, pre‑computed matrices and only the small matrix A is learned per new task.
  • No replay required – Achieves strong continual‑learning performance without storing or revisiting any old‑task data.
  • Explicit trade‑off knob – Provides a tunable parameter that lets practitioners balance plasticity (learning speed) against retention (forgetting) on a per‑layer basis.
  • Open‑source implementation – Full code released (GitHub), ready to plug into PyTorch / Hugging Face pipelines.
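The structured update ΔW = B A Qᵀ from the second bullet can be sketched as a small PyTorch wrapper. This is an illustrative sketch, not the authors' released API; the class and tensor names are ours, and we assume B and Q are supplied as precomputed orthonormal bases:

```python
import torch
import torch.nn as nn

class PlateAdapter(nn.Module):
    """Sketch of a PLATE-style adapter: dW = B @ A @ Q.T, with B and Q
    frozen buffers and only the small matrix A trainable per task."""

    def __init__(self, base_linear: nn.Linear, B: torch.Tensor, Q: torch.Tensor):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():
            p.requires_grad = False           # original weights stay intact
        self.register_buffer("B", B)          # (d_out, r_b), frozen basis
        self.register_buffer("Q", Q)          # (d_in, r_q), frozen basis
        # Zero init: the adapted layer starts out identical to the base layer.
        self.A = nn.Parameter(torch.zeros(B.shape[1], Q.shape[1]))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta_w = self.B @ self.A @ self.Q.T  # update confined to span(B) x span(Q)
        return self.base(x) + x @ delta_w.T
```

Because A is initialized to zero, wrapping a layer changes nothing until training begins, and only A ever receives gradients.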

Methodology

  1. Detecting Redundancy – The authors analyze pretrained weights to identify groups of neurons that behave similarly (i.e., lie in low‑dimensional subspaces). These groups act as proxies for the dominant directions learned during pre‑training.
  2. Constructing Protected Subspaces – Using the identified redundancy, two orthogonal bases B and Q are built once from the original weights. B spans the “stable” directions we want to preserve, while Q captures the complementary space where we allow change.
  3. Low‑Rank Adapter Parameterization – For each layer, the weight update is expressed as ΔW = B A Qᵀ. Because B and Q are frozen, learning reduces to optimizing the much smaller matrix A. This drastically cuts the number of trainable parameters and confines updates to a controllable subspace.
  4. Plasticity‑Retention Trade‑off – By adjusting the rank of A (or selectively freezing parts of B/Q), developers can dial how much of the model is allowed to adapt versus stay fixed, giving fine‑grained control over forgetting.
  5. Training Loop – The adapter A is trained on the new task using standard gradient descent, while the rest of the network remains untouched, eliminating the need for replay buffers or regularization tricks that depend on old data.
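One way steps 1–3 might be realized is via an SVD split of the pretrained weight. This is a hedged sketch of a plausible construction, not necessarily the paper's exact redundancy analysis: we take the leading left singular vectors as the "stable" output basis B, and input directions beyond the top-k dominant ones as the complementary basis Q where change is allowed:

```python
import torch

def build_bases(W: torch.Tensor, k: int, r: int):
    """Illustrative basis construction (the paper's recipe may differ).
    W: pretrained weight (d_out, d_in); k: number of protected dominant
    input directions; r: adapter rank."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=True)
    B = U[:, :r]             # stable output directions to write into
    Q = Vh[k:k + r, :].T     # complementary input directions where change is allowed
    return B, Q
```

Because the rows of Vh are orthonormal, any update B A Qᵀ sends the top-k dominant input directions to (near) zero, which is one concrete sense in which old-task behavior is left untouched while A is trained freely on the new task.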

Results & Findings

| Setting | Baseline (e.g., EWC, LwF) | PLATE (rank-tuned) | Forgetting (ΔAcc) | Avg. Accuracy |
|---|---|---|---|---|
| 5-task split CIFAR-100 | 71.2 % | 78.5 % | –3.1 % | 75.8 % |
| Domain shift (ImageNet → Places) | 68.4 % | 74.9 % | –2.0 % | 71.6 % |
| NLP continual fine-tuning (BERT) | 82.1 % | 86.3 % | –1.5 % | 84.2 % |

Key take‑aways

  • Higher retention: PLATE consistently reduces catastrophic forgetting compared to classic regularization‑based methods, even though it never sees old‑task data.
  • Parameter efficiency: The adapter A typically adds < 2 % extra parameters per task, making it practical for on‑device or multi‑tenant deployments.
  • Robustness to rank choice: Experiments show a smooth trade‑off curve; modest ranks (e.g., 8–16) already capture most of the benefit, while higher ranks give diminishing returns.
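The parameter-efficiency claim is easy to sanity-check: since B and Q are precomputed once and shared, only the rank × rank matrix A is task-specific. The layer size below is illustrative (a BERT-base-sized 768 × 768 projection), not a figure from the paper:

```python
def adapter_overhead(d_in: int, d_out: int, rank: int) -> float:
    """Fraction of extra trainable parameters a rank-r adapter adds,
    relative to the frozen d_out x d_in base weight. Only A (rank x rank)
    is stored per task; bases B and Q are precomputed and shared."""
    return (rank * rank) / (d_out * d_in)

# Rank-16 adapter on a 768 x 768 layer:
print(f"{adapter_overhead(768, 768, 16):.5%}")  # → 0.04340%
```

Even if per-task storage also had to include B and Q (rank × d_out plus rank × d_in entries), the overhead stays comfortably below the 2 % figure quoted above for modest ranks.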

Practical Implications

  • Fast, data‑light model upgrades – Companies can roll out new features (e.g., a new language domain or visual category) without having to retain massive historic datasets, simplifying compliance with privacy regulations.
  • Edge‑device continual learning – Because only a tiny adapter needs to be stored and updated, PLATE fits well on smartphones, IoT devices, or embedded systems that must learn from streaming data on‑the‑fly.
  • Multi‑tenant SaaS platforms – Service providers can maintain a single “base” model and spin off lightweight adapters per customer, reducing storage costs and isolation risks.
  • Simplified MLOps – The clear plasticity‑retention knob translates into a single hyperparameter (rank or fraction‑plastic) that can be tuned via automated pipelines, avoiding the complex replay‑buffer management of many existing CL solutions.
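The multi-tenant pattern amounts to one frozen base plus a dictionary of tiny per-customer A matrices that can be swapped at serving time. Everything below (tenant names, sizes, the random bases) is hypothetical, to show the shape of the idea rather than a production design:

```python
import torch

# One shared, frozen base weight plus shared frozen bases B and Q.
d_out, d_in, r = 64, 64, 8
W_base = torch.randn(d_out, d_in)
B = torch.linalg.qr(torch.randn(d_out, r)).Q   # illustrative orthonormal bases
Q = torch.linalg.qr(torch.randn(d_in, r)).Q

# Per-tenant state is just an r x r matrix each.
tenant_adapters = {
    "acme": torch.zeros(r, r),                 # fresh tenant: base behavior
    "globex": torch.randn(r, r) * 0.01,        # tenant with a trained adapter
}

def effective_weight(tenant: str) -> torch.Tensor:
    """Swap in a customer's adapter without touching the shared base."""
    A = tenant_adapters[tenant]
    return W_base + B @ A @ Q.T
```

Storage and isolation both follow from the same fact: tenants never share trainable state, and the shared base is never written to.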

Limitations & Future Work

  • Assumption of redundancy – PLATE’s effectiveness hinges on the presence of geometric redundancy; extremely compact or heavily pruned models may offer fewer safe subspaces.
  • Static bases – B and Q are computed once from the pretrained weights; if the base model evolves (e.g., via continual pre‑training), the adapters would need to be recomputed.
  • Task similarity bias – The method works best when new tasks share some underlying feature structure with the pre‑training distribution; highly divergent domains may still suffer noticeable drift.
  • Future directions – The authors suggest (1) dynamic updating of the bases during long‑term learning, (2) extending the approach to transformer attention matrices, and (3) exploring automated rank‑selection strategies based on validation‑set performance.

Authors

  • Romain Cosentino

Paper Information

  • arXiv ID: 2602.03846v1
  • Categories: cs.LG, cs.AI
  • Published: February 3, 2026
  • PDF: Download PDF