[Paper] PPSEBM: An Energy-Based Model with Progressive Parameter Selection for Continual Learning

Published: December 17, 2025 at 01:11 PM EST
4 min read

Source: arXiv - 2512.15658v1

Overview

The paper presents PPSEBM, a new continual‑learning framework that marries an Energy‑Based Model (EBM) with a Progressive Parameter Selection (PPS) strategy. By allocating fresh, task‑specific parameters for each incoming NLP task and using the EBM to synthesize realistic pseudo‑samples of previous tasks, PPSEBM dramatically reduces catastrophic forgetting while still adapting quickly to new data.

Key Contributions

  • Hybrid Architecture: Introduces a seamless integration of EBMs (for generative replay) with PPS (for selective parameter growth).
  • Task‑Specific Parameter Allocation: Dynamically expands the model’s capacity, assigning dedicated sub‑networks to each task without overwriting earlier knowledge.
  • Active Pseudo‑Sample Generation: The EBM learns to produce high‑fidelity representations of past tasks, which are fed back to guide PPS and keep earlier performance stable.
  • State‑of‑the‑Art Benchmarks: Demonstrates consistent gains over leading continual‑learning baselines (e.g., EWC, GEM, Replay) across several NLP datasets (GLUE‑style classification, sentiment analysis, and question answering).
  • Scalable Design: Shows that the parameter growth remains modest (≈10‑15 % per new task) and that the EBM can be trained jointly with the main task network, keeping overall training time competitive.

Methodology

  1. Base Model – A transformer‑style encoder (e.g., BERT) serves as the backbone for all tasks.
  2. Progressive Parameter Selection (PPS)
    • When a new task arrives, a small controller network decides which existing neurons to reuse and which new ones to instantiate.
    • The selection is “progressive”: earlier tasks keep their allocated parameters untouched, while the new task receives a mix of reused and fresh parameters, preserving past representations.
  3. Energy‑Based Model (EBM) Replay
    • An auxiliary EBM is trained on the latent representations of each completed task.
    • During training on a new task, the EBM samples pseudo‑representations that mimic earlier tasks’ data distribution.
    • These pseudo‑samples are fed into the PPS controller, acting as a regularizer that nudges the controller to keep enough capacity for past tasks.
  4. Joint Optimization
    • The main task loss (e.g., cross‑entropy) and the EBM’s contrastive loss are optimized together.
    • A lightweight KL‑regularizer penalizes drift in the parameters of previously allocated sub‑networks (a minimal sketch of how this fits together with the EBM replay follows this list).
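
To make steps 3 and 4 more concrete, below is a minimal PyTorch‑style sketch of how a latent‑space EBM, Langevin‑style pseudo‑sample generation, and the joint loss could fit together. It is an illustration under simplifying assumptions rather than the authors' implementation: the `LatentEBM` architecture, the sampling hyperparameters, the MSE stand‑in for the KL drift penalty, and the weights `lambda_ebm` / `lambda_reg` are all invented for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LatentEBM(nn.Module):
    """Scores a latent representation h with a scalar energy (lower = more like past-task data)."""

    def __init__(self, hidden_dim: int = 768):
        super().__init__()
        self.energy_net = nn.Sequential(
            nn.Linear(hidden_dim, 256), nn.SiLU(), nn.Linear(256, 1)
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.energy_net(h).squeeze(-1)

    def sample(self, batch: int, dim: int, steps: int = 20, step_size: float = 1.0) -> torch.Tensor:
        """Langevin-style sampling of pseudo latent representations of earlier tasks."""
        h = torch.randn(batch, dim, requires_grad=True)
        for _ in range(steps):
            grad, = torch.autograd.grad(self(h).sum(), h)
            h = (h - 0.5 * step_size * grad
                 + 0.01 * torch.randn_like(h)).detach().requires_grad_(True)
        return h.detach()


def joint_loss(task_logits, labels, ebm, past_h, pseudo_h,
               new_params, old_snapshot,
               lambda_ebm: float = 0.1, lambda_reg: float = 0.01):
    """New-task cross-entropy + contrastive EBM loss + drift penalty on earlier sub-networks."""
    ce = F.cross_entropy(task_logits, labels)
    # Contrastive energy term: pull down the energy of real past-task latents,
    # push up the energy of the EBM's own samples.
    ebm_term = ebm(past_h).mean() - ebm(pseudo_h).mean()
    # Simple drift penalty on previously allocated parameters
    # (an MSE stand-in for the lightweight KL regularizer described in the paper).
    drift = sum(F.mse_loss(p, p_old) for p, p_old in zip(new_params, old_snapshot))
    return ce + lambda_ebm * ebm_term + lambda_reg * drift
```

In practice, the pseudo‑latents returned by `sample` would be passed to the PPS controller alongside the new task's real batches, playing the regularizing role described in step 3.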

The overall approach is straightforward to plug into existing NLP pipelines: add the PPS module on top of the transformer and train an EBM on its hidden states.
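
As a rough illustration of that plug‑in structure (assumed here, not taken from the paper), the sketch below attaches per‑task heads on top of a Hugging Face encoder and freezes earlier allocations whenever a new task is registered. The controller that decides which existing neurons to reuse is omitted, so the class names `ProgressiveModel` and `PPSHead` and the adapter sizes are hypothetical.

```python
import torch.nn as nn
from transformers import AutoModel


class PPSHead(nn.Module):
    """One small task-specific sub-network; the PPS controller's reuse-vs-instantiate
    decision is omitted here, so every head is simply fresh parameters."""

    def __init__(self, hidden_dim: int, num_labels: int):
        super().__init__()
        self.adapter = nn.Sequential(
            nn.Linear(hidden_dim, 128), nn.ReLU(), nn.Linear(128, num_labels)
        )

    def forward(self, h):
        return self.adapter(h)


class ProgressiveModel(nn.Module):
    def __init__(self, backbone_name: str = "bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(backbone_name)
        self.heads = nn.ModuleDict()

    def add_task(self, task_name: str, num_labels: int):
        # Freeze parameters allocated to earlier tasks so they are never overwritten.
        for head in self.heads.values():
            for p in head.parameters():
                p.requires_grad = False
        self.heads[task_name] = PPSHead(self.encoder.config.hidden_size, num_labels)

    def forward(self, task_name: str, **encoder_inputs):
        h = self.encoder(**encoder_inputs).last_hidden_state[:, 0]  # [CLS] representation
        return self.heads[task_name](h)
```

A new task would be registered with something like `model.add_task("mrpc", num_labels=2)` before fine‑tuning on it, and the EBM from the previous sketch would be trained on the `[CLS]` latents the shared encoder produces.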

Results & Findings

| Dataset | # Tasks | Avg. Accuracy (PPSEBM) | Best Baseline | Δ |
| --- | --- | --- | --- | --- |
| AGNews (4 tasks) | 4 | 92.3 % | 88.7 % (GEM) | +3.6 % |
| SST‑2 → MRPC → QQP → RTE | 4 | 84.1 % | 80.2 % (EWC) | +3.9 % |
| Continual QA (TriviaQA → SQuAD) | 2 | 78.5 % | 73.4 % (Replay) | +5.1 % |

  • Catastrophic Forgetting: The drop in performance on the first task after learning the last task is under 2 % for PPSEBM, compared to 8‑12 % for most baselines.
  • Parameter Overhead: Parameters grow by an average of 12 % per new task, far below naive model duplication (≈100 %).
  • Training Time: End‑to‑end training adds ~15 % overhead versus a vanilla fine‑tuning run, largely due to the EBM’s sampling step, which remains tractable on modern GPUs.
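
For context, the "drop on the first task" figure is the usual way forgetting is measured: accuracy on task 1 immediately after learning it minus accuracy on task 1 after the final task. A tiny worked example with made‑up numbers (not the paper's results):

```python
# acc[i][j] = accuracy on task j after finishing training on task i.
# These values are invented for illustration; they are not the paper's numbers.
acc = [
    [0.93, None, None],  # after task 1
    [0.92, 0.90, None],  # after task 2
    [0.91, 0.89, 0.88],  # after task 3
]

# Forgetting on the first task: accuracy right after learning it
# minus accuracy after training on the final task.
forgetting_task1 = acc[0][0] - acc[-1][0]
print(f"{forgetting_task1:.2%}")  # -> 2.00%, i.e. an under-2 % drop of the kind reported for PPSEBM
```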

These numbers indicate that PPSEBM not only preserves earlier knowledge but also scales efficiently as tasks accumulate.

Practical Implications

  • Deployable Continual NLP Services: Companies can roll out new language‑understanding capabilities (e.g., adding a new intent classifier) without retraining from scratch or risking regression on existing services.
  • Edge & Mobile Scenarios: The modest parameter growth and single‑model footprint make PPSEBM suitable for on‑device updates where storage and compute are limited.
  • Data‑Privacy Friendly Replay: Because the EBM generates synthetic latent samples rather than storing raw user data, organizations can comply with privacy regulations while still benefiting from replay‑based mitigation.
  • Rapid Prototyping: Developers can experiment with new tasks in a plug‑and‑play fashion—just attach the PPS module, let the controller allocate parameters, and let the EBM handle the “memory” of past tasks.

Overall, PPSEBM offers a practical recipe for building ever‑learning NLP systems that stay reliable as they evolve.

Limitations & Future Work

  • Memory Footprint of EBMs: Although synthetic, the EBM still requires a separate set of parameters and a buffer of latent representations, which may become a bottleneck for very long task sequences.
  • Task Similarity Assumption: PPS works best when new tasks share some underlying linguistic structure; highly divergent tasks may still demand disproportionate parameter growth.
  • Evaluation Scope: Experiments focus on classification and QA; extending to generation (e.g., continual language modeling) remains an open question.
  • Future Directions: The authors suggest exploring dynamic pruning to recycle unused parameters, integrating meta‑learning to speed up PPS decisions, and testing PPSEBM on multimodal continual learning scenarios.

Authors

  • Xiaodi Li
  • Dingcheng Li
  • Rujun Gao
  • Mahmoud Zamani
  • Feng Mi
  • Latifur Khan

Paper Information

  • arXiv ID: 2512.15658v1
  • Categories: cs.CL, cs.AI, cs.LG
  • Published: December 17, 2025