[Paper] Merge and Bound: Direct Manipulations on Weights for Class Incremental Learning

Published: November 26, 2025 at 10:24 AM EST
4 min read

Source: arXiv - 2511.21490v1

Overview

The paper introduces Merge‑and‑Bound (M & B), a new training recipe for class‑incremental learning (CIL) that works directly on the model’s weights instead of tweaking loss functions or network architectures. By carefully merging and bounding weight updates, the method dramatically cuts down catastrophic forgetting while staying compatible with any existing CIL pipeline.

Key Contributions

  • Weight‑space merging: Two novel merging operations – inter‑task (averaging across all previously learned tasks) and intra‑task (combining multiple checkpoints within the current task) – that reshape the model without architectural changes.
  • Bounded update rule: A principled constraint that forces the new model to stay close to the merged “reference” weights, minimizing cumulative drift and preserving prior knowledge.
  • Plug‑and‑play design: M & B can be dropped into any CIL method (e.g., iCaRL, LUCIR, PODNet) without altering the loss, replay buffer, or network head.
  • State‑of‑the‑art results: Consistently outperforms recent CIL baselines on CIFAR‑100, ImageNet‑Subset, and TinyImageNet, often by 2–5 percentage points of absolute accuracy.
  • Comprehensive analysis: Ablation studies that isolate the impact of each merging component and demonstrate robustness to different replay sizes and task orders.

Methodology

  1. Inter‑task weight merging – After finishing task t‑1, the algorithm stores the model’s parameters. When task t begins, it computes a simple average of the stored per‑task checkpoints together with the current weights. This “global” weight vector serves as a knowledge anchor that embodies what the network has learned so far.

  2. Intra‑task weight merging – During training on task t, several intermediate snapshots (e.g., after each epoch) are collected. These are merged (again by averaging) to produce a task‑specific representation that smooths out noisy updates.

  3. Bounded update – The actual optimization step is constrained by a quadratic penalty that limits the distance between the updated weights and the merged anchor. Concretely, the loss becomes:

    \[ \mathcal{L}_{\text{total}} = \mathcal{L}_{\text{CIL}} + \lambda \,\lVert \theta - \theta_{\text{merged}} \rVert_2^2, \]

    where \( \theta \) are the current parameters and \( \theta_{\text{merged}} \) is the result of the two merges. The hyper‑parameter \( \lambda \) controls how “tight” the bound is.

  4. Integration – Because the extra term is just a regularizer on the weight vector, it can be added to any existing CIL loss (cross‑entropy, distillation, contrastive, etc.) without touching the model’s architecture or replay buffer; a minimal sketch of the full procedure follows this list.
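
As a rough illustration of steps 1–3, here is a minimal PyTorch-style sketch of the plain averaging used for both merges and of the quadratic bound. The names `merge_state_dicts`, `bound_penalty`, and `lambda_bound` are illustrative choices for this sketch, not the authors’ implementation.

```python
import copy
import torch

def merge_state_dicts(state_dicts):
    """Parameter-wise plain average of a list of state dicts (used for both inter- and intra-task merges)."""
    merged = copy.deepcopy(state_dicts[0])
    for key in merged:
        if torch.is_floating_point(merged[key]):
            merged[key] = torch.stack([sd[key] for sd in state_dicts]).mean(dim=0)
    return merged

def bound_penalty(model, merged_state, lambda_bound):
    """Quadratic penalty lambda * ||theta - theta_merged||_2^2 keeping the weights near the merged anchor."""
    penalty = 0.0
    for name, param in model.named_parameters():
        anchor = merged_state[name].detach().to(param.device)
        penalty = penalty + (param - anchor).pow(2).sum()
    return lambda_bound * penalty
```

The penalty is just another differentiable term added to whatever loss the underlying CIL method already computes, which is what makes the bounded update loss-agnostic.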

Results & Findings

| Dataset | Baseline (e.g., LUCIR) | LUCIR + M & B | Gain |
| --- | --- | --- | --- |
| CIFAR‑100 (20 tasks) | 63.2 % | 68.1 % | +4.9 % |
| ImageNet‑Subset (10 tasks) | 71.5 % | 74.8 % | +3.3 % |
| TinyImageNet (10 tasks) | 55.0 % | 58.9 % | +3.9 % |

  • Reduced forgetting: The average drop in accuracy for the first task after learning all tasks shrank from ~30 % to ~18 % when M & B was applied.
  • Stability across replay sizes: Even with a tiny replay buffer (1 % of the dataset), M & B still delivered a >3 % boost, showing that the weight‑space regularizer does not rely on large exemplars.
  • Ablation: Removing intra‑task merging caused a ~1.2 % drop; removing the bounded term caused a ~2.5 % drop, confirming that both components are essential.

Overall, the experiments demonstrate that staying “close” to a merged weight representation is an effective, low‑overhead way to keep old knowledge alive.

Practical Implications

  • Easy adoption: Developers can add a few lines of code (store checkpoints, compute an average, add the regularizer) to any CIL framework they already use, as sketched after this list. No new layers, memory‑intensive rehearsal, or custom optimizers are required.
  • Lower compute & memory footprint: Because the method works on the parameter vector itself, it avoids expensive generative replay or large exemplar buffers, making it attractive for edge devices or on‑device continual learning.
  • Robustness to task order: The merging strategy is agnostic to how tasks are presented, which is valuable for real‑world pipelines where data arrives in unpredictable sequences (e.g., incremental product catalogs, evolving sensor modalities).
  • Potential beyond CIL: The bounded‑update idea could be repurposed for other continual‑learning scenarios such as domain adaptation, federated learning, or even fine‑tuning large language models where preserving a “core” representation is critical.
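
To make the “few lines of code” claim concrete, below is a hedged sketch of what adoption might look like in a PyTorch-style training loop. It reuses the illustrative `merge_state_dicts` and `bound_penalty` helpers from the Methodology sketch; the toy model, `cil_loss_fn`, `lambda_bound`, and the snapshot bookkeeping are stand-in names for this sketch, not the authors’ code.

```python
import copy
import torch
import torch.nn as nn

# Toy stand-ins so the sketch is self-contained; a real pipeline supplies its own
# model, optimizer, data loader, and CIL loss (cross-entropy + distillation, etc.).
model = nn.Linear(8, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
cil_loss_fn = nn.CrossEntropyLoss()                 # stands in for the existing CIL loss
lambda_bound = 1e-3                                 # assumed hyper-parameter name and value
task_loader = [(torch.randn(4, 8), torch.randint(0, 10, (4,))) for _ in range(3)]

task_checkpoints = []   # one stored state dict per completed task (inter-task merge)
epoch_snapshots = []    # intermediate snapshots within the current task (intra-task merge)

for epoch in range(2):
    # Merge past-task checkpoints, intra-task snapshots, and the current weights into one anchor.
    anchor = merge_state_dicts(
        task_checkpoints + epoch_snapshots + [copy.deepcopy(model.state_dict())]
    )
    for images, labels in task_loader:
        loss = cil_loss_fn(model(images), labels)
        loss = loss + bound_penalty(model, anchor, lambda_bound)  # the only added term
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    epoch_snapshots.append(copy.deepcopy(model.state_dict()))     # collect an intra-task snapshot

task_checkpoints.append(copy.deepcopy(model.state_dict()))        # checkpoint stored after the task
```

In a real pipeline, only the anchor computation and the added penalty term touch existing code; the loss, replay buffer, and network head stay exactly as the underlying CIL method defines them.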

Limitations & Future Work

  • Merging simplicity: The current approach uses plain averaging; more sophisticated merging (e.g., weighted averages based on task difficulty or confidence) might yield further gains.
  • Scalability to very large models: Storing full checkpoints for every task can become memory‑intensive for models with hundreds of millions of parameters; the authors suggest exploring low‑rank or sketch‑based representations.
  • Theoretical guarantees: While empirical results are strong, a formal analysis of why the bounded update mitigates forgetting is left for future research.
  • Extension to non‑classification tasks: The paper focuses on image classification; applying M & B to detection, segmentation, or multimodal tasks remains an open avenue.

Authors

  • Taehoon Kim
  • Donghwan Jang
  • Bohyung Han

Paper Information

  • arXiv ID: 2511.21490v1
  • Categories: cs.CV, cs.AI, cs.LG
  • Published: November 26, 2025
  • PDF: Download PDF