[Paper] Degradation of Feature Space in Continual Learning
Source: arXiv:2602.06586v1
Overview
The paper Degradation of Feature Space in Continual Learning challenges a common assumption from classic deep‑learning pipelines: that forcing feature representations to be isotropic (i.e., equally spread in all directions) always improves model robustness. By probing this idea in the context of continual learning—where data arrives as a non‑stationary stream—the authors discover that isotropy can actually hurt performance, revealing a fundamental geometric mismatch between centralized and incremental training regimes.
Key Contributions
- Empirical investigation of feature‑space isotropy in continual learning, a setting where most prior work focuses on plasticity‑stability trade‑offs but not on representation geometry.
- Contrastive continual‑learning experiments on CIFAR‑10 and CIFAR‑100 that compare vanilla continual learning, isotropy‑regularized variants, and baseline centralized training.
- Evidence that isotropy regularization degrades accuracy in streaming scenarios, contrary to its proven benefits in static, centralized training.
- Analysis showing that anisotropy emerges as an intrinsic by‑product of incremental updates, suggesting that anisotropic features may be a useful inductive bias for non‑stationary data.
- Guidelines for future algorithm design, warning researchers against blindly transplanting centralized‑training tricks into continual‑learning pipelines.
Methodology
Continual‑learning backbone – The authors adopt a standard class‑incremental learning protocol: a ResNet‑18 model is trained on a sequence of tasks derived from CIFAR‑10/100, each task introducing new classes while retaining access only to the current task’s data.
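For concreteness, a minimal sketch of such a class‑incremental split is shown below. The 5‑task, 2‑classes‑per‑task partition of CIFAR‑10 and the `make_task_stream` helper are illustrative assumptions, not details taken from the paper.

```python
# Minimal class-incremental split of CIFAR-10 (assumption: 5 tasks of
# 2 classes each; the paper's exact ordering and split may differ).
import torch
from torch.utils.data import Subset
from torchvision import datasets, transforms

def make_task_stream(root="./data", num_classes=10, classes_per_task=2):
    """Partition CIFAR-10 into disjoint tasks of consecutive classes."""
    data = datasets.CIFAR10(root=root, train=True, download=True,
                            transform=transforms.ToTensor())
    targets = torch.tensor(data.targets)
    tasks = []
    for start in range(0, num_classes, classes_per_task):
        wanted = torch.arange(start, start + classes_per_task)
        # During task t, only these indices are visible to the learner.
        idx = torch.isin(targets, wanted).nonzero(as_tuple=True)[0]
        tasks.append(Subset(data, idx.tolist()))
    return tasks
```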
Isotropy regularization – To encourage isotropy, the authors augment the contrastive loss with a feature‑space term that penalizes deviations from a spherical covariance matrix (similar to the whitening or uniformity losses used in contrastive learning).
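As a rough illustration of such a term, the PyTorch sketch below penalizes the squared Frobenius distance between the batch covariance and a scaled identity. This is one plausible formulation consistent with the description above, not the paper's exact loss; the `isotropy_penalty` name and the weight `lam` are assumptions.

```python
import torch

def isotropy_penalty(z: torch.Tensor) -> torch.Tensor:
    """Penalize deviation of the batch feature covariance from a scaled identity.

    z: (batch, dim) feature matrix. Returns ||Cov(z) - (tr(Cov)/d) * I||_F^2.
    """
    zc = z - z.mean(dim=0, keepdim=True)          # center the features
    cov = zc.T @ zc / (zc.shape[0] - 1)           # empirical covariance
    d = cov.shape[0]
    target = torch.eye(d, device=z.device) * (torch.trace(cov) / d)
    return ((cov - target) ** 2).sum()            # squared Frobenius norm

# Hypothetical usage inside a training step (lam is the swept strength):
# loss = contrastive_loss + lam * isotropy_penalty(features)
```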
Baselines – Three setups are compared:
a. vanilla continual learning (no isotropy term)
b. isotropy‑regularized continual learning
c. a centralized model trained on the full dataset (the gold‑standard setting for isotropy benefits)
Metrics – Accuracy after each incremental step is recorded, along with geometric diagnostics (eigenvalue spread of the feature covariance, cosine‑similarity distribution) that quantify isotropy vs. anisotropy.
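These diagnostics can be computed along the lines of the sketch below; this is an assumed implementation matching the metrics named above (the paper does not publish this code). A low eigenvalue spread and a tight cosine‑similarity distribution both indicate a more isotropic feature space.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def feature_geometry(z: torch.Tensor) -> dict:
    """Geometric diagnostics for a (batch, dim) feature matrix."""
    zc = z - z.mean(dim=0, keepdim=True)
    cov = zc.T @ zc / (zc.shape[0] - 1)
    eigvals = torch.linalg.eigvalsh(cov)                 # ascending order
    spread = eigvals[-1] / eigvals[0].clamp(min=1e-12)   # condition-number-like ratio
    zn = F.normalize(z, dim=1)
    sims = zn @ zn.T
    # Keep only off-diagonal pairwise cosine similarities.
    off_diag = sims[~torch.eye(len(z), dtype=torch.bool, device=z.device)]
    return {"eigen_spread": spread.item(),
            "cos_mean": off_diag.mean().item(),
            "cos_std": off_diag.std().item()}
```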
Ablation – The strength of the isotropy regularizer is swept across several values to ensure the observed effect isn’t a hyper‑parameter artifact.
Results & Findings
| Setting | Final Accuracy (CIFAR‑10) | Final Accuracy (CIFAR‑100) | Feature‑Covariance Eigen‑Spread |
|---|---|---|---|
| Centralized (full data) | 92.1 % | 71.4 % | Near‑uniform (low spread) |
| Vanilla Continual | 78.3 % | 48.9 % | Moderate anisotropy |
| Isotropy‑Regularized | 76.5 % (↓ 1.8 pp vs. vanilla) | 46.2 % (↓ 2.7 pp vs. vanilla) | Near‑uniform (forced by the regularizer) |
- Accuracy drops when isotropy is enforced, even though the regularizer successfully flattens the eigenvalue distribution.
- Anisotropic features naturally emerge as the model adapts to new tasks; this anisotropy correlates with better retention of earlier knowledge.
- Contrastive loss alone (without isotropy) improves representation quality, but the added isotropy term negates those gains.
Takeaway: Making the feature space “spherical” harms the delicate balance between learning new information (plasticity) and preserving old knowledge (stability) in a streaming environment.
Practical Implications
- Avoid copying centralized tricks – Techniques such as batch‑norm whitening, uniformity losses, or explicit isotropy regularizers that are common in static training can be counter‑productive for on‑device or edge continual‑learning systems.
- Design anisotropy‑aware architectures – When building models for lifelong learning (e.g., robotics, personalized assistants, autonomous vehicles), prefer regularizers that preserve or adapt the natural anisotropy rather than suppress it.
- Monitor feature geometry – Simple diagnostics such as eigenvalue spread or cosine‑similarity histograms can be added to training pipelines to flag when a model becomes overly isotropic, providing an early warning for potential forgetting; a minimal runtime check is sketched after this list.
- Consider resource constraints – Isotropy regularization adds extra computation (covariance estimation, additional loss terms) without clear benefit; omitting it can save memory and FLOPs on embedded devices.
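As referenced above, a lightweight per‑batch check might look like the following. The `isotropy_alarm` helper and its `min_spread` threshold are hypothetical and would need calibration against a healthy training run.

```python
import torch

@torch.no_grad()
def isotropy_alarm(z: torch.Tensor, min_spread: float = 5.0) -> bool:
    """Return True when the batch feature space looks suspiciously isotropic.

    min_spread is a placeholder threshold, not a value from the paper.
    """
    zc = z - z.mean(dim=0, keepdim=True)
    eigvals = torch.linalg.eigvalsh(zc.T @ zc / (zc.shape[0] - 1))
    spread = (eigvals[-1] / eigvals[0].clamp(min=1e-12)).item()
    return spread < min_spread  # eigenvalues nearly uniform: possible forgetting risk
```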
Takeaway: The work encourages the community to adopt geometry‑conscious continual‑learning designs, treating the shape of the representation space as a first‑class hyper‑parameter.
Limitations & Future Work
Limitations
- Dataset scope – Experiments are confined to CIFAR‑10/100; larger‑scale or domain‑specific streams (e.g., video, language) may exhibit different dynamics.
- Single backbone – Only ResNet‑18 is evaluated; other architectures (Vision Transformers, recurrent nets) could respond differently to isotropy constraints.
- Regularizer formulation – The study employs a straightforward isotropy penalty; more sophisticated approaches (e.g., task‑aware covariance shaping) might mitigate the observed degradation.
- Theoretical grounding – Empirical evidence is strong, yet a formal analysis linking anisotropy to the plasticity‑stability trade‑off remains an open research direction.
Future Work
- Develop adaptive regularizers that learn the optimal degree of isotropy for each task.
- Investigate how anisotropic feature spaces interact with replay‑based and parameter‑isolation continual‑learning strategies.
- Extend experiments to diverse datasets and model families to validate the generality of the findings.
Authors
- Eduard Angelats
- Paolo Dini
- Chiara Lanza
- Marco Miozzo
- Roberto Pereira
Paper Information
| Item | Details |
|---|---|
| arXiv ID | 2602.06586v1 |
| Categories | cs.LG, cs.DC |
| Published | February 6, 2026 |