[Paper] NORACL: Neurogenesis for Oracle-free Resource-Adaptive Continual Learning
Source: arXiv - 2604.27031v1
Overview
Continual learning (CL) systems must stay plastic enough to pick up new tasks while remaining stable enough not to forget what they have already mastered. The authors argue that this trade‑off is rooted in the network's architecture: a fixed‑size model must be sized in advance for an unknown task stream, which amounts to assuming a capacity oracle the system never actually has. Their paper introduces NORACL, a neurogenesis‑inspired method that lets a neural network grow on demand, automatically adding neurons only when its current capacity is saturated.
Key Contributions
- Neurogenesis‑driven growth: A lightweight mechanism that monitors representational and plasticity saturation and expands the network only when needed.
- Oracle‑free design: Starts from a compact base model, eliminating the need to pre‑size the architecture for an unknown task stream.
- Task‑aware growth patterns: Shows that dissimilar tasks trigger expansion in early feature‑extraction layers, while related tasks push growth toward later combination layers, yielding interpretable architectures.
- Empirical superiority: Across diverse continual‑learning benchmarks (varying number of tasks and feature overlap), NORACL matches or exceeds the final accuracy of static, oracle‑sized baselines while using fewer parameters.
- Theoretical insight: Provides analysis of why fixed‑capacity networks lose plasticity over time and how fresh capacity restores it, moving the stability‑plasticity Pareto frontier forward.
Methodology
- Base network – Begin with a small, conventional feed‑forward (or CNN) model.
- Saturation signals – Two complementary metrics are tracked during training (proxy code for both is sketched after this list):
  - Representational saturation: measures how much of the current feature space is already occupied.
  - Plasticity saturation: gauges the remaining gradient‑based learning capacity (e.g., diminishing weight updates).
- Growth trigger – When either metric crosses a preset threshold, a neurogenesis step is executed (a layer‑widening sketch follows the paragraph after this list):
  - New neurons (and their associated weights) are added to a selected layer.
  - The layer's dimensionality is increased, and the optimizer is re‑initialized for the new parameters.
- Task‑specific routing – Newly added neurons are preferentially used by the current task, while earlier neurons stay available for past tasks, preserving stability.
- Continual learning loop – The process repeats for each incoming task, with no need to know the total number of tasks or their similarity ahead of time.
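This summary does not include the authors' code, so the following is a minimal sketch of what the two saturation signals could look like in practice. The effective‑rank estimate and the gradient‑norm‑decay heuristic are plausible proxies of our own choosing, not the paper's exact metrics.

```python
import torch

def representational_saturation(acts: torch.Tensor, tol: float = 1e-3) -> float:
    """Fraction of a layer's feature dimensions already 'occupied'.

    Estimated here as the effective rank of a batch of activations,
    normalized by the layer width. `acts` has shape (batch, features);
    the batch should be at least as large as the layer width for a
    faithful estimate. Illustrative proxy only.
    """
    s = torch.linalg.svdvals(acts - acts.mean(dim=0))
    effective_rank = int((s / s.max() > tol).sum())
    return effective_rank / acts.shape[1]

def plasticity_saturation(grad_norms: list[float], window: int = 50) -> float:
    """How far recent gradient norms have decayed relative to early ones.

    Returns a value in [0, 1]; values near 1 suggest the network has
    little gradient-based learning capacity left. Illustrative heuristic.
    """
    if len(grad_norms) < 2 * window:
        return 0.0  # not enough history to judge
    early = sum(grad_norms[:window]) / window
    recent = sum(grad_norms[-window:]) / window
    return max(0.0, 1.0 - recent / (early + 1e-12))
```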
The approach is deliberately simple: it plugs into existing training pipelines and works with common CL regularizers (e.g., EWC, SI) without requiring complex replay buffers or architectural rewiring.
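To make the growth step concrete, the sketch below widens a single `nn.Linear` layer while leaving previously learned weights untouched, then shows (as comments) where the trigger check would sit in the training loop. This is a minimal PyTorch sketch under our own assumptions: `grow_linear`, `grow_inputs`, the initialization choices, and the 0.9 threshold are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

def grow_linear(layer: nn.Linear, new_units: int) -> nn.Linear:
    """Return a wider copy of `layer` with `new_units` extra output neurons.

    Old rows are copied verbatim, so representations learned for past
    tasks are preserved; only the new rows get fresh initialization.
    """
    out_f, in_f = layer.out_features, layer.in_features
    wider = nn.Linear(in_f, out_f + new_units, bias=layer.bias is not None)
    with torch.no_grad():
        wider.weight[:out_f] = layer.weight             # stability: keep old neurons
        nn.init.kaiming_uniform_(wider.weight[out_f:])  # plasticity: fresh capacity
        if layer.bias is not None:
            wider.bias[:out_f] = layer.bias
            wider.bias[out_f:].zero_()
    return wider

def grow_inputs(next_layer: nn.Linear, new_inputs: int) -> nn.Linear:
    """Widen the downstream layer to accept the extra features.

    The new input columns start at zero, so the function computed for
    previously learned tasks is initially unchanged.
    """
    out_f, in_f = next_layer.out_features, next_layer.in_features
    wider = nn.Linear(in_f + new_inputs, out_f, bias=next_layer.bias is not None)
    with torch.no_grad():
        wider.weight[:, :in_f] = next_layer.weight
        wider.weight[:, in_f:].zero_()
        if next_layer.bias is not None:
            wider.bias.copy_(next_layer.bias)
    return wider

# Where the trigger check would sit in the per-task training loop
# (the 0.9 threshold is a guess, not the paper's value):
#
# if representational_saturation(acts) > 0.9 or plasticity_saturation(norms) > 0.9:
#     model.hidden = grow_linear(model.hidden, new_units=16)
#     model.head = grow_inputs(model.head, new_inputs=16)
#     optimizer = torch.optim.Adam(model.parameters())  # fresh state covers new params
```

Re‑creating the optimizer is the simplest way to register the fresh parameters; the paper's task‑specific routing, in which new neurons preferentially serve the current task, would sit on top of this, for example via per‑task activation masks.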
Results & Findings
| Setting | # Tasks | Task Overlap | Oracle‑Sized Static Params | NORACL Params | Final Avg. Accuracy vs. Oracle |
|---|---|---|---|---|---|
| Low overlap, many tasks | 20 | 0.2 | 12 M | 9 M | +2.3 % |
| High overlap, few tasks | 5 | 0.9 | 8 M | 6 M | ≈ equal |
| Mixed geometry | 12 | 0.5 | 10 M | 7 M | +0.8 % |
- Parameter efficiency: NORACL consistently uses 15‑30 % fewer weights than the best static baseline while delivering equal or better accuracy.
- Interpretability: Visualization of growth shows early‑layer expansion for tasks with novel visual patterns, and later‑layer expansion when tasks share low‑level features but differ in high‑level composition.
- Stability: Forgetting rates drop dramatically after each growth event because previously learned representations remain untouched.
- Scalability: The growth overhead is modest (≈ 5 % extra compute per added neuron) and does not require retraining the whole network.
Practical Implications
- Dynamic services: Edge devices or SaaS platforms that must ingest new data streams (e.g., personalized recommendation, anomaly detection) can start with a tiny model and let it expand only as needed, reducing memory use and inference latency.
- Resource‑constrained deployment: Since growth is incremental, developers can set hard caps on total parameters, guaranteeing that the model never exceeds device limits (a budget‑guard sketch follows this list).
- Simplified pipeline: No need for costly architecture search or manual sizing; NORACL can be dropped into existing PyTorch/TensorFlow CL codebases.
- Interpretability for debugging: The layer‑wise growth map offers a quick visual cue about which parts of the network are handling novel versus shared features, aiding model inspection and troubleshooting.
- Future‑proofing: As new tasks emerge (e.g., new sensor modalities), the system can autonomously allocate capacity, reducing the need for frequent model re‑engineering cycles.
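As an illustration of such a hard cap, the guard below refuses any growth step that would push the model past a fixed parameter budget; it reuses the hypothetical `grow_linear` from the methodology sketch, and every name and value here is an assumption for this summary, not the paper's API.

```python
import torch.nn as nn

PARAM_BUDGET = 5_000_000  # hard device limit; example value only

def try_grow(model: nn.Module, layer_name: str, new_units: int) -> bool:
    """Grow `layer_name` by `new_units` neurons only if the model stays
    within PARAM_BUDGET. Returns True if growth actually happened."""
    layer: nn.Linear = getattr(model, layer_name)
    current = sum(p.numel() for p in model.parameters())
    # Parameter cost of the new rows (ignores downstream widening for brevity).
    added = new_units * (layer.in_features + 1)
    if current + added > PARAM_BUDGET:
        return False  # saturated but capped: keep the current capacity
    setattr(model, layer_name, grow_linear(layer, new_units))  # from the earlier sketch
    return True
```

Once the budget is reached, the learner would simply continue as an ordinary fixed‑capacity model.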
Limitations & Future Work
- Growth thresholds are hyper‑parameters that still require tuning per domain; an automated, data‑driven schedule would improve usability.
- The current experiments focus on image‑classification style benchmarks; extending to sequence models (e.g., NLP, time‑series) may need layer‑type‑specific neurogenesis rules.
- While parameter count is reduced, training time can increase slightly due to repeated re‑initialization and optimizer state updates.
- The method assumes a single‑task‑at‑a‑time stream; handling simultaneous multi‑task updates or continual reinforcement‑learning settings remains open.
Overall, NORACL demonstrates that adaptive neurogenesis is a practical, architecture‑agnostic tool for pushing continual‑learning systems toward the ideal balance of stability and plasticity—without the need for an oracle that predicts the future.
Authors
- Karthik Charan Raghunathan
- Christian Metzner
- Laura Kriener
- Melika Payvand
Paper Information
- arXiv ID: 2604.27031v1
- Categories: cs.LG, cs.AI, cs.NE
- Published: April 29, 2026