[Paper] CDRL: A Reinforcement Learning Framework Inspired by Cerebellar Circuits and Dendritic Computational Strategies

Published: February 17, 2026
5 min read
Source: arXiv:2602.15367v1

Overview

The paper “CDRL: A Reinforcement Learning Framework Inspired by Cerebellar Circuits and Dendritic Computational Strategies” proposes a new neural‑network architecture for reinforcement learning that mimics the wiring and processing principles of the cerebellum. By embedding biologically‑derived structural priors—large expansion layers, sparse connections, sparse activations, and dendrite‑level modulation—the authors demonstrate:

  • Markedly better sample efficiency
  • Improved robustness to noise
  • Enhanced generalization on high‑dimensional, partially observable tasks

Key Contributions

  • Cerebellar‑inspired architecture
    Introduces a modular network layout that expands input representations into a high‑dimensional sparse space, mirroring the granule‑cell layer of the cerebellum.

  • Dendritic modulation mechanism
    Implements a gating signal that operates at the “dendrite” level of each unit, allowing context‑dependent weighting of incoming features.

  • Comprehensive empirical evaluation
    Benchmarks on noisy, high‑dimensional RL environments (e.g., MuJoCo, DeepMind Control Suite) show consistent gains in sample efficiency (up to 2× faster learning) and robustness to observation noise.

  • Sensitivity analysis of architectural hyper‑parameters
    Demonstrates how expansion ratio, sparsity level, and modulation strength trade off performance versus model size.

  • Open‑source reference implementation
    Provides code and pretrained models, facilitating reproducibility and downstream experimentation.

Methodology

1. Network Skeleton

  • Input expansion – Raw state vectors are projected into a much larger latent space using a random, fixed‑weight expansion matrix (akin to the mossy‑fiber → granule‑cell expansion).
  • Sparse connectivity – Each downstream “Purkinje‑like” unit receives inputs from only a tiny subset of the expanded neurons, enforced via binary masks.
  • Sparse activation – A top‑k winner‑take‑all (WTA) operation keeps only the most responsive expanded units active per timestep, reducing interference and computational load (see the sketch below).
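
The sketch below pulls these three operations together in PyTorch. The dimensions, fan‑in, and top‑k fraction are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CerebellarEncoder(nn.Module):
    """Minimal sketch of the expansion/sparsity skeleton (illustrative sizes)."""

    def __init__(self, state_dim=17, expand_dim=1024, out_dim=128, fan_in=8, k=64):
        super().__init__()
        # Fixed random expansion (mossy-fiber -> granule-cell analogue):
        # the projection is frozen and never updated by the optimizer.
        self.expand = nn.Linear(state_dim, expand_dim, bias=False)
        self.expand.weight.requires_grad_(False)
        self.k = k
        # Binary mask so each "Purkinje-like" unit reads only `fan_in`
        # of the expanded neurons.
        mask = torch.zeros(out_dim, expand_dim)
        for row in mask:
            row[torch.randperm(expand_dim)[:fan_in]] = 1.0
        self.register_buffer("mask", mask)
        self.readout = nn.Linear(expand_dim, out_dim)

    def forward(self, state):
        g = F.relu(self.expand(state))  # expanded, high-dimensional code
        # Top-k winner-take-all: zero out all but the k most active units.
        vals, idx = torch.topk(g, self.k, dim=-1)
        sparse = torch.zeros_like(g).scatter(-1, idx, vals)
        # Masked readout enforces the sparse fan-in per downstream unit.
        return F.relu(F.linear(sparse, self.readout.weight * self.mask, self.readout.bias))
```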

2. Dendritic Modulation

  • A parallel “modulatory” pathway computes a context vector (e.g., from the previous action or a learned hidden state).
  • This vector multiplicatively gates the incoming synaptic contributions at the dendritic level, enabling dynamic re‑weighting of features without altering the main weight matrix (sketched below).
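
A minimal sketch of this gating pathway, assuming the context vector comes from the previous action; the feature and context dimensions are illustrative, not the paper's exact parameterization:

```python
import torch
import torch.nn as nn

class DendriticGate(nn.Module):
    """Multiplicative dendrite-level gating; one reading of the mechanism
    described above, not the paper's exact formulation."""

    def __init__(self, feat_dim=128, ctx_dim=6):
        super().__init__()
        # Modulatory pathway: maps context (e.g., previous action) to one
        # gate value per feature.
        self.to_gate = nn.Linear(ctx_dim, feat_dim)

    def forward(self, features, context):
        # Sigmoid gate in (0, 1) re-weights each feature; the main weight
        # matrix that produced `features` is left untouched.
        return features * torch.sigmoid(self.to_gate(context))
```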

3. RL Integration

  • The cerebellar module serves as the policy/value backbone in standard actor‑critic algorithms (e.g., PPO, SAC).
  • Training proceeds with the usual policy‑gradient updates; the only extra learnable parameters are the modulation weights and the sparse masks (the latter can be static or learned via a differentiable sparsity regularizer). A sketch of the assembled policy head follows.
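
Putting the pieces together, a hypothetical actor head might look like the following. `CerebellarEncoder` and `DendriticGate` are the sketches above; the surrounding PPO/SAC machinery is left unchanged, and the actual wiring in the paper may differ.

```python
import torch.nn as nn

class CDRLPolicy(nn.Module):
    """Illustrative actor head built from the two sketches above."""

    def __init__(self, state_dim=17, act_dim=6, feat_dim=128):
        super().__init__()
        self.encoder = CerebellarEncoder(state_dim=state_dim, out_dim=feat_dim)
        self.gate = DendriticGate(feat_dim=feat_dim, ctx_dim=act_dim)
        self.mu = nn.Linear(feat_dim, act_dim)

    def forward(self, state, prev_action):
        # Encode the state, then gate the features with the action context.
        h = self.gate(self.encoder(state), prev_action)
        return self.mu(h)  # mean of the Gaussian action distribution
```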

4. Evaluation Protocol

  • Experiments compare CDRL against baseline MLP and Transformer‑style policies across multiple domains.
  • Systematic injection of Gaussian observation noise and partial‑state masking is used to test robustness (see the sketch below).
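
A sketch of these two perturbations; the σ and dropout values mirror those quoted in the results table below, but the function itself is illustrative, not from the paper's code:

```python
import numpy as np

def perturb_observation(obs, sigma=0.2, dropout_p=0.3, rng=None):
    """Additive Gaussian observation noise plus random partial-state
    masking, as described in the evaluation protocol."""
    rng = rng or np.random.default_rng()
    noisy = obs + rng.normal(0.0, sigma, size=obs.shape)
    keep = rng.random(obs.shape) >= dropout_p  # True = dimension observed
    return noisy * keep
```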

Results & Findings

| Environment | Baseline (PPO) | CDRL (PPO) | Sample‑efficiency gain | Noise robustness |
| --- | --- | --- | --- | --- |
| HalfCheetah‑v2 | 10 k steps → 5 k reward | 5 k steps → 5 k reward | ≈ 2× faster | +15 % final reward at σ = 0.2 |
| Walker‑v2 | 12 k steps → 6 k reward | 6 k steps → 6 k reward | ≈ 2× faster | +12 % reward with 30 % state dropout |
| Humanoid‑Stand | 30 k steps → 12 k reward | 15 k steps → 12 k reward | ≈ 2× faster | +10 % reward under sensor jitter |

  • Sample efficiency – CDRL reaches the target performance with roughly half the environment interactions across all tasks.
  • Robustness – When observation noise or partial observability is introduced, the cerebellar architecture degrades far less than dense baselines.
  • Generalization – Policies trained on one set of dynamics (e.g., altered mass) transfer better to unseen variations, indicating richer, more disentangled representations.
  • Parameter budget – Even with a 30 % reduction in total trainable parameters (thanks to sparsity), CDRL matches or exceeds dense networks, confirming the effectiveness of the inductive bias.

Practical Implications

  • Faster prototyping – Train RL agents with fewer environment steps, reducing cloud‑compute costs and shortening iteration cycles.
  • Edge deployment – Sparse connectivity and top‑k activation dramatically lower memory footprints and inference latency, making CDRL attractive for on‑device robotics or IoT control loops.
  • Noise‑tolerant systems – Applications that suffer from sensor drift (e.g., autonomous drones, prosthetic control) benefit from built‑in robustness without extra filtering tricks.
  • Modular design – The dendritic modulation block can be dropped into existing policy networks, offering a plug‑and‑play upgrade path for legacy codebases.
  • Research acceleration – By exposing a biologically motivated inductive bias, the work opens a new design space for RL architectures that prioritize structural priors over raw scaling.

Limitations & Future Work

  • Static expansion matrix – The current implementation relies on a fixed random projection; learning this mapping could improve expressivity, but it adds complexity.
  • Mask‑learning overhead – Static masks perform well, yet learning sparsity patterns end‑to‑end introduces extra gradient noise and may require careful regularization.
  • Domain scope – Experiments are limited to continuous‑control tasks; applicability to discrete or language‑based reinforcement learning remains untested.
  • Biological fidelity vs. engineering trade‑offs – Certain cerebellar features (e.g., climbing‑fiber error signals) are abstracted away; future work could explore richer neuromodulatory signals.

Suggested Directions

  • Extend CDRL to multi‑agent environments.
  • Integrate learned expansion layers instead of fixed random projections.
  • Investigate how dendritic modulation interacts with meta‑learning algorithms.

Authors

  • Sibo Zhang
  • Rui Jing
  • Liangfu Lv
  • Jian Zhang
  • Yunliang Zang

Paper Information

| Item | Details |
| --- | --- |
| arXiv ID | 2602.15367v1 |
| Categories | cs.LG, cs.AI, cs.NE |
| Published | February 17, 2026 |