[Paper] CDRL: A Reinforcement Learning Framework Inspired by Cerebellar Circuits and Dendritic Computational Strategies

Published: February 17, 2026 at 12:25 AM EST
5 min read
Source: arXiv

Overview

The paper “CDRL: A Reinforcement Learning Framework Inspired by Cerebellar Circuits and Dendritic Computational Strategies” proposes a new neural‑network architecture for RL that mimics the wiring and processing principles of the cerebellum. By embedding biologically‑derived structural priors—large expansion layers, sparse connections, sparse activations, and dendrite‑level modulation—the authors demonstrate markedly better sample efficiency, robustness to noise, and generalization on high‑dimensional, partially observable tasks.

Key Contributions

  • Cerebellar‑inspired architecture: Introduces a modular network layout that expands input representations into a high‑dimensional sparse space, mirroring the granule‑cell layer of the cerebellum.
  • Dendritic modulation mechanism: Implements a gating signal that operates at the “dendrite” level of each unit, allowing context‑dependent weighting of incoming features.
  • Comprehensive empirical evaluation: Benchmarks on noisy, high‑dimensional RL environments (e.g., MuJoCo, DeepMind Control Suite) show consistent gains in sample efficiency (up to 2× faster learning) and robustness to observation noise.
  • Sensitivity analysis of architectural hyper‑parameters: Demonstrates how expansion ratio, sparsity level, and modulation strength trade off performance against model size.
  • Open‑source reference implementation: Provides code and pretrained models, facilitating reproducibility and downstream experimentation.

Methodology

  1. Network skeleton

    • Input expansion: Raw state vectors are projected into a much larger latent space using a random, fixed‑weight expansion matrix (akin to the mossy‑fiber → granule‑cell expansion).
    • Sparse connectivity: Each downstream “Purkinje‑like” unit receives inputs from only a tiny subset of the expanded neurons, enforced via binary masks.
    • Sparse activation: A top‑k winner‑take‑all (WTA) operation keeps only the most responsive expanded units active per timestep, reducing interference and computational load.
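The three stages above can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's released code: the dimensions (17‑dim state, 512 granule‑like units, fan‑in of 16, k = 32) and all variable names are assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions, not taken from the paper.
state_dim, expand_dim, n_purkinje = 17, 512, 64
k = 32        # top-k winners kept active per timestep
fan_in = 16   # expanded inputs visible to each Purkinje-like unit

# Fixed random expansion matrix (mossy-fiber -> granule-cell analogue).
W_expand = rng.standard_normal((state_dim, expand_dim))

# Binary masks enforcing sparse connectivity into the Purkinje-like layer.
mask = np.zeros((n_purkinje, expand_dim))
for i in range(n_purkinje):
    mask[i, rng.choice(expand_dim, fan_in, replace=False)] = 1.0
W_purkinje = rng.standard_normal((n_purkinje, expand_dim)) * mask

def forward(state):
    # 1) Expand into a high-dimensional latent space (with ReLU).
    g = np.maximum(state @ W_expand, 0.0)
    # 2) Top-k winner-take-all: keep only the k most responsive units.
    thresh = np.partition(g, -k)[-k]
    sparse_g = np.where(g >= thresh, g, 0.0)
    # 3) Sparsely connected readout.
    return W_purkinje @ sparse_g

out = forward(rng.standard_normal(state_dim))
print(out.shape)  # (64,)
```

Because the expansion matrix is fixed and the masks are binary, only the masked Purkinje‑like weights need gradients, which is where the parameter savings in the paper come from.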
  2. Dendritic modulation

    • A parallel “modulatory” pathway computes a context vector (e.g., from the previous action or a learned hidden state).
    • This vector multiplicatively gates the incoming synaptic contributions at the dendritic level, enabling dynamic re‑weighting of features without altering the main weight matrix.
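A minimal sketch of this multiplicative gating, assuming a sigmoid gate and illustrative shapes (the context source, here just some vector such as the previous action, and all names are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

feat_dim, ctx_dim, n_units = 128, 6, 32

W_main = rng.standard_normal((n_units, feat_dim))  # main synaptic weights
W_mod = rng.standard_normal((feat_dim, ctx_dim))   # modulatory pathway

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def modulated_forward(features, context):
    # Context vector produces a per-feature gate in (0, 1)...
    gate = sigmoid(W_mod @ context)
    # ...which multiplicatively reweights synaptic contributions
    # at the "dendrite" level, without changing W_main itself.
    return W_main @ (gate * features)

y = modulated_forward(rng.standard_normal(feat_dim),
                      rng.standard_normal(ctx_dim))
print(y.shape)  # (32,)
```

The key property is that the same main weight matrix yields different effective feature weightings in different contexts.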
  3. RL integration

    • The cerebellar module serves as the policy/value backbone in standard actor‑critic algorithms (e.g., PPO, SAC).
    • Training proceeds with the usual policy gradient updates; the only extra learnable parameters are the modulation weights and the sparse masks (the latter can be static or learned via a differentiable sparsity regularizer).
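Structurally, the module slots in as a shared torso under separate policy and value heads. In this sketch, `CerebellarTorso` is a placeholder for the full expansion/sparsity/modulation stack, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(2)

# Placeholder torso standing in for the cerebellar module.
class CerebellarTorso:
    def __init__(self, state_dim, out_dim):
        self.W = rng.standard_normal((out_dim, state_dim))

    def __call__(self, s):
        return np.maximum(self.W @ s, 0.0)

state_dim, hidden, n_actions = 17, 64, 6
torso = CerebellarTorso(state_dim, hidden)
W_pi = rng.standard_normal((n_actions, hidden))  # policy head (actor)
W_v = rng.standard_normal(hidden)                # value head (critic)

def act_and_evaluate(state):
    h = torso(state)
    logits = W_pi @ h        # would feed the PPO/SAC policy loss
    value = float(W_v @ h)   # would feed the critic loss
    return logits, value

logits, value = act_and_evaluate(rng.standard_normal(state_dim))
```

The actor‑critic loss and optimizer are unchanged; the inductive bias lives entirely in the torso.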
  4. Evaluation protocol

    • Experiments compare CDRL against baseline MLP and Transformer‑style policies across multiple domains, with systematic injection of Gaussian observation noise and partial‑state masking to test robustness.
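The two perturbations used in the robustness tests can be reproduced with a small helper. The σ = 0.2 and 30 % dropout values mirror the settings reported in the results; the function itself is an assumed sketch, not the paper's evaluation code:

```python
import numpy as np

rng = np.random.default_rng(3)

def perturb_obs(obs, sigma=0.2, dropout=0.3, rng=rng):
    # Additive Gaussian observation noise.
    noisy = obs + rng.normal(0.0, sigma, size=obs.shape)
    # Partial-state masking: each dimension is zeroed with prob. `dropout`.
    keep = rng.random(obs.shape) >= dropout
    return noisy * keep

obs = np.ones(17)
print(perturb_obs(obs).shape)  # (17,)
```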

Results & Findings

| Environment | Baseline PPO (steps → reward) | CDRL + PPO (steps → reward) | Sample‑efficiency gain | Noise robustness |
|---|---|---|---|---|
| HalfCheetah‑v2 | 10k → 5k | 5k → 5k | ≈2× faster | 15 % higher final reward under 0.2‑σ noise |
| Walker‑Walk | 12k → 6k | 6k → 6k | ≈2× faster | 12 % higher reward with 30 % state dropout |
| Humanoid‑Stand | 30k → 12k | 15k → 12k | ≈2× faster | 10 % higher reward under sensor jitter |
  • Sample efficiency: Across all tasks, CDRL reaches target performance with roughly half the environment interactions.
  • Robustness: When observation noise or partial observability is introduced, the cerebellar architecture degrades far less than dense baselines.
  • Generalization: Policies trained on one set of dynamics (e.g., altered mass) transfer better to unseen variations, suggesting richer, more disentangled representations.
  • Parameter budget: Even with a 30 % reduction in total trainable parameters (thanks to sparsity), CDRL matches or exceeds dense networks, confirming the effectiveness of the inductive bias.

Practical Implications

  • Faster prototyping: Developers can train RL agents with fewer environment steps, cutting cloud compute costs and shortening iteration cycles.
  • Edge deployment: Sparse connectivity and top‑k activation dramatically lower memory footprints and inference latency, making CDRL attractive for on‑device robotics or IoT control loops.
  • Noise‑tolerant systems: Applications that suffer from sensor drift (e.g., autonomous drones, prosthetic control) benefit from the built‑in robustness without extra filtering tricks.
  • Modular design: The dendritic modulation block can be dropped into existing policy networks, offering a plug‑and‑play upgrade path for legacy codebases.
  • Research acceleration: By exposing a biologically motivated inductive bias, the work opens a new design space for RL architectures that prioritize structural priors over raw scaling.

Limitations & Future Work

  • Static expansion matrix: The current implementation uses a fixed random projection; learning this mapping could further improve expressivity but adds complexity.
  • Mask learning overhead: While static masks work well, learning sparsity patterns end‑to‑end incurs extra gradient noise and may require careful regularization.
  • Domain scope: Experiments focus on continuous control; applicability to discrete or language‑based RL remains untested.
  • Biological fidelity vs. engineering trade‑offs: Some cerebellar features (e.g., climbing‑fiber error signals) are abstracted away; future work could explore richer neuromodulatory signals.

The authors suggest extending CDRL to multi‑agent settings, integrating learned expansion layers, and probing how dendritic modulation interacts with meta‑learning algorithms.

Authors

  • Sibo Zhang
  • Rui Jing
  • Liangfu Lv
  • Jian Zhang
  • Yunliang Zang

Paper Information

  • arXiv ID: 2602.15367v1
  • Categories: cs.LG, cs.AI, cs.NE
  • Published: February 17, 2026
  • PDF: Download PDF