[Paper] CDRL: A Reinforcement Learning Framework Inspired by Cerebellar Circuits and Dendritic Computational Strategies

Published: February 17, 2026
5 min read
Source: arXiv:2602.15367v1

Overview

The paper “CDRL: A Reinforcement Learning Framework Inspired by Cerebellar Circuits and Dendritic Computational Strategies” proposes a new neural‑network architecture for reinforcement learning that mimics the wiring and processing principles of the cerebellum. By embedding biologically‑derived structural priors—large expansion layers, sparse connections, sparse activations, and dendrite‑level modulation—the authors demonstrate:

  • Markedly better sample efficiency
  • Improved robustness to noise
  • Enhanced generalization on high‑dimensional, partially observable tasks

Key Contributions

  • Cerebellar‑inspired architecture
    Introduces a modular network layout that expands input representations into a high‑dimensional sparse space, mirroring the granule‑cell layer of the cerebellum.

  • Dendritic modulation mechanism
    Implements a gating signal that operates at the “dendrite” level of each unit, allowing context‑dependent weighting of incoming features.

  • Comprehensive empirical evaluation
    Benchmarks on noisy, high‑dimensional RL environments (e.g., MuJoCo, DeepMind Control Suite) show consistent gains in sample efficiency (up to 2× faster learning) and robustness to observation noise.

  • Sensitivity analysis of architectural hyper‑parameters
    Demonstrates how expansion ratio, sparsity level, and modulation strength trade off performance versus model size.

  • Open‑source reference implementation
    Provides code and pretrained models, facilitating reproducibility and downstream experimentation.

Methodology

1. Network Skeleton

  • Input expansion – Raw state vectors are projected into a much larger latent space using a random, fixed‑weight expansion matrix (akin to the mossy‑fiber → granule‑cell expansion).
  • Sparse connectivity – Each downstream “Purkinje‑like” unit receives inputs from only a tiny subset of the expanded neurons, enforced via binary masks.
  • Sparse activation – A top‑k winner‑take‑all (WTA) operation keeps only the most responsive expanded units active per timestep, reducing interference and computational load (see the sketch below).
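
The sketch below pulls these three operations together in PyTorch. The dimensions, fan‑in, and top‑k fraction are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CerebellarEncoder(nn.Module):
    """Minimal sketch of the expansion/sparsity skeleton (illustrative sizes)."""

    def __init__(self, state_dim=17, expand_dim=1024, out_dim=128, fan_in=8, k=64):
        super().__init__()
        # Fixed random expansion (mossy-fiber -> granule-cell analogue):
        # the projection is frozen and never updated by the optimizer.
        self.expand = nn.Linear(state_dim, expand_dim, bias=False)
        self.expand.weight.requires_grad_(False)
        self.k = k
        # Binary mask so each "Purkinje-like" unit reads only `fan_in`
        # of the expanded neurons.
        mask = torch.zeros(out_dim, expand_dim)
        for row in mask:
            row[torch.randperm(expand_dim)[:fan_in]] = 1.0
        self.register_buffer("mask", mask)
        self.readout = nn.Linear(expand_dim, out_dim)

    def forward(self, state):
        g = F.relu(self.expand(state))  # expanded, high-dimensional code
        # Top-k winner-take-all: zero out all but the k most active units.
        vals, idx = torch.topk(g, self.k, dim=-1)
        sparse = torch.zeros_like(g).scatter(-1, idx, vals)
        # Masked readout enforces the sparse fan-in per downstream unit.
        return F.relu(F.linear(sparse, self.readout.weight * self.mask, self.readout.bias))
```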

2. Dendritic Modulation

  • A parallel “modulatory” pathway computes a context vector (e.g., from the previous action or a learned hidden state).
  • This vector multiplicatively gates the incoming synaptic contributions at the dendritic level, enabling dynamic re‑weighting of features without altering the main weight matrix (sketched below).
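
A minimal sketch of this gating pathway, assuming the context vector comes from the previous action; the feature and context dimensions are illustrative, not the paper's exact parameterization:

```python
import torch
import torch.nn as nn

class DendriticGate(nn.Module):
    """Multiplicative dendrite-level gating; one reading of the mechanism
    described above, not the paper's exact formulation."""

    def __init__(self, feat_dim=128, ctx_dim=6):
        super().__init__()
        # Modulatory pathway: maps context (e.g., previous action) to one
        # gate value per feature.
        self.to_gate = nn.Linear(ctx_dim, feat_dim)

    def forward(self, features, context):
        # Sigmoid gate in (0, 1) re-weights each feature; the main weight
        # matrix that produced `features` is left untouched.
        return features * torch.sigmoid(self.to_gate(context))
```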

3. RL Integration

  • The cerebellar module serves as the policy/value backbone in standard actor‑critic algorithms (e.g., PPO, SAC).
  • Training proceeds with the usual policy‑gradient updates; the only extra learnable parameters are the modulation weights and the sparse masks (the latter can be static or learned via a differentiable sparsity regularizer). A sketch of the assembled policy head follows.
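
Putting the pieces together, a hypothetical actor head might look like the following. `CerebellarEncoder` and `DendriticGate` are the sketches above; the surrounding PPO/SAC machinery is left unchanged, and the actual wiring in the paper may differ.

```python
import torch.nn as nn

class CDRLPolicy(nn.Module):
    """Illustrative actor head built from the two sketches above."""

    def __init__(self, state_dim=17, act_dim=6, feat_dim=128):
        super().__init__()
        self.encoder = CerebellarEncoder(state_dim=state_dim, out_dim=feat_dim)
        self.gate = DendriticGate(feat_dim=feat_dim, ctx_dim=act_dim)
        self.mu = nn.Linear(feat_dim, act_dim)

    def forward(self, state, prev_action):
        # Encode the state, then gate the features with the action context.
        h = self.gate(self.encoder(state), prev_action)
        return self.mu(h)  # mean of the Gaussian action distribution
```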

4. Evaluation Protocol

  • Experiments compare CDRL against baseline MLP and Transformer‑style policies across multiple domains.
  • Systematic injection of Gaussian observation noise and partial‑state masking is used to test robustness (see the sketch below).
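
A sketch of these two perturbations; the σ and dropout values mirror those quoted in the results table below, but the function itself is illustrative, not from the paper's code:

```python
import numpy as np

def perturb_observation(obs, sigma=0.2, dropout_p=0.3, rng=None):
    """Additive Gaussian observation noise plus random partial-state
    masking, as described in the evaluation protocol."""
    rng = rng or np.random.default_rng()
    noisy = obs + rng.normal(0.0, sigma, size=obs.shape)
    keep = rng.random(obs.shape) >= dropout_p  # True = dimension observed
    return noisy * keep
```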

Results & Findings

| Environment | Baseline (PPO) | CDRL (PPO) | Sample‑efficiency gain | Noise robustness |
| --- | --- | --- | --- | --- |
| HalfCheetah‑v2 | 10 k steps → 5 k reward | 5 k steps → 5 k reward | ≈ 2× faster | +15 % final reward at σ = 0.2 |
| Walker‑v2 | 12 k steps → 6 k reward | 6 k steps → 6 k reward | ≈ 2× faster | +12 % reward with 30 % state dropout |
| Humanoid‑Stand | 30 k steps → 12 k reward | 15 k steps → 12 k reward | ≈ 2× faster | +10 % reward under sensor jitter |

  • Sample efficiency – CDRL reaches the target performance with roughly half the environment interactions across all tasks.
  • Robustness – When observation noise or partial observability is introduced, the cerebellar architecture degrades far less than dense baselines.
  • Generalization – Policies trained on one set of dynamics (e.g., altered mass) transfer better to unseen variations, indicating richer, more disentangled representations.
  • Parameter budget – Even with a 30 % reduction in total trainable parameters (thanks to sparsity), CDRL matches or exceeds dense networks, confirming the effectiveness of the inductive bias.

Practical Implications

  • Faster prototyping – Train RL agents with fewer environment steps, reducing cloud‑compute costs and shortening iteration cycles.
  • Edge deployment – Sparse connectivity and top‑k activation dramatically lower memory footprints and inference latency, making CDRL attractive for on‑device robotics or IoT control loops.
  • Noise‑tolerant systems – Applications that suffer from sensor drift (e.g., autonomous drones, prosthetic control) benefit from built‑in robustness without extra filtering tricks.
  • Modular design – The dendritic modulation block can be dropped into existing policy networks, offering a plug‑and‑play upgrade path for legacy codebases.
  • Research acceleration – By exposing a biologically motivated inductive bias, the work opens a new design space for RL architectures that prioritize structural priors over raw scaling.

Limitations & Future Work

  • Static expansion matrix – The current implementation relies on a fixed random projection; learning this mapping could improve expressivity, but it adds complexity.
  • Mask‑learning overhead – Static masks perform well, yet learning sparsity patterns end‑to‑end introduces extra gradient noise and may require careful regularization.
  • Domain scope – Experiments are limited to continuous‑control tasks; applicability to discrete or language‑based reinforcement learning remains untested.
  • Biological fidelity vs. engineering trade‑offs – Certain cerebellar features (e.g., climbing‑fiber error signals) are abstracted away; future work could explore richer neuromodulatory signals.

Suggested Directions

  • Extend CDRL to multi‑agent environments.
  • Integrate learned expansion layers instead of fixed random projections.
  • Investigate how dendritic modulation interacts with meta‑learning algorithms.

Authors

  • Sibo Zhang
  • Rui Jing
  • Liangfu Lv
  • Jian Zhang
  • Yunliang Zang

Paper Information

| Item | Details |
| --- | --- |
| arXiv ID | 2602.15367v1 |
| Categories | cs.LG, cs.AI, cs.NE |
| Published | February 17, 2026 |