[Paper] Short-Term Synaptic Plasticity Stabilizes Goal-Conditioned Dynamics in a PFC-Inspired Reservoir Model for Multistep Goal-Directed Action Planning

Published: June 2, 2026 at 06:59 AM EDT
Source: arXiv - 2606.03481v1

Overview

This paper investigates how short‑term synaptic plasticity (STP) can help a prefrontal‑cortex‑inspired recurrent network keep goal information “alive” long enough to guide a sequence of actions. The authors embed STP in a reservoir‑computing model and test it on a multistep, delayed‑execution planning task. With STP, the success rate stays near 89 % under state noise, whereas the same model without STP falls below 50 %. STP also preserves goal‑conditioned dynamics that are directly usable for action selection.

Key Contributions

  • STP‑augmented reservoir model: Integrated biologically plausible facilitation/depression mechanisms into a PFC‑style recurrent network.
  • Goal‑conditioned dynamics analysis: Demonstrated that STP maintains goal information as a dynamic, action‑usable pattern rather than a static, linearly decodable vector.
  • Noise robustness: Showed that the STP model’s success rate stays above 89 % even with substantial state noise, whereas a non‑STP counterpart drops below 50 %.
  • Effective connectivity insights: Identified time‑varying, goal‑specific recurrent connectivity patterns that emerge only when STP is present.
  • Parameter sweep: Mapped the region of STP time constants (favoring facilitation) that yields the highest planning performance.

Methodology

  1. Network architecture – A recurrent “reservoir” mimics the PFC, receiving a cue that encodes the current goal. The reservoir’s internal weights are fixed; only the readout layer (inspired by basal‑ganglia dopamine‑driven temporal‑difference learning) is trained to map internal states to action values.
  2. Short‑term plasticity – Each synapse follows the classic Tsodyks‑Markram model, with separate variables for neurotransmitter availability (depression) and release probability (facilitation). These variables evolve on the order of hundreds of milliseconds, matching experimental STP time scales.
  3. Task – The network must select a correct action sequence across three steps after a variable delay. The correct sequence depends on the initially presented goal, forcing the network to retain that goal information throughout the delay.
  4. Evaluation – 100 random reservoir instantiations were tested with and without STP, both under clean conditions and with injected Gaussian state noise. Decoding analyses (linear classifiers, state‑space separability) and effective‑connectivity estimations (Granger‑type causality) quantified how well goal information persisted.
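
Steps 1 and 2 above can be sketched as a small rate‑based reservoir with per‑unit STP variables. This is a minimal illustration, assuming a rate‑based form of the Tsodyks‑Markram equations and plain Euler integration. The function name, network size, gains, and every parameter value (tau_f, tau_d, U, max_rate, noise_std) are placeholders chosen for the example, not the authors' implementation or settings.

```python
import numpy as np

def simulate_reservoir(goal_cue, n_units=100, n_steps=400, cue_steps=50,
                       dt=1e-3, tau=0.02, tau_f=0.3, tau_d=0.5, U=0.2,
                       max_rate=50.0, use_stp=True, noise_std=0.0, seed=0):
    """Euler-integrate a rate reservoir with optional Tsodyks-Markram STP.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    n_in = goal_cue.shape[0]
    # Fixed (untrained) recurrent and input weights, as in reservoir computing.
    W = rng.normal(0.0, 1.2 / np.sqrt(n_units), size=(n_units, n_units))
    W_in = rng.normal(0.0, 1.0, size=(n_units, n_in))

    h = np.zeros(n_units)      # internal state of each unit
    u = np.full(n_units, U)    # release probability (facilitation variable)
    x = np.ones(n_units)       # available resources (depression variable)
    states = np.empty((n_steps, n_units))
    efficacy = np.empty((n_steps, n_units))

    for t in range(n_steps):
        r = 0.5 * (1.0 + np.tanh(h))   # normalized firing rate in [0, 1]
        rate_hz = max_rate * r         # rate used by the STP equations
        if use_stp:
            du = (U - u) / tau_f + U * (1.0 - u) * rate_hz
            dx = (1.0 - x) / tau_d - u * x * rate_hz
            u = np.clip(u + dt * du, 0.0, 1.0)
            x = np.clip(x + dt * dx, 0.0, 1.0)
            eff = u * x / U            # equals 1 at rest (u = U, x = 1)
        else:
            eff = np.ones(n_units)
        # The goal cue is shown only at the start; the delay follows.
        cue = goal_cue if t < cue_steps else np.zeros(n_in)
        drive = W @ (eff * r) + W_in @ cue
        noise = noise_std * np.sqrt(dt) * rng.standard_normal(n_units)
        h = h + (dt / tau) * (-h + drive) + noise
        states[t] = h
        efficacy[t] = eff
    return states, efficacy
```

Calling `simulate_reservoir(np.array([1.0, 0.0]), noise_std=0.5)` with and without `use_stp` gives two state trajectories that could then be passed to a trained readout or a linear decoder for comparison.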

Results & Findings

Condition    Success rate (no noise)    Success rate (with noise)
No STP       75.8 %                     49.5 %
With STP     91.8 %                     89.2 %

  • Goal decodability: Both models encode the goal during the delay, but only the STP model retains a dynamic representation that can be read out at later decision points.
  • State‑space analysis: Trajectories for different goals stay well separated over time when STP is present; they collapse into overlapping clouds without STP.
  • Effective connectivity: With STP, recurrent connections become goal‑specific and increase in strength toward the end of the delay, providing a “pre‑activation” of the upcoming action plan.
  • Parameter sweep: Facilitation‑dominant STP (τ_f ≈ 200–400 ms, τ_d longer) yields the highest performance, suggesting that transient boosting of synaptic efficacy is the key stabilizer.
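
The noise sensitivity of each model follows directly from the reported success rates. The short calculation below uses only the numbers in the results table; the dictionary and function names are illustrative.

```python
# Reported success rates in percent: (no noise, with noise).
reported = {"No STP": (75.8, 49.5), "With STP": (91.8, 89.2)}

def noise_drop(clean, noisy):
    """Loss in success rate, in percentage points, when noise is added."""
    return round(clean - noisy, 1)

drops = {name: noise_drop(*rates) for name, rates in reported.items()}
print(drops)  # {'No STP': 26.3, 'With STP': 2.6}
```

Noise costs the non‑STP model 26.3 percentage points but the STP model only 2.6.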

Practical Implications

  • Robust planning in noisy environments: Embedding STP into recurrent neural networks (RNNs) could make AI agents more resilient to sensor noise or internal perturbations, a common issue in robotics and autonomous systems.
  • Goal‑conditioned policy networks: Instead of training large transformer‑style models to remember goals, a lightweight reservoir with STP may be able to maintain goal context across a delay with little training overhead, since only the readout is trained. Whether this holds over long horizons is untested here.
  • Neuromorphic hardware: STP is naturally implementable on memristive or spiking platforms; this work provides a concrete use‑case—stable, goal‑directed dynamics—for next‑generation low‑power AI chips.
  • Credit assignment in RL: The temporal‑difference readout used here mirrors actor‑critic updates; adding STP to the critic’s recurrent core may improve credit assignment when rewards are delayed.
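
The temporal‑difference readout mentioned above can be pictured as a linear map from reservoir states to action values, adjusted by a TD error. The sketch below uses a Q‑learning‑style target as a stand‑in. This summary does not give the paper's exact learning rule, so `td_update`, `alpha`, and `gamma` are hypothetical names and values.

```python
import numpy as np

def td_update(W_out, state, action, reward, next_state, done,
              alpha=0.01, gamma=0.9):
    """One TD step for a linear readout Q(s, a) = W_out[a] @ s.

    Only W_out is changed; the reservoir that produced `state` stays fixed.
    """
    q_values = W_out @ state
    if done:
        target = reward
    else:
        target = reward + gamma * np.max(W_out @ next_state)
    td_error = target - q_values[action]
    W_new = W_out.copy()
    W_new[action] += alpha * td_error * state  # update the chosen action's row
    return W_new, td_error
```

In a full training loop, `state` and `next_state` would be reservoir states at successive decision points, and the reward would arrive only after the final action in the sequence.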

Limitations & Future Work

  • Fixed reservoir weights: The study kept the recurrent matrix static, which limits adaptability to new tasks; future work could explore co‑learning of recurrent weights and STP parameters.
  • Simplified biology: The STP model captures only facilitation and depression; other forms of short‑term modulation (e.g., presynaptic inhibition) were not examined.
  • Task scope: The benchmark involves a relatively small action space and short delays; scaling to high‑dimensional, real‑world planning problems remains an open question.
  • Hardware validation: While the results are promising, implementing the exact STP dynamics on neuromorphic chips and measuring real‑world latency/energy benefits is left for later studies.

Authors

  • Jin Nakamura
  • Yuichi Katori

Paper Information

  • arXiv ID: 2606.03481v1
  • Categories: q-bio.NC, cs.NE
  • Published: June 2, 2026