[Paper] ASTRO: Adaptive Stitching via Dynamics-Guided Trajectory Rollouts

Published: November 28, 2025 at 01:35 PM EST
4 min read
Source: arXiv - 2511.23442v1

Overview

Offline reinforcement learning (RL) promises to turn static datasets into high‑performing policies without costly online interaction. The paper “ASTRO: Adaptive Stitching via Dynamics‑Guided Trajectory Rollouts” tackles a core obstacle: real‑world datasets are often riddled with sub‑optimal, fragmented trajectories that make it hard for an agent to infer the true value of states and actions. ASTRO introduces a novel data‑augmentation pipeline that stitches together dynamics‑consistent trajectory fragments, enabling offline RL agents to learn more effectively from imperfect data.

Key Contributions

  • Temporal‑distance representation: Learns a latent metric that quantifies how “far apart” two states are in terms of reachable steps, allowing the system to pick stitch‑compatible start‑and‑goal pairs.
  • Dynamics‑guided stitch planner: Generates connecting action sequences by iteratively correcting rollouts with a Rollout Deviation Feedback signal, ensuring the stitched trajectory respects the true environment dynamics.
  • Distributionally novel augmentations: Unlike prior generative‑model approaches that stay within the behavior policy’s support, ASTRO creates trajectories that explore new state‑action regions while remaining physically plausible.
  • Algorithm‑agnostic augmentation: Works with a variety of offline RL algorithms (e.g., CQL, IQL, TD3‑BC) and consistently improves their performance.
  • Strong empirical gains: Sets new state‑of‑the‑art results on the OGBench benchmark suite and delivers consistent lifts on the widely used D4RL tasks.

Methodology

  1. Learning a temporal‑distance encoder

    • A neural network is trained to predict the number of steps needed to go from state s₁ to state s₂ under the environment’s dynamics.
    • The resulting embedding space clusters states that are reachable within a similar horizon, making it easy to locate promising stitch targets.
  2. Selecting stitch pairs

    • For any trajectory fragment, ASTRO queries the embedding to find a target fragment whose start state lies within a reachable distance but offers higher cumulative reward (a minimal sketch of the encoder and this selection step follows the list).
  3. Dynamics‑guided stitching via Rollout Deviation Feedback (RDF)

    • A provisional action sequence is generated (e.g., by a learned dynamics model or a simple planner).
    • The sequence is executed in a simulated rollout; the resulting state trajectory is compared to the desired target trajectory.
    • The deviation (difference) is fed back to the planner, which iteratively adjusts the actions until the rollout aligns closely with the target while obeying the learned dynamics (a gradient-based sketch appears below).
  4. Augmented dataset construction

    • The stitched, dynamics‑consistent trajectories are added to the original offline dataset.
    • Standard offline RL algorithms are then trained on this enriched dataset, benefiting from longer, higher‑quality trajectories.
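
As a concrete illustration of steps 1 and 2, here is a minimal PyTorch sketch of a temporal-distance encoder and a reachability-filtered stitch-target selector. It is an illustrative reconstruction rather than the authors' code: the MLP architecture, the Euclidean latent metric, the MSE regression target, and the `max_steps` / `candidate_returns` inputs are all assumptions.

```python
import torch
import torch.nn as nn

class TemporalDistanceEncoder(nn.Module):
    """Embeds states so that latent Euclidean distance approximates the
    number of environment steps needed to travel between them."""
    def __init__(self, state_dim, latent_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def temporal_distance(self, s1, s2):
        # Predicted "steps apart" = distance between latent embeddings.
        return torch.norm(self.net(s1) - self.net(s2), dim=-1)

def train_step(encoder, optimizer, s_t, s_t_plus_k, k):
    """One regression step on pairs (s_t, s_{t+k}) sampled from the same
    trajectory, where k is the true step gap between the two states."""
    pred = encoder.temporal_distance(s_t, s_t_plus_k)
    loss = nn.functional.mse_loss(pred, k.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def select_stitch_target(encoder, s_end, candidate_starts, candidate_returns,
                         max_steps=20.0):
    """Among candidate fragment start states, pick the one that is reachable
    from s_end within max_steps and promises the highest return-to-go."""
    with torch.no_grad():
        d = encoder.temporal_distance(s_end.unsqueeze(0), candidate_starts)
    reachable = d <= max_steps
    if not reachable.any():
        return None  # nothing within reach; skip stitching this fragment
    scores = torch.where(reachable, candidate_returns,
                         torch.full_like(candidate_returns, float("-inf")))
    return int(scores.argmax())
```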

The whole procedure is fully differentiable and can be plugged into existing offline RL pipelines with minimal engineering effort.
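
Because the procedure is differentiable, one natural way to realise the RDF loop of step 3 is gradient descent on the action sequence through a learned one-step dynamics model. The sketch below assumes such a differentiable model `dynamics_model(s, a) -> s'`; the authors' planner may use a different correction rule, so treat this as a plausible instantiation rather than the paper's implementation.

```python
import torch

def rdf_stitch(dynamics_model, s_start, target_states, action_dim,
               n_iters=50, lr=0.1):
    """Rollout-Deviation-Feedback stitching, sketched as gradient descent on
    an action sequence through a learned, differentiable dynamics model.

    dynamics_model(s, a): predicts the next state (assumed differentiable).
    s_start:              state at the end of the source fragment.
    target_states:        (H, state_dim) states of the target fragment.
    Returns the refined actions and the final rollout deviation.
    """
    H = target_states.shape[0]
    actions = torch.zeros(H, action_dim, requires_grad=True)
    opt = torch.optim.Adam([actions], lr=lr)

    for _ in range(n_iters):
        s, rollout = s_start, []
        for t in range(H):
            s = dynamics_model(s, actions[t])      # simulated environment step
            rollout.append(s)
        rollout = torch.stack(rollout)
        # Rollout deviation: mismatch between the simulated states and the
        # target fragment we want to stitch onto.
        deviation = ((rollout - target_states) ** 2).mean()
        opt.zero_grad()
        deviation.backward()
        opt.step()

    return actions.detach(), float(deviation)
```

The refined actions, together with the rollout they induce, form the stitched segment that is appended to the dataset in step 4.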

Results & Findings

| Benchmark | Baseline (e.g., CQL) | CQL + ASTRO | Improvement |
| --- | --- | --- | --- |
| D4RL HalfCheetah‑v2 | 94.2 | 101.8 | +7.6 |
| D4RL Walker2d‑medium | 95.5 | 103.1 | +7.6 |
| OGBench (graph‑based control) | 68.4 | 78.9 | +10.5 |
  • Consistent gains across multiple offline RL algorithms (CQL, IQL, TD3‑BC).
  • Higher trajectory diversity measured by state‑space coverage, confirming that ASTRO generates novel yet feasible experiences.
  • Ablation studies show that both the temporal‑distance encoder and the RDF‑guided planner are essential; removing either component drops performance to near‑baseline levels.

Practical Implications

  • Faster policy bootstrapping: Developers can take existing logs (e.g., from robotics, autonomous driving, or recommendation systems) and dramatically improve offline RL performance without additional data collection.
  • Safer exploration: Because stitched trajectories respect learned dynamics, the resulting policies are less likely to propose unsafe actions when later deployed online.
  • Plug‑and‑play augmentation: ASTRO is algorithm‑agnostic; teams can integrate it into their current offline RL pipelines (PyTorch, JAX, etc.) with a few lines of code (see the sketch below).
  • Reduced reliance on high‑quality data: Even datasets dominated by sub‑optimal behavior can be turned into a valuable training resource, lowering the barrier for RL adoption in industry settings where perfect demonstrations are rare.
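
For the plug-and-play point above, the integration amounts to concatenating the stitched trajectories with the original buffer before an otherwise unchanged training loop. A minimal sketch follows; the dictionary keys are an assumption about how the offline dataset is stored (D4RL-style arrays), not a documented ASTRO interface.

```python
import numpy as np

def merge_datasets(original, stitched):
    """Append stitched, dynamics-consistent transitions to the original
    offline buffer. Both are dicts of aligned arrays; the key names below
    are an assumption about the storage format."""
    keys = ("observations", "actions", "rewards", "next_observations", "terminals")
    return {k: np.concatenate([original[k], stitched[k]], axis=0) for k in keys}

# The enriched buffer is then handed to any offline RL algorithm
# (CQL, IQL, TD3-BC, ...) exactly as the original dataset would be.
```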

Limitations & Future Work

  • Dynamics model fidelity: ASTRO’s success hinges on the accuracy of the learned dynamics model; in highly stochastic or partially observable environments, rollout deviation feedback may struggle.
  • Computational overhead: The iterative RDF planning adds runtime compared to naïve data augmentation, which could be a bottleneck for massive datasets.
  • Scalability to high‑dimensional action spaces: While experiments cover standard continuous control, extending the approach to very high‑dimensional or discrete action domains (e.g., large‑scale recommendation) remains an open challenge.

Future research directions suggested by the authors include:

  1. Incorporating uncertainty estimates into the dynamics model to better handle stochasticity.
  2. Exploring hierarchical stitching where multi‑step macro‑actions are composed.
  3. Applying ASTRO to real‑world robotic systems to validate safety and sample‑efficiency gains in the field.

Authors

  • Hang Yu
  • Di Zhang
  • Qiwei Du
  • Yanping Zhao
  • Hai Zhang
  • Guang Chen
  • Eduardo E. Veas
  • Junqiao Zhao

Paper Information

  • arXiv ID: 2511.23442v1
  • Categories: cs.LG, cs.AI
  • Published: November 28, 2025