[Paper] AnyTask: an Automated Task and Data Generation Framework for Advancing Sim-to-Real Policy Learning
Source: arXiv - 2512.17853v1
Overview
AnyTask is a fully‑automated pipeline that uses massive GPU‑based simulation together with large‑scale foundation models (vision‑language models and large language models) to create thousands of diverse robot manipulation tasks, generate expert demonstrations, and train policies that can be transferred straight to real‑world robots. By removing the manual bottleneck of task design, scene generation, and data collection, the framework pushes generalist robot learning closer to the scale of modern AI systems.
Key Contributions
- End‑to‑end automation: A single framework that designs tasks, builds task‑aware scenes, synthesizes expert trajectories, and performs sim‑to‑real transfer without human‑in‑the‑loop engineering.
- ViPR (Vision‑Language‑in‑the‑Loop Planner): A novel task‑and‑motion‑planning agent that iteratively refines plans using a vision‑language model (VLM) to ensure feasibility and safety.
- ViPR‑Eureka: An RL agent that automatically constructs dense reward functions from LLM‑generated task descriptions and samples contact points guided by language cues.
- ViPR‑RL (Hybrid Planner‑Learner): A mixed planning‑and‑learning approach that produces high‑quality demonstrations even when only sparse rewards are available.
- Large‑scale data generation: Millions of simulated interactions across a wide variety of objects, poses, and task families (pick‑and‑place, drawer opening, contact‑rich pushing, long‑horizon sequences).
- Real‑world validation: Policies trained solely on synthetic data achieve 44 % average success on unseen real‑world tasks, demonstrating robust sim‑to‑real transfer.
Methodology
- Task Specification via LLMs – Natural‑language prompts describe a manipulation goal (e.g., “open the top drawer and place the red block inside”). An LLM expands this into a structured task graph (preconditions, goal states, constraints); a minimal sketch of such a task graph appears after this list.
- Scene Generation – A VLM parses the task graph and populates a simulated environment with appropriate objects, randomizing poses, textures, and lighting to maximize diversity.
- Expert Demonstration Synthesis – Three agents work in parallel:
- ViPR runs a classical task‑and‑motion planner, then queries a VLM to verify each step (collision‑free, graspable) and iteratively refines the plan; this verify‑and‑refine loop is sketched after the list.
- ViPR‑Eureka builds a dense reward model from the LLM description and uses RL with contact‑sampling heuristics to discover high‑quality trajectories; an illustrative LLM‑style reward function follows the list.
- ViPR‑RL combines sparse‑reward RL with occasional planner‑generated waypoints, allowing it to solve tasks where dense rewards are hard to define.
- Behavior Cloning – All generated trajectories are aggregated into a massive dataset. A transformer‑based policy network is trained to imitate the expert actions conditioned on visual observations (a minimal training step is sketched below).
- Sim‑to‑Real Transfer – Domain randomization (camera noise, friction variance, actuator delay) is applied during training; an example per‑episode randomization routine is sketched below. The resulting policy is deployed unchanged on a physical robot arm equipped with an RGB‑D camera.
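To make the task‑graph representation in the first step concrete, the sketch below shows one plausible way to encode an LLM‑expanded specification in Python. The `TaskStep`/`TaskGraph` names and fields are illustrative assumptions; the paper does not publish its exact schema.

```python
from dataclasses import dataclass, field

# Hypothetical schema for an LLM-expanded task specification.
# Names and fields are illustrative, not the paper's actual format.

@dataclass
class TaskStep:
    name: str                      # e.g. "open_top_drawer"
    preconditions: list[str]       # symbolic facts that must hold before the step
    goal_state: list[str]          # facts that should hold after the step
    constraints: list[str] = field(default_factory=list)  # e.g. "keep gripper upright"

@dataclass
class TaskGraph:
    instruction: str               # the original natural-language prompt
    steps: list[TaskStep]          # ordered (or partially ordered) subgoals

# Example: "open the top drawer and place the red block inside"
example = TaskGraph(
    instruction="open the top drawer and place the red block inside",
    steps=[
        TaskStep(
            name="open_top_drawer",
            preconditions=["drawer_top is closed", "gripper is empty"],
            goal_state=["drawer_top is open"],
        ),
        TaskStep(
            name="place_red_block_in_drawer",
            preconditions=["drawer_top is open", "red_block is graspable"],
            goal_state=["red_block is inside drawer_top"],
            constraints=["avoid collision with drawer front"],
        ),
    ],
)
```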
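The ViPR verify‑and‑refine loop from the demonstration‑synthesis step can be paraphrased as a small control loop. The `plan`, `vlm_check`, and `refine` callables below are hypothetical stand‑ins for the classical planner, the VLM feasibility query, and the plan‑repair routine; none of these names come from the paper.

```python
from typing import Callable, Optional

# Hypothetical sketch of a ViPR-style verify-and-refine loop; the function
# and parameter names are illustrative, not taken from the paper.

def plan_with_vlm_feedback(
    task_graph,
    plan: Callable,            # classical task-and-motion planner: task_graph -> list of steps
    vlm_check: Callable,       # VLM query: step -> (ok: bool, reason: str)
    refine: Callable,          # plan repair: (steps, flagged issues) -> new list of steps
    max_rounds: int = 5,
) -> Optional[list]:
    """Iterate plan -> VLM verification -> repair until every step passes."""
    steps = plan(task_graph)
    for _ in range(max_rounds):
        issues = []
        for step in steps:
            ok, reason = vlm_check(step)   # e.g. "is this step collision-free and the grasp feasible?"
            if not ok:
                issues.append((step, reason))
        if not issues:
            return steps                   # every step passed the feasibility check
        steps = refine(steps, issues)      # repair only the flagged steps
    return None                            # no feasible plan within the budget
```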
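ViPR‑Eureka's reward construction can be illustrated with the kind of dense shaping function an LLM might emit for the drawer example above. The terms and weights are invented for this sketch and are not the rewards used in the paper.

```python
import math

# Illustrative dense reward of the kind an LLM might generate for
# "place the red block inside the open drawer"; all terms and weights
# are invented for this sketch.

def dense_reward(gripper_pos, block_pos, drawer_pos, block_grasped: bool) -> float:
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    reach = -dist(gripper_pos, block_pos)                            # approach the block
    grasp = 1.0 if block_grasped else 0.0                            # bonus for a stable grasp
    place = -dist(block_pos, drawer_pos) if block_grasped else 0.0   # carry block toward drawer
    success = 5.0 if dist(block_pos, drawer_pos) < 0.03 else 0.0     # block within 3 cm of target

    return 0.5 * reach + grasp + 1.0 * place + success

# Example query (positions in metres):
r = dense_reward(
    gripper_pos=(0.40, 0.00, 0.20),
    block_pos=(0.45, 0.05, 0.02),
    drawer_pos=(0.60, -0.10, 0.15),
    block_grasped=False,
)
```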
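The behavior‑cloning stage can be sketched as a single imitation step on aggregated trajectories, here with a small PyTorch transformer conditioned on per‑frame visual features. Dimensions, architecture, and the MSE objective are illustrative choices, not the paper's reported configuration.

```python
import torch
import torch.nn as nn

# Minimal behavior-cloning step for a transformer policy conditioned on
# visual features. All sizes and the loss are illustrative placeholders.

class BCPolicy(nn.Module):
    def __init__(self, obs_dim=512, act_dim=7, d_model=256, n_layers=4, n_heads=8):
        super().__init__()
        self.embed = nn.Linear(obs_dim, d_model)            # per-frame visual features -> tokens
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, act_dim)             # predict one action per frame

    def forward(self, obs_seq):                             # obs_seq: (batch, time, obs_dim)
        x = self.encoder(self.embed(obs_seq))
        return self.head(x)

policy = BCPolicy()
optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-4)

# One imitation step on a dummy batch of (observation sequences, expert actions).
obs = torch.randn(8, 16, 512)             # 8 trajectories, 16 timesteps, 512-dim features
expert_actions = torch.randn(8, 16, 7)    # 7-DoF expert actions
loss = nn.functional.mse_loss(policy(obs), expert_actions)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```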
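The scene‑ and physics‑level randomization used for scene generation and sim‑to‑real transfer can be captured in a per‑episode sampling routine. All parameter ranges below are placeholder values chosen for this sketch, not the paper's reported bounds.

```python
import random
from dataclasses import dataclass

# Illustrative domain-randomization settings for sim-to-real training.
# Every range is a placeholder, not a value reported in the paper.

@dataclass
class EpisodeRandomization:
    object_yaw_deg: float       # randomized object pose
    light_intensity: float      # lighting variation for visual diversity
    camera_pixel_noise: float   # std. dev. of additive image noise
    friction_scale: float       # multiplier on nominal surface friction
    actuator_delay_ms: float    # simulated control latency

def sample_randomization(rng: random.Random) -> EpisodeRandomization:
    """Draw one randomized configuration per simulated episode."""
    return EpisodeRandomization(
        object_yaw_deg=rng.uniform(-180.0, 180.0),
        light_intensity=rng.uniform(0.5, 1.5),
        camera_pixel_noise=rng.uniform(0.0, 0.02),
        friction_scale=rng.uniform(0.7, 1.3),
        actuator_delay_ms=rng.uniform(0.0, 40.0),
    )

cfg = sample_randomization(random.Random(0))
```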
Results & Findings
| Metric | Simulation | Real‑World (unseen tasks) |
|---|---|---|
| Success Rate (average across 10 task families) | 92 % | 44 % |
| Number of generated tasks | > 5 k distinct task definitions | – |
| Demonstrations per task (average) | 20–50 | – |
| Policy inference latency | ~30 ms on RTX 3090 | ~45 ms on embedded GPU |
- The policies generalize to novel object poses and even to objects that never appeared in the simulated training data, thanks to the extensive visual and physical randomization.
- ViPR produces the highest‑fidelity trajectories (closest to human‑demonstrated plans), while ViPR‑Eureka excels on contact‑rich tasks where dense rewards are crucial.
- Hybrid ViPR‑RL bridges the gap, achieving comparable performance with far fewer environment interactions.
Practical Implications
- Rapid prototyping of robot skills – Engineers can describe a new manipulation goal in plain English and obtain a ready‑to‑run policy within hours, bypassing manual scene setup and data collection.
- Scalable data pipelines – Companies can leverage cloud GPU farms to generate petabytes of synthetic robot experience, feeding large‑scale foundation models for continual learning.
- Generalist robot platforms – The approach paves the way for “one‑size‑fits‑all” manipulators that can switch tasks on‑the‑fly, useful in warehousing, home assistance, and manufacturing where task variability is high.
- Reduced reliance on costly real‑world trials – By achieving >40 % success on unseen real tasks with zero real data, AnyTask cuts down the need for expensive tele‑operation or human‑demonstration campaigns.
- Open‑source extensibility – The modular agents (ViPR, ViPR‑Eureka, ViPR‑RL) can be swapped or combined with proprietary planners, allowing integration into existing robotics stacks.
Limitations & Future Work
- Success ceiling – While 44 % is impressive for zero‑real‑data policies, many industrial use‑cases still require >80 % reliability; further domain adaptation or few‑shot real fine‑tuning may be needed.
- Task complexity bound – The current LLM prompt parser handles tasks up to ~10 sequential steps; extremely long‑horizon or hierarchical tasks could overwhelm the planner.
- Simulation fidelity – Certain contact dynamics (e.g., soft‑object deformation) are still approximated, limiting transfer for highly deformable materials.
- Safety guarantees – The VLM‑in‑the‑loop verification reduces collisions but does not provide formal safety proofs; integrating motion‑planning safety certificates is a planned direction.
- Scalability of LLM/VLM calls – Massive parallel generation incurs high API costs; future work aims at on‑device distilled models to lower compute overhead.
Overall, AnyTask demonstrates that coupling massive simulation with foundation models can dramatically accelerate the creation of versatile robot manipulation policies, opening a practical path toward generalist, data‑hungry robot learning in the real world.
Authors
- Ran Gong
- Xiaohan Zhang
- Jinghuan Shang
- Maria Vittoria Minniti
- Jigarkumar Patel
- Valerio Pepe
- Riedana Yan
- Ahmet Gundogdu
- Ivan Kapelyukh
- Ali Abbas
- Xiaoqiang Yan
- Harsh Patel
- Laura Herlant
- Karl Schmeckpeper
Paper Information
- arXiv ID: 2512.17853v1
- Categories: cs.RO, cs.AI
- Published: December 19, 2025