[Paper] AnyTask: an Automated Task and Data Generation Framework for Advancing Sim-to-Real Policy Learning
Source: arXiv - 2512.17853v1
Overview
AnyTask is a fully‑automated pipeline that uses massive GPU‑based simulation together with large‑scale foundation models (vision‑language models and large language models) to create thousands of diverse robot manipulation tasks, generate expert demonstrations, and train policies that can be transferred straight to real‑world robots. By removing the manual bottleneck of task design, scene generation, and data collection, the framework pushes generalist robot learning closer to the scale of modern AI systems.
Key Contributions
- End‑to‑end automation: A single framework that designs tasks, builds task‑aware scenes, synthesizes expert trajectories, and performs sim‑to‑real transfer without human‑in‑the‑loop engineering.
- ViPR (Vision‑Language‑in‑the‑Loop Planner): A novel task‑and‑motion‑planning agent that iteratively refines plans using a vision‑language model (VLM) to ensure feasibility and safety.
- ViPR‑Eureka: An RL agent that automatically constructs dense reward functions from LLM‑generated task descriptions and samples contact points guided by language cues.
- ViPR‑RL (Hybrid Planner‑Learner): A mixed planning‑and‑learning approach that produces high‑quality demonstrations even when only sparse rewards are available.
- Large‑scale data generation: Millions of simulated interactions across a wide variety of objects, poses, and task families (pick‑and‑place, drawer opening, contact‑rich pushing, long‑horizon sequences).
- Real‑world validation: Policies trained solely on synthetic data achieve 44 % average success on unseen real‑world tasks, demonstrating robust sim‑to‑real transfer.
Methodology
- Task Specification via LLMs – Natural‑language prompts describe a manipulation goal (e.g., “open the top drawer and place the red block inside”). An LLM expands this into a structured task graph (preconditions, goal states, constraints); a minimal sketch of such a task graph appears after this list.
- Scene Generation – A VLM parses the task graph and populates a simulated environment with appropriate objects, randomizing poses, textures, and lighting to maximize diversity.
- Expert Demonstration Synthesis – Three agents work in parallel:
- ViPR runs a classical task‑and‑motion planner, then queries a VLM to verify each step (collision‑free, graspable) and iteratively refines the plan; this verify‑and‑refine loop is sketched after the list.
- ViPR‑Eureka builds a dense reward model from the LLM description and uses RL with contact‑sampling heuristics to discover high‑quality trajectories; an illustrative LLM‑style reward function follows the list.
- ViPR‑RL combines sparse‑reward RL with occasional planner‑generated waypoints, allowing it to solve tasks where dense rewards are hard to define.
- Behavior Cloning – All generated trajectories are aggregated into a massive dataset. A transformer‑based policy network is trained to imitate the expert actions conditioned on visual observations (a minimal training step is sketched below).
- Sim‑to‑Real Transfer – Domain randomization (camera noise, friction variance, actuator delay) is applied during training; an example per‑episode randomization routine is sketched below. The resulting policy is deployed unchanged on a physical robot arm equipped with an RGB‑D camera.
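To make the task‑graph representation in the first step concrete, the sketch below shows one plausible way to encode an LLM‑expanded specification in Python. The `TaskStep`/`TaskGraph` names and fields are illustrative assumptions; the paper does not publish its exact schema.

```python
from dataclasses import dataclass, field

# Hypothetical schema for an LLM-expanded task specification.
# Names and fields are illustrative, not the paper's actual format.

@dataclass
class TaskStep:
    name: str                      # e.g. "open_top_drawer"
    preconditions: list[str]       # symbolic facts that must hold before the step
    goal_state: list[str]          # facts that should hold after the step
    constraints: list[str] = field(default_factory=list)  # e.g. "keep gripper upright"

@dataclass
class TaskGraph:
    instruction: str               # the original natural-language prompt
    steps: list[TaskStep]          # ordered (or partially ordered) subgoals

# Example: "open the top drawer and place the red block inside"
example = TaskGraph(
    instruction="open the top drawer and place the red block inside",
    steps=[
        TaskStep(
            name="open_top_drawer",
            preconditions=["drawer_top is closed", "gripper is empty"],
            goal_state=["drawer_top is open"],
        ),
        TaskStep(
            name="place_red_block_in_drawer",
            preconditions=["drawer_top is open", "red_block is graspable"],
            goal_state=["red_block is inside drawer_top"],
            constraints=["avoid collision with drawer front"],
        ),
    ],
)
```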
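The ViPR verify‑and‑refine loop from the demonstration‑synthesis step can be paraphrased as a small control loop. The `plan`, `vlm_check`, and `refine` callables below are hypothetical stand‑ins for the classical planner, the VLM feasibility query, and the plan‑repair routine; none of these names come from the paper.

```python
from typing import Callable, Optional

# Hypothetical sketch of a ViPR-style verify-and-refine loop; the function
# and parameter names are illustrative, not taken from the paper.

def plan_with_vlm_feedback(
    task_graph,
    plan: Callable,            # classical task-and-motion planner: task_graph -> list of steps
    vlm_check: Callable,       # VLM query: step -> (ok: bool, reason: str)
    refine: Callable,          # plan repair: (steps, flagged issues) -> new list of steps
    max_rounds: int = 5,
) -> Optional[list]:
    """Iterate plan -> VLM verification -> repair until every step passes."""
    steps = plan(task_graph)
    for _ in range(max_rounds):
        issues = []
        for step in steps:
            ok, reason = vlm_check(step)   # e.g. "is this step collision-free and the grasp feasible?"
            if not ok:
                issues.append((step, reason))
        if not issues:
            return steps                   # every step passed the feasibility check
        steps = refine(steps, issues)      # repair only the flagged steps
    return None                            # no feasible plan within the budget
```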
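ViPR‑Eureka's reward construction can be illustrated with the kind of dense shaping function an LLM might emit for the drawer example above. The terms and weights are invented for this sketch and are not the rewards used in the paper.

```python
import math

# Illustrative dense reward of the kind an LLM might generate for
# "place the red block inside the open drawer"; all terms and weights
# are invented for this sketch.

def dense_reward(gripper_pos, block_pos, drawer_pos, block_grasped: bool) -> float:
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    reach = -dist(gripper_pos, block_pos)                            # approach the block
    grasp = 1.0 if block_grasped else 0.0                            # bonus for a stable grasp
    place = -dist(block_pos, drawer_pos) if block_grasped else 0.0   # carry block toward drawer
    success = 5.0 if dist(block_pos, drawer_pos) < 0.03 else 0.0     # block within 3 cm of target

    return 0.5 * reach + grasp + 1.0 * place + success

# Example query (positions in metres):
r = dense_reward(
    gripper_pos=(0.40, 0.00, 0.20),
    block_pos=(0.45, 0.05, 0.02),
    drawer_pos=(0.60, -0.10, 0.15),
    block_grasped=False,
)
```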
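The behavior‑cloning stage can be sketched as a single imitation step on aggregated trajectories, here with a small PyTorch transformer conditioned on per‑frame visual features. Dimensions, architecture, and the MSE objective are illustrative choices, not the paper's reported configuration.

```python
import torch
import torch.nn as nn

# Minimal behavior-cloning step for a transformer policy conditioned on
# visual features. All sizes and the loss are illustrative placeholders.

class BCPolicy(nn.Module):
    def __init__(self, obs_dim=512, act_dim=7, d_model=256, n_layers=4, n_heads=8):
        super().__init__()
        self.embed = nn.Linear(obs_dim, d_model)            # per-frame visual features -> tokens
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, act_dim)             # predict one action per frame

    def forward(self, obs_seq):                             # obs_seq: (batch, time, obs_dim)
        x = self.encoder(self.embed(obs_seq))
        return self.head(x)

policy = BCPolicy()
optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-4)

# One imitation step on a dummy batch of (observation sequences, expert actions).
obs = torch.randn(8, 16, 512)             # 8 trajectories, 16 timesteps, 512-dim features
expert_actions = torch.randn(8, 16, 7)    # 7-DoF expert actions
loss = nn.functional.mse_loss(policy(obs), expert_actions)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```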
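The scene‑ and physics‑level randomization used for scene generation and sim‑to‑real transfer can be captured in a per‑episode sampling routine. All parameter ranges below are placeholder values chosen for this sketch, not the paper's reported bounds.

```python
import random
from dataclasses import dataclass

# Illustrative domain-randomization settings for sim-to-real training.
# Every range is a placeholder, not a value reported in the paper.

@dataclass
class EpisodeRandomization:
    object_yaw_deg: float       # randomized object pose
    light_intensity: float      # lighting variation for visual diversity
    camera_pixel_noise: float   # std. dev. of additive image noise
    friction_scale: float       # multiplier on nominal surface friction
    actuator_delay_ms: float    # simulated control latency

def sample_randomization(rng: random.Random) -> EpisodeRandomization:
    """Draw one randomized configuration per simulated episode."""
    return EpisodeRandomization(
        object_yaw_deg=rng.uniform(-180.0, 180.0),
        light_intensity=rng.uniform(0.5, 1.5),
        camera_pixel_noise=rng.uniform(0.0, 0.02),
        friction_scale=rng.uniform(0.7, 1.3),
        actuator_delay_ms=rng.uniform(0.0, 40.0),
    )

cfg = sample_randomization(random.Random(0))
```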
Results & Findings
| Metric | Simulation | Real‑World (unseen tasks) |
|---|---|---|
| Success Rate (average across 10 task families) | 92 % | 44 % |
| Number of generated tasks | > 5 k distinct task definitions | – |
| Demonstrations per task (average) | 20–50 | – |
| Policy inference latency | ~30 ms on RTX 3090 | ~45 ms on embedded GPU |
- The policies generalize to novel object poses and even to objects that never appeared in the simulated training data, thanks to the extensive visual and physical randomization.
- ViPR produces the highest‑fidelity trajectories (closest to human‑demonstrated plans), while ViPR‑Eureka excels on contact‑rich tasks where dense rewards are crucial.
- Hybrid ViPR‑RL bridges the gap, achieving comparable performance with far fewer environment interactions.
Practical Implications
- Rapid prototyping of robot skills – Engineers can describe a new manipulation goal in plain English and obtain a ready‑to‑run policy within hours, bypassing manual scene setup and data collection.
- Scalable data pipelines – Companies can leverage cloud GPU farms to generate petabytes of synthetic robot experience, feeding large‑scale foundation models for continual learning.
- Generalist robot platforms – The approach paves the way for “one‑size‑fits‑all” manipulators that can switch tasks on‑the‑fly, useful in warehousing, home assistance, and manufacturing where task variability is high.
- Reduced reliance on costly real‑world trials – By achieving >40 % success on unseen real tasks with zero real data, AnyTask cuts down the need for expensive tele‑operation or human‑demonstration campaigns.
- Open‑source extensibility – The modular agents (ViPR, ViPR‑Eureka, ViPR‑RL) can be swapped or combined with proprietary planners, allowing integration into existing robotics stacks.
Limitations & Future Work
- Success ceiling – While 44 % is impressive for zero‑real‑data policies, many industrial use‑cases still require >80 % reliability; further domain adaptation or few‑shot real fine‑tuning may be needed.
- Task complexity bound – The current LLM prompt parser handles tasks up to ~10 sequential steps; extremely long‑horizon or hierarchical tasks could overwhelm the planner.
- Simulation fidelity – Certain contact dynamics (e.g., soft‑object deformation) are still approximated, limiting transfer for highly deformable materials.
- Safety guarantees – The VLM‑in‑the‑loop verification reduces collisions but does not provide formal safety proofs; integrating motion‑planning safety certificates is a planned direction.
- Scalability of LLM/VLM calls – Massive parallel generation incurs high API costs; future work aims at on‑device distilled models to lower compute overhead.
Overall, AnyTask demonstrates that coupling massive simulation with foundation models can dramatically accelerate the creation of versatile robot manipulation policies, opening a practical path toward generalist, data‑hungry robot learning in the real world.
Authors
- Ran Gong
- Xiaohan Zhang
- Jinghuan Shang
- Maria Vittoria Minniti
- Jigarkumar Patel
- Valerio Pepe
- Riedana Yan
- Ahmet Gundogdu
- Ivan Kapelyukh
- Ali Abbas
- Xiaoqiang Yan
- Harsh Patel
- Laura Herlant
- Karl Schmeckpeper
Paper Information
- arXiv ID: 2512.17853v1
- Categories: cs.RO, cs.AI
- Published: December 19, 2025