[Paper] χ₀: Resource-Aware Robust Manipulation via Taming Distributional Inconsistencies
Source: arXiv - 2602.09021v1
Overview
The paper introduces χ₀, a resource‑efficient framework that substantially improves the robustness of long‑horizon robotic manipulation, especially multi‑step tasks such as garment folding. By tackling the hidden “distributional inconsistencies” between training demonstrations, learned policies, and real‑world execution, the authors achieve near‑continuous autonomous operation with far less data and compute than prior work.
Key Contributions
- Model Arithmetic – a novel weight‑space merging technique that fuses multiple demonstration policies, letting a single model absorb diverse visual and state variations.
- Stage Advantage – a stage‑aware advantage estimator that supplies dense, stable learning signals for each sub‑task, avoiding the instability of generic advantage methods.
- Train‑Deploy Alignment – a three‑pronged alignment pipeline (spatio‑temporal augmentation, heuristic DAgger corrections, and chunk‑wise temporal smoothing) that bridges the gap between training and test‑time distributions.
- Real‑world validation – dual‑arm robots autonomously perform garment flattening, folding, and hanging for 24 hours straight, achieving a ~250 % boost in success rate over the previous state‑of‑the‑art π₀.₅ while using only 20 hours of data and 8 A100 GPUs.
Methodology
- Data Collection & Demonstrations – The team gathers a modest set of human‑demonstrated trajectories covering variations in cloth type, lighting, and initial poses.
- Model Arithmetic – Instead of training a monolithic policy from scratch, they first train several specialist models on subsets of the data (e.g., different garment textures). The final policy is obtained by arithmetically merging the weight tensors of these specialists, effectively creating a single network that retains the strengths of each.
- Stage‑Aware Advantage Estimation – Long‑horizon tasks are broken into explicit stages (flatten → fold → hang). For each stage, a dedicated advantage function evaluates progress, providing dense rewards that guide the policy even when early‑stage errors would otherwise cause sparse feedback.
- Train‑Deploy Alignment –
- Spatio‑temporal augmentation synthetically perturbs observations (camera jitter, cloth drape) to mimic deployment noise.
- Heuristic DAgger runs the partially trained policy, collects its mistakes, and asks a lightweight corrective oracle to label the correct action, iteratively refining the model.
- Temporal chunk smoothing post‑processes the policy’s output to enforce smooth transitions between stages, reducing jerky motions that could destabilize the cloth.
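The chunk‑wise smoothing step above can be illustrated with a minimal sketch, assuming the policy emits fixed‑length action chunks that overlap at stage boundaries. The linear cross‑fade and the `overlap` parameter are illustrative assumptions, not the paper’s exact scheme.

```python
import numpy as np

def smooth_chunks(prev_chunk: np.ndarray, new_chunk: np.ndarray,
                  overlap: int) -> np.ndarray:
    """Blend the tail of the previous action chunk into the head of the new one.

    prev_chunk, new_chunk: (T, action_dim) arrays of predicted actions.
    overlap: number of timesteps shared by the two chunks.
    The first `overlap` steps of the new chunk are linearly cross-faded
    from the previous chunk's last `overlap` steps, avoiding the jerky
    discontinuities that could destabilize the cloth.
    """
    out = new_chunk.copy()
    # Ramp weight from 0 (keep old prediction) to 1 (trust new prediction).
    alphas = np.linspace(0.0, 1.0, overlap)[:, None]  # shape (overlap, 1)
    out[:overlap] = (1 - alphas) * prev_chunk[-overlap:] + alphas * new_chunk[:overlap]
    return out
```

With this blending, the executed trajectory starts each new chunk exactly where the previous one left off and converges to the new prediction over the overlap window.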
The combined pipeline yields a single policy that can be deployed on commodity dual‑arm platforms without the massive compute budgets typical of large‑scale imitation‑learning pipelines.
Results & Findings
| Metric | χ₀ (this work) | π₀.₅ (baseline) |
|---|---|---|
| Success rate (full garment pipeline) | ~85 % | ~30 % |
| Data required | 20 h of demonstrations | >100 h |
| Compute budget | 8 A100 GPUs × 20 h | 8 A100 GPUs × 100 h |
| Continuous autonomous runtime | 24 h non‑stop | ~2 h (frequent resets) |
Key takeaways:
- Model Arithmetic alone recovers ~70 % of the performance gap caused by distributional shift.
- Stage Advantage eliminates the exploding/vanishing gradient issues seen in prior advantage‑based RL for long tasks.
- The full Train‑Deploy Alignment pipeline adds the final boost needed to reach production‑level reliability.
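The stage‑aware advantage idea behind the second takeaway can be sketched as computing discounted returns and baselines per stage rather than over the whole episode. The stage segmentation via integer labels and the mean‑return baseline are illustrative assumptions, not the paper’s exact estimator.

```python
import numpy as np

def stage_advantages(rewards: np.ndarray, stage_ids: np.ndarray,
                     gamma: float = 0.99) -> np.ndarray:
    """Compute advantages separately within each stage of a long-horizon episode.

    rewards:   (T,) per-step rewards.
    stage_ids: (T,) integer stage label per step (e.g. 0=flatten, 1=fold, 2=hang).
    Discounted reward-to-go is computed only over each stage's own steps
    and centred by that stage's mean return, so noise from early stages
    does not dilute the learning signal of later stages.
    """
    adv = np.zeros_like(rewards, dtype=float)
    for s in np.unique(stage_ids):
        idx = np.where(stage_ids == s)[0]
        returns = np.zeros(len(idx))
        running = 0.0
        for i in range(len(idx) - 1, -1, -1):  # backwards discounted sum
            running = rewards[idx[i]] + gamma * running
            returns[i] = running
        adv[idx] = returns - returns.mean()  # stage-local baseline
    return adv
```

Because each stage is centred independently, the advantage magnitudes stay comparable across the flatten, fold, and hang phases instead of being dominated by whichever phase happens to collect the most reward.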
Practical Implications
- Lower entry barrier for robotic startups – Teams can now build robust manipulation pipelines with a few hours of data and a modest GPU cluster, rather than investing in massive data farms.
- Rapid prototyping of new tasks – By training specialist models on a handful of demonstrations and merging them via Model Arithmetic, developers can quickly extend existing policies to new objects or environments.
- Improved safety and uptime – The dense stage‑wise feedback reduces catastrophic failures, making long‑running autonomous services (e.g., laundry folding stations, warehouse sorting) feasible.
- Transferable framework – Although demonstrated on garment manipulation, the three pillars are generic and can be applied to other multi‑stage domains such as assembly, kitchen robotics, or even autonomous driving maneuvers.
Limitations & Future Work
- Domain specificity – The current experiments focus on dual‑arm cloth handling; performance on highly rigid or highly deformable objects remains untested.
- Heuristic DAgger reliance – The corrective oracle is hand‑crafted; automating this step (e.g., via learned critics) could further reduce human involvement.
- Scalability of Model Arithmetic – Merging many specialist models may encounter diminishing returns or weight‑conflict issues; exploring more principled Bayesian merging could be a next step.
The authors plan to open‑source their code, datasets, and pretrained models, inviting the community to extend χ₀ to broader manipulation challenges.
Authors
- Checheng Yu
- Chonghao Sima
- Gangcheng Jiang
- Hai Zhang
- Haoguang Mai
- Hongyang Li
- Huijie Wang
- Jin Chen
- Kaiyang Wu
- Li Chen
- Lirui Zhao
- Modi Shi
- Ping Luo
- Qingwen Bu
- Shijia Peng
- Tianyu Li
- Yibo Yuan
Paper Information
- arXiv ID: 2602.09021v1
- Categories: cs.RO, cs.CV
- Published: February 9, 2026