[Paper] MonoArt: Progressive Structural Reasoning for Monocular Articulated 3D Reconstruction
Source: arXiv - 2603.19231v1
Overview
Reconstructing articulated 3D objects from a single image requires jointly inferring object geometry, part structure, and motion parameters from limited visual evidence. A key difficulty lies in the entanglement between motion cues and object structure, which makes direct articulation regression unstable. Existing methods address this challenge through multi-view supervision, retrieval-based assembly, or auxiliary video generation, often sacrificing scalability or efficiency. We present MonoArt, a unified framework grounded in progressive structural reasoning. Rather than predicting articulation directly from image features, MonoArt progressively transforms visual observations into canonical geometry, structured part representations, and motion-aware embeddings within a single architecture. This structured reasoning process enables stable and interpretable articulation inference without external motion templates or multi-stage pipelines. Extensive experiments on PartNet-Mobility demonstrate that MonoArt achieves state-of-the-art performance in both reconstruction accuracy and inference speed. The framework further generalizes to robotic manipulation and articulated scene reconstruction.
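To make "motion parameters" concrete, the sketch below shows one common way to describe an articulated part: a joint type, an axis direction, and a pivot point, applied to the part's points for a given joint state. This is a generic kinematics illustration, not MonoArt's actual parameterization or code; the `Joint` class, the `articulate` function, and the revolute/prismatic split are assumptions introduced here for illustration only.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Joint:
    """Hypothetical joint description (not taken from the paper)."""
    kind: str    # "revolute" (hinge) or "prismatic" (slider)
    axis: Vec3   # axis direction; normalized internally
    pivot: Vec3  # a point on the axis; ignored for prismatic joints


def _normalize(v: Vec3) -> Vec3:
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("joint axis must be non-zero")
    return (v[0] / n, v[1] / n, v[2] / n)


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def articulate(points: List[Vec3], joint: Joint, state: float) -> List[Vec3]:
    """Move a part's points to a joint state.

    state is an angle in radians for a revolute joint and a
    displacement along the axis for a prismatic joint.
    """
    k = _normalize(joint.axis)
    if joint.kind == "prismatic":
        # Pure translation along the unit axis.
        return [(p[0] + state * k[0], p[1] + state * k[1], p[2] + state * k[2])
                for p in points]
    if joint.kind == "revolute":
        # Rodrigues' rotation formula about the axis through the pivot.
        c, s = math.cos(state), math.sin(state)
        moved = []
        for p in points:
            v = (p[0] - joint.pivot[0], p[1] - joint.pivot[1], p[2] - joint.pivot[2])
            kxv = _cross(k, v)
            kdv = _dot(k, v)
            r = tuple(v[i] * c + kxv[i] * s + k[i] * kdv * (1.0 - c) for i in range(3))
            moved.append((r[0] + joint.pivot[0], r[1] + joint.pivot[1], r[2] + joint.pivot[2]))
        return moved
    raise ValueError(f"unknown joint kind: {joint.kind}")
```

For example, a door-like part could be modeled as `Joint("revolute", (0, 0, 1), hinge_point)` and a drawer as `Joint("prismatic", slide_direction, (0, 0, 0))`, where `hinge_point` and `slide_direction` are placeholders. A single-image method has to estimate quantities of this kind together with the part geometry, which is where the entanglement described above arises.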
Key Contributions
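As stated in the abstract, the paper's main contributions are:
- MonoArt, a unified single-architecture framework for reconstructing articulated 3D objects from a single image.
- Progressive structural reasoning that transforms visual observations into canonical geometry, structured part representations, and motion-aware embeddings, instead of regressing articulation directly from image features.
- Stable and interpretable articulation inference without external motion templates or multi-stage pipelines.
- Reported state-of-the-art reconstruction accuracy and inference speed on PartNet-Mobility.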
Methodology
Please refer to the full paper for detailed methodology.
Practical Implications
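Because MonoArt works from a single image and avoids multi-view supervision, retrieval-based assembly, and auxiliary video generation, the authors position it as a more scalable and efficient option than prior approaches. The abstract also reports that the framework generalizes to robotic manipulation and articulated scene reconstruction.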
Authors
- Haitian Li
- Haozhe Xie
- Junxiang Xu
- Beichen Wen
- Fangzhou Hong
- Ziwei Liu
Paper Information
- arXiv ID: 2603.19231v1
- Categories: cs.CV
- Published: March 19, 2026