[Paper] MonoArt: Progressive Structural Reasoning for Monocular Articulated 3D Reconstruction

Published: (March 19, 2026 at 01:59 PM EDT)
2 min read
Source: arXiv

Source: arXiv - 2603.19231v1

Overview

Reconstructing articulated 3D objects from a single image requires jointly inferring object geometry, part structure, and motion parameters from limited visual evidence. A key difficulty lies in the entanglement between motion cues and object structure, which makes direct articulation regression unstable. Existing methods address this challenge through multi-view supervision, retrieval-based assembly, or auxiliary video generation, often sacrificing scalability or efficiency. We present MonoArt, a unified framework grounded in progressive structural reasoning. Rather than predicting articulation directly from image features, MonoArt progressively transforms visual observations into canonical geometry, structured part representations, and motion-aware embeddings within a single architecture. This structured reasoning process enables stable and interpretable articulation inference without external motion templates or multi-stage pipelines. Extensive experiments on PartNet-Mobility demonstrate that OM achieves state-of-the-art performance in both reconstruction accuracy and inference speed. The framework further generalizes to robotic manipulation and articulated scene reconstruction.

Key Contributions

This paper presents research in the following areas:

  • cs.CV

Methodology

Please refer to the full paper for detailed methodology.

Practical Implications

This research contributes to the advancement of cs.CV.

Authors

  • Haitian Li
  • Haozhe Xie
  • Junxiang Xu
  • Beichen Wen
  • Fangzhou Hong
  • Ziwei Liu

Paper Information

  • arXiv ID: 2603.19231v1
  • Categories: cs.CV
  • Published: March 19, 2026
  • PDF: Download PDF
0 views
Back to Blog

Related posts

Read more »

[Paper] Matryoshka Gaussian Splatting

The ability to render scenes at adjustable fidelity from a single model, known as level of detail (LoD), is crucial for practical deployment of 3D Gaussian Spla...