[Paper] PAR3D: A Unified 3D-MLLM with Part-Aware Representation for Scene Understanding
Source: arXiv - 2606.06485v1
Overview
Recent advances in 3D multimodal large language models (3D‑MLLMs) have enabled unified solutions for 3D scene understanding tasks, including visual question answering, captioning, and referring segmentation. However, existing 3D‑MLLMs remain largely object‑centric, limiting their ability to model fine‑grained part structures that are essential for embodied interaction with 3D environments. In this work, we present PAR3D, a unified part‑aware 3D‑MLLM framework that enables models to understand, reason about, and ground both objects and their parts in 3D scenes.
To enable training and evaluation of part‑aware 3D scene understanding, we introduce ScenePart, a synthetic 3D scene dataset with part‑level annotations and language instructions. We further develop Part‑Aware 3D Representation Learning to enrich 3D visual representations with fine‑grained part‑level semantics, and propose Hierarchical Segmentation Query Generation to ground part targets via hierarchical object‑part queries. Extensive experiments show that our method substantially improves part‑level question answering and referring segmentation, while also achieving strong performance across object‑level vision‑language tasks.
Key Contributions
- Introduces a part‑aware 3D‑MLLM framework (PAR3D).
- Provides the ScenePart dataset with part‑level annotations and language instructions.
- Develops part‑aware representation learning and hierarchical segmentation query generation.
- Reports substantial improvements on part‑level QA and referring segmentation, alongside strong object‑level performance.
Methodology
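The abstract names two technical components. Part‑Aware 3D Representation Learning enriches 3D visual representations with fine‑grained part‑level semantics, and Hierarchical Segmentation Query Generation grounds part targets via hierarchical object‑part queries. Both are trained and evaluated on ScenePart, the synthetic dataset introduced with the paper. Please refer to the full paper for architectural and training details.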
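The abstract describes Hierarchical Segmentation Query Generation only at a high level, as grounding part targets via hierarchical object‑part queries. The toy sketch below shows one possible reading of that idea: a part mask is accepted only where it overlaps the mask of its parent object. It is not the authors' implementation. The `HierarchicalQuery` class, the `ground_part` function, the per‑point scores, and the 0.5 threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HierarchicalQuery:
    """Hypothetical pair of per-point scores: one for the object, one for the part."""
    object_scores: List[float]
    part_scores: List[float]


def ground_part(query: HierarchicalQuery, threshold: float = 0.5) -> List[bool]:
    """Return a per-point part mask restricted to the predicted object mask."""
    if len(query.object_scores) != len(query.part_scores):
        raise ValueError("object and part scores must cover the same points")
    object_mask = [s > threshold for s in query.object_scores]
    part_mask = [s > threshold for s in query.part_scores]
    # A point counts as part of the target only if it also lies on the parent object.
    return [o and p for o, p in zip(object_mask, part_mask)]


# Toy scene with five points (all values are made up for illustration).
# Points 0-2 score high for the object; points 1, 2 and 4 score high for the part.
# Point 4 looks like the part but does not lie on the parent object.
query = HierarchicalQuery(
    object_scores=[0.9, 0.8, 0.7, 0.1, 0.2],
    part_scores=[0.2, 0.9, 0.8, 0.1, 0.9],
)
mask = ground_part(query)
```

In this toy setup, point 4 is rejected even though its part score is high, because it falls outside the object mask. Conditioning the part prediction on the object prediction in this way is one plausible motivation for a hierarchical query design; the paper's actual mechanism may differ.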
Practical Implications
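By modeling parts as well as whole objects, PAR3D targets the fine‑grained part structures that the authors identify as essential for embodied interaction with 3D environments. A single model that can answer questions about, and segment, both objects and their parts could reduce the need for separate object‑level and part‑level pipelines in such applications.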
Authors
- Shaohui Dai
- Yansong Qu
- You Shen
- Shengchuan Zhang
- Liujuan Cao
Paper Information
- arXiv ID: 2606.06485v1
- Categories: cs.CV
- Published: June 4, 2026