[Paper] PAR3D: A Unified 3D-MLLM with Part-Aware Representation for Scene Understanding

Published: June 4, 2026 at 01:59 PM EDT

Source: arXiv - 2606.06485v1

Overview

Recent advances in 3D multimodal large language models (3D‑MLLMs) have enabled unified solutions for 3D scene understanding tasks, including visual question answering, captioning, and referring segmentation. However, existing 3D‑MLLMs remain largely object‑centric, limiting their ability to model fine‑grained part structures that are essential for embodied interaction with 3D environments. In this work, we present PAR3D, a unified part‑aware 3D‑MLLM framework that enables models to understand, reason about, and ground both objects and their parts in 3D scenes.

To enable training and evaluation of part‑aware 3D scene understanding, we introduce ScenePart, a synthetic 3D scene dataset with part‑level annotations and language instructions. We further develop Part‑Aware 3D Representation Learning to enrich 3D visual representations with fine‑grained part‑level semantics, and propose Hierarchical Segmentation Query Generation to ground part targets via hierarchical object‑part queries. Extensive experiments show that our method substantially improves part‑level question answering and referring segmentation, while also achieving strong performance across object‑level vision‑language tasks.
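As a rough illustration of what grounding "via hierarchical object‑part queries" could look like, the toy sketch below resolves the object first and then searches for the part only inside that object. This is not the PAR3D implementation: the abstract gives no architectural details, so the `Point` record, the `ground_object` and `ground_part` helpers, and the use of ground-truth labels in place of learned queries are all assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """One labeled scene point. Both fields are hypothetical, not from the paper."""
    object_name: str
    part_name: str


def ground_object(points, object_name):
    """Stage 1 (object level): indices of points belonging to the referred object."""
    return {i for i, p in enumerate(points) if p.object_name == object_name}


def ground_part(points, object_name, part_name):
    """Stage 2 (part level): search for the part only within the object's points."""
    object_indices = ground_object(points, object_name)
    return {i for i in object_indices if points[i].part_name == part_name}


# Toy scene: two different objects both have a part called "handle".
scene = [
    Point("cabinet", "door"),
    Point("cabinet", "handle"),
    Point("cabinet", "handle"),
    Point("mug", "body"),
    Point("mug", "handle"),
]

cabinet_handle = ground_part(scene, "cabinet", "handle")
mug_handle = ground_part(scene, "mug", "handle")
print(sorted(cabinet_handle), sorted(mug_handle))
```

The point of the two-stage lookup is that a part name shared across objects ("handle") is ambiguous on its own; conditioning the part query on the object resolves it. A learned model would replace the label comparisons with predicted masks.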

Key Contributions

  • Introduces a part‑aware 3D‑MLLM framework (PAR3D).
  • Provides the ScenePart dataset with part‑level annotations and language instructions.
  • Develops part‑aware representation learning and hierarchical segmentation query generation.
  • Demonstrates significant improvements on part‑level QA and referring segmentation, plus strong object‑level performance.

Methodology

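The abstract names two components. Part‑Aware 3D Representation Learning enriches 3D visual representations with fine‑grained part‑level semantics, and Hierarchical Segmentation Query Generation grounds part targets through hierarchical object‑part queries. Both are trained and evaluated on ScenePart, the synthetic dataset introduced with the paper. Please refer to the full paper for the detailed methodology.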

Practical Implications

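Existing 3D‑MLLMs are largely object‑centric, which limits their ability to model the fine‑grained part structures that the authors describe as essential for embodied interaction with 3D environments. By answering questions about and segmenting both objects and their parts, PAR3D targets that gap.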

Authors

  • Shaohui Dai
  • Yansong Qu
  • You Shen
  • Shengchuan Zhang
  • Liujuan Cao

Paper Information

  • arXiv ID: 2606.06485v1
  • Categories: cs.CV
  • Published: June 4, 2026