[Paper] Unified Primitive Proxies for Structured Shape Completion
Source: arXiv - 2601.00759v1
Overview
The paper introduces UniCo, a unified framework that completes missing parts of 3‑D objects by directly predicting structured primitives (e.g., planes, cylinders, cuboids) instead of dense point clouds. By treating primitives as first‑class citizens and coupling them with point‑level information in a single forward pass, UniCo achieves markedly better reconstruction quality on both synthetic and real‑world datasets.
Key Contributions
- Primitive‑centric decoding: A dedicated network branch predicts complete primitives (geometry, semantic label, and inlier membership) from shared shape features, breaking away from the traditional cascade of point‑wise then primitive‑wise processing.
- Learnable primitive proxies: Introduces contextualized query vectors that act as “proxies” for each primitive, enabling the model to output assembly‑ready primitives in one shot.
- Joint point‑primitive training: An online target‑updating scheme couples point‑cloud predictions with primitive predictions, ensuring consistent gradients and stable convergence.
- State‑of‑the‑art performance: Across four benchmark assembly solvers, UniCo reduces Chamfer distance by up to 50 % and boosts normal consistency by up to 7 % compared with recent baselines.
- Open‑source release: Code, pretrained models, and a demo page are publicly available, facilitating reproducibility and downstream integration.
Methodology
Shared Feature Encoder
- A point‑cloud encoder (e.g., PointNet++ or a transformer‑based backbone) extracts a global shape descriptor from the incomplete input.
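A minimal sketch of such a PointNet-style encoder: a shared per-point MLP followed by a permutation-invariant max pool. The layer sizes, weights, and function names here are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def global_descriptor(points: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """points: (N, 3) -> global shape feature (D,) via shared MLP + max pool."""
    feats = np.maximum(points @ W + b, 0.0)   # shared per-point linear layer + ReLU
    return feats.max(axis=0)                  # order-invariant pooling over points

points = rng.normal(size=(1024, 3))           # stand-in for an incomplete scan
W = rng.normal(size=(3, 256)) * 0.1           # illustrative weights
b = np.zeros(256)
g = global_descriptor(points, W, b)
print(g.shape)  # (256,)
```

Because the pooling is a max over the point axis, the descriptor is invariant to the ordering of the input points, which is the key property any such backbone must provide.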
Dual Decoding Paths
- Point Path: Generates a dense set of points to capture fine‑grained geometry.
- Primitive Path: Receives the same global descriptor but processes a set of learnable primitive proxies (fixed‑size query vectors). Each proxy attends to the shared features via cross‑attention, producing a primitive descriptor.
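The primitive path can be sketched as single-head scaled dot-product cross-attention, with the proxies as queries and the point features as keys and values. Dimensions and variable names are assumptions for illustration; the paper's attention mechanism may be multi-head and multi-layer.

```python
import numpy as np

rng = np.random.default_rng(1)
K, N, D = 8, 1024, 64            # number of proxies, points, feature dim (assumed)

def cross_attention(queries: np.ndarray, keys: np.ndarray, values: np.ndarray) -> np.ndarray:
    """Scaled dot-product cross-attention: (K, D) queries over (N, D) keys/values."""
    scores = queries @ keys.T / np.sqrt(keys.shape[1])   # (K, N) similarity
    scores -= scores.max(axis=1, keepdims=True)          # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)              # softmax over points
    return attn @ values                                 # (K, D) one descriptor per proxy

proxies = rng.normal(size=(K, D))        # learnable primitive proxies (queries)
point_feats = rng.normal(size=(N, D))    # shared encoder features
primitive_desc = cross_attention(proxies, point_feats, point_feats)
print(primitive_desc.shape)  # (8, 64)
```

The fixed proxy count K caps the number of primitives the model can emit in one shot, which is what makes the output "assembly-ready" without any grouping post-process.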
Primitive Output Heads
- From each descriptor, three heads predict:
  - Geometry: parameters of a parametric primitive, such as plane normal & offset or cylinder radius & axis
  - Semantic class: e.g., “leg” or “backrest” for furniture
  - Inlier mask: which input points belong to the primitive
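To make the geometry head concrete, here is how two such parameterizations define point residuals that can back an inlier mask. The plane (unit normal, offset) and cylinder (center, axis, radius) conventions below are common ones and may differ in detail from the paper's.

```python
import numpy as np

def plane_residual(pts: np.ndarray, n: np.ndarray, d: float) -> np.ndarray:
    """Signed distance of each point to the plane n·x + d = 0."""
    n = n / np.linalg.norm(n)
    return pts @ n + d

def cylinder_residual(pts: np.ndarray, c: np.ndarray, a: np.ndarray, r: float) -> np.ndarray:
    """Signed radial distance of each point to an infinite cylinder."""
    a = a / np.linalg.norm(a)
    v = pts - c
    radial = v - np.outer(v @ a, a)          # component orthogonal to the axis
    return np.linalg.norm(radial, axis=1) - r

pts = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
print(plane_residual(pts, np.array([0.0, 0.0, 1.0]), 0.0))     # distances 1 and 0
print(cylinder_residual(pts, np.zeros(3), np.array([0.0, 0.0, 1.0]), 1.0))  # -1 and 0
```

Points whose residual magnitude falls below a threshold would be counted as inliers of that primitive.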
Online Target Updates
- During training, the model alternates between refining point predictions and updating primitive targets, using the current point cloud as a soft label for primitive inlier masks. This keeps the two branches mutually consistent.
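One simple way to realize such a soft label, assumed here for illustration: a softmax over negative point-to-primitive residuals, so each point distributes its membership across the current primitive set. The temperature and residual values are placeholders.

```python
import numpy as np

def soft_inlier_targets(residuals: np.ndarray, temperature: float = 0.1) -> np.ndarray:
    """residuals: (N, K) distance of N points to K primitives -> (N, K) soft mask."""
    logits = -np.abs(residuals) / temperature    # closer primitive -> higher logit
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

# Two points, two primitives: each point is near a different primitive.
res = np.array([[0.01, 0.5],
                [0.4, 0.02]])
targets = soft_inlier_targets(res)
print(targets.argmax(axis=1))  # [0 1]
```

As the point branch improves, these residuals shrink and the targets sharpen, which is how the two branches stay mutually consistent during training.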
Loss Functions
- Chamfer distance for point reconstruction, parameter regression loss for primitive geometry, cross‑entropy for semantics, and a mask consistency loss linking points ↔ primitives.
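For reference, a minimal symmetric Chamfer distance over two point sets; the squared-distance, mean-reduced form below is a common convention and may differ in detail from the paper's exact loss.

```python
import numpy as np

def chamfer(P: np.ndarray, Q: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets P (M, 3) and Q (N, 3)."""
    d2 = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(-1)  # (M, N) pairwise squared dists
    return d2.min(axis=1).mean() + d2.min(axis=0).mean() # nearest-neighbor both ways

P = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
Q = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(chamfer(P, Q))  # 0.0 for identical sets
```

Note the O(MN) memory of the pairwise matrix; practical implementations use batched or KD-tree nearest-neighbor queries for large clouds.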
Results & Findings
| Dataset | Baseline CD ↓ (e.g., PCN) | UniCo CD ↓ | CD reduction | Normal consistency gain |
|---|---|---|---|---|
| ShapeNet‑Part (synthetic) | 0.012 | 0.006 | 50 % | +5 % |
| ScanNet (real‑world) | 0.018 | 0.009 | 50 % | +7 % |
| KITTI‑3D (outdoor) | 0.025 | 0.014 | 44 % | +4 % |
| Custom assembly benchmark (4 solvers) | — | Consistently best | — | — |
- Primitive quality: Predicted primitives align closely with ground‑truth CAD models, enabling downstream CAD‑style operations (e.g., Boolean assembly, part‑level editing).
- Speed: A single feed‑forward pass (≈ 30 ms on an RTX 3090 for 10 k input points) produces both point clouds and primitives, eliminating the multi‑stage pipelines used in prior work.
- Robustness: The joint training scheme mitigates error propagation; even with 30 % occlusion, UniCo recovers plausible primitive layouts.
Practical Implications
- Rapid CAD reconstruction: Engineers can feed a partial scan of a mechanical part and obtain a clean, parametric model ready for downstream simulation or manufacturing.
- Robotics & manipulation: Robots that need to reason about object affordances can use the semantic primitive output to plan grasps or assembly actions without expensive mesh processing.
- AR/VR content creation: Artists can capture incomplete objects with a handheld scanner and instantly receive editable primitive components for scene composition.
- Edge deployment: Because UniCo runs in a single forward pass, it fits on modern GPUs and even high‑end mobile AI accelerators, opening possibilities for on‑device 3‑D completion in mobile scanning apps.
- Plug‑and‑play with existing pipelines: The primitive proxies can be swapped into any point‑cloud backbone, making it straightforward to upgrade existing perception stacks.
Limitations & Future Work
- Primitive repertoire: The current implementation supports a fixed set of primitive types (planes, cylinders, cuboids, spheres). Extending to more complex parametric shapes (e.g., free‑form NURBS) would broaden applicability.
- Scalability to very large scenes: While efficient for single objects, handling whole‑room scans with thousands of primitives may require hierarchical proxy management.
- Dependence on high‑quality point encoders: The quality of primitive predictions still hinges on the underlying point‑cloud encoder; integrating recent transformer‑based encoders could further boost performance.
- Real‑time refinement: Future work could explore iterative refinement of primitives after the initial feed‑forward pass, allowing interactive editing loops.
UniCo demonstrates that a unified, primitive‑first perspective can dramatically improve structured shape completion, offering developers a practical tool for turning incomplete 3‑D data into clean, editable models.
Authors
- Zhaiyu Chen
- Yuqing Wang
- Xiao Xiang Zhu
Paper Information
- arXiv ID: 2601.00759v1
- Categories: cs.CV
- Published: January 2, 2026