[Paper] Particulate: Feed-Forward 3D Object Articulation

Published: December 12, 2025 at 01:59 PM EST
4 min read
Source: arXiv - 2512.11798v1

Overview

Particulate is a new feed‑forward system that can take a single static 3D mesh—think of a CAD model or a scan of a chair—and instantly recover its hidden articulation: the separate moving parts, how they’re connected, and the motion limits of each joint. By replacing costly per‑object optimization with a single forward pass of a transformer‑based network, the method makes it practical to turn any static 3D asset into a fully rigged, animatable model in seconds.

Key Contributions

  • End‑to‑end transformer architecture (Part Articulation Transformer) that ingests a point cloud of a mesh and predicts parts, kinematic hierarchy, and joint constraints in one shot.
  • Native multi‑joint support, allowing objects with arbitrary numbers of moving links (e.g., a folding table with several hinges).
  • Large‑scale training on a curated collection of articulated assets from public datasets, plus a newly released benchmark for articulation estimation.
  • Real‑time inference: the whole pipeline runs in a few seconds on a single GPU, dramatically faster than prior optimization‑based methods.
  • Generalisation to AI‑generated 3D content, enabling a pipeline that goes from a single image → 3D mesh → articulated model using off‑the‑shelf image‑to‑3D generators.

Methodology

  1. Input preprocessing – The static mesh is uniformly sampled into a point cloud (≈10k points); a minimal sampling sketch appears after this list.
  2. Part Articulation Transformer – A hierarchical transformer processes the point cloud.
    • Local feature extraction via self‑attention on small neighborhoods captures fine geometry (e.g., a door edge).
    • Global reasoning aggregates these features to infer the overall kinematic graph (which part moves relative to which).
  3. Joint prediction heads – Separate MLP heads (see the head sketch after this list) output:
    • Part segmentation (per‑point label).
    • Parent‑child relationships (directed edges of the articulation tree).
    • Joint type & limits (revolute, prismatic, range of motion).
  4. Lifting to mesh – The predicted attributes are transferred back onto the original mesh, producing a rigged model ready for animation.
  5. Training – Supervised learning with cross‑entropy for segmentation, binary cross‑entropy for adjacency, and regression losses for joint parameters; a sketch of the combined objective appears below. The loss is weighted to balance part‑count variability across objects.
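
To make step 1 concrete, here is a minimal sketch of the sampling stage; the choice of trimesh, the normalization, and the exact point budget are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch of step 1: uniformly sample a static mesh into a point cloud.
# The use of trimesh and the normalization scheme are assumptions, not the paper's code.
import numpy as np
import trimesh

def mesh_to_point_cloud(mesh_path: str, num_points: int = 10_000) -> np.ndarray:
    mesh = trimesh.load(mesh_path, force="mesh")
    # Area-weighted uniform sampling over the surface.
    points, _face_ids = trimesh.sample.sample_surface(mesh, num_points)
    # Center and scale to a unit box so the network sees a consistent coordinate range.
    points = points - points.mean(axis=0)
    points = points / (np.abs(points).max() + 1e-8)
    return points.astype(np.float32)
```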
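
Step 3 can be pictured as a handful of lightweight heads on top of shared transformer features. The PyTorch layout below is an illustrative sketch under assumed feature dimensions, a fixed maximum part count, and a two-entry joint-type vocabulary; it is not the paper's architecture.

```python
# Illustrative sketch of step 3: separate heads over shared transformer features.
# Dimensions and the revolute/prismatic joint vocabulary are assumptions.
import torch
import torch.nn as nn

class ArticulationHeads(nn.Module):
    def __init__(self, feat_dim: int = 256, max_parts: int = 16):
        super().__init__()
        self.seg_head = nn.Linear(feat_dim, max_parts)         # per-point part logits
        self.joint_type_head = nn.Linear(feat_dim, 2)          # revolute vs. prismatic
        self.joint_param_head = nn.Linear(feat_dim, 8)         # axis (3) + origin (3) + limits (2)
        self.parent_head = nn.Bilinear(feat_dim, feat_dim, 1)  # scores directed part-part edges

    def forward(self, point_feats: torch.Tensor, part_feats: torch.Tensor):
        # point_feats: (N_points, feat_dim); part_feats: (N_parts, feat_dim)
        seg_logits = self.seg_head(point_feats)
        joint_type_logits = self.joint_type_head(part_feats)
        joint_params = self.joint_param_head(part_feats)
        # Pairwise parent-child scores: entry (i, j) scores "part i is the parent of part j".
        n = part_feats.shape[0]
        parent_scores = self.parent_head(
            part_feats.unsqueeze(1).expand(n, n, -1).reshape(n * n, -1),
            part_feats.unsqueeze(0).expand(n, n, -1).reshape(n * n, -1),
        ).reshape(n, n)
        return seg_logits, joint_type_logits, joint_params, parent_scores
```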

The whole network is differentiable and trained end‑to‑end, so it learns to jointly optimise part detection and kinematic inference rather than treating them as separate steps.
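
As a rough picture of how the supervised terms from step 5 could be combined into one end‑to‑end objective, here is a hedged sketch; the loss weights, tensor shapes, and the smooth‑L1 choice for the regression term are assumptions rather than values reported by the authors.

```python
# Sketch of the combined training objective described in step 5.
# Loss weights and shapes are illustrative assumptions.
import torch
import torch.nn.functional as F

def articulation_loss(seg_logits, seg_labels,        # (N_points, P), (N_points,) long
                      parent_scores, parent_labels,  # (P, P) logits, (P, P) float 0/1
                      joint_params, joint_targets,   # (P, 8), (P, 8)
                      w_seg=1.0, w_adj=1.0, w_joint=0.5):
    loss_seg = F.cross_entropy(seg_logits, seg_labels)                            # part segmentation
    loss_adj = F.binary_cross_entropy_with_logits(parent_scores, parent_labels)   # kinematic edges
    loss_joint = F.smooth_l1_loss(joint_params, joint_targets)                    # axes, origins, limits
    return w_seg * loss_seg + w_adj * loss_adj + w_joint * loss_joint
```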

Results & Findings

  • Quantitative boost: On the new benchmark (≈2k diverse objects), Particulate achieves a mean Intersection‑over‑Union (mIoU) of 84% for part segmentation and 71% accuracy in predicting the full kinematic tree, up to 20% higher than the previous state‑of‑the‑art method (ArticulationNet).
  • Speed: Average inference time is 1.8 s per object on an RTX 3090, compared to 30 s–5 min for optimization‑based baselines.
  • Robustness to noise: The model maintains >75% kinematic accuracy even when the input point cloud is perturbed with 5% Gaussian noise, showing resilience to imperfect scans.
  • AI‑generated assets: When paired with a diffusion‑based image‑to‑3D generator, Particulate correctly extracts articulation for 68% of objects created from a single RGB image, opening a path to “single‑image rigging.”

Qualitative examples (e.g., a kitchen cabinet, a folding chair, a robotic arm) demonstrate clean part separation and plausible joint limits that match human expectations.

Practical Implications

  • Game & AR/VR pipelines: Artists can import static meshes from libraries or procedural generators and instantly obtain rigged assets, cutting down manual rigging time dramatically.
  • Robotics simulation: Engineers can feed CAD models of household items into simulators (e.g., Isaac Gym, PyBullet) and obtain accurate joint models for manipulation planning without hand‑crafting URDF files (a minimal URDF‑export sketch follows this list).
  • 3D content creation for e‑commerce: Retailers can upload a single product scan and automatically generate interactive 3D demos where customers can open drawers or rotate hinges.
  • Data augmentation for learning: Synthetic articulated models can be generated on‑the‑fly for training downstream tasks such as pose estimation or grasp synthesis.
  • Integration with image‑to‑3D tools: By chaining a diffusion‑based generator with Particulate, developers can build end‑to‑end “single‑photo to animated 3D” services, useful for rapid prototyping or virtual try‑ons.
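
To illustrate the robotics use case, the sketch below turns a list of predicted parts and joints into a minimal URDF that simulators such as PyBullet can load; the `Joint` structure and its field names are hypothetical placeholders for Particulate's output, not an official exporter.

```python
# Hedged sketch: convert predicted parts/joints into a minimal URDF for simulators.
# The Joint dataclass and mesh-naming convention are assumptions.
from dataclasses import dataclass

@dataclass
class Joint:
    parent: str         # name of the part this joint is attached to
    child: str          # name of the moving part
    joint_type: str     # "revolute" or "prismatic"
    axis: tuple         # (x, y, z) unit motion axis
    origin: tuple       # (x, y, z) joint position
    limits: tuple       # (lower, upper) in radians or meters

def write_urdf(robot_name: str, base_link: str, joints: list[Joint], path: str) -> None:
    links = {base_link} | {j.parent for j in joints} | {j.child for j in joints}
    lines = [f'<robot name="{robot_name}">']
    for link in sorted(links):
        # Each part is exported as a link referencing its own mesh file.
        lines.append(f'  <link name="{link}">')
        lines.append(f'    <visual><geometry><mesh filename="{link}.obj"/></geometry></visual>')
        lines.append('  </link>')
    for i, j in enumerate(joints):
        lines.append(f'  <joint name="joint_{i}" type="{j.joint_type}">')
        lines.append(f'    <parent link="{j.parent}"/>')
        lines.append(f'    <child link="{j.child}"/>')
        lines.append(f'    <origin xyz="{j.origin[0]} {j.origin[1]} {j.origin[2]}"/>')
        lines.append(f'    <axis xyz="{j.axis[0]} {j.axis[1]} {j.axis[2]}"/>')
        lines.append(f'    <limit lower="{j.limits[0]}" upper="{j.limits[1]}" effort="10" velocity="1"/>')
        lines.append('  </joint>')
    lines.append('</robot>')
    with open(path, "w") as f:
        f.write("\n".join(lines))
```

A file written this way can then be loaded with, for example, pybullet.loadURDF for manipulation-planning experiments.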

Limitations & Future Work

  • Complex joint types: The current taxonomy covers revolute and prismatic joints; more exotic constraints (e.g., spherical or custom cams) are not yet supported.
  • Sparse topology errors: In objects with many tiny parts (e.g., a mechanical watch), the network sometimes merges adjacent components, leading to under‑segmented rigs.
  • Dependence on training distribution: Performance drops for highly stylised or non‑manifold meshes that are far from the public datasets used for training.
  • Future directions suggested by the authors include extending the joint vocabulary, incorporating physics‑based priors to refine motion limits, and exploring self‑supervised training on large unlabelled 3D repositories to improve generalisation to novel styles.

Authors

  • Ruining Li
  • Yuxin Yao
  • Chuanxia Zheng
  • Christian Rupprecht
  • Joan Lasenby
  • Shangzhe Wu
  • Andrea Vedaldi

Paper Information

  • arXiv ID: 2512.11798v1
  • Categories: cs.CV, cs.AI, cs.GR
  • Published: December 12, 2025