[Paper] Particulate: Feed-Forward 3D Object Articulation

Published: December 12, 2025 at 01:59 PM EST
4 min read
Source: arXiv - 2512.11798v1

Overview

Particulate is a new feed‑forward system that can take a single static 3D mesh—think of a CAD model or a scan of a chair—and instantly recover its hidden articulation: the separate moving parts, how they’re connected, and the motion limits of each joint. By replacing costly per‑object optimization with a single forward pass of a transformer‑based network, the method makes it practical to turn any static 3D asset into a fully rigged, animatable model in seconds.

Key Contributions

  • End‑to‑end transformer architecture (Part Articulation Transformer) that ingests a point cloud of a mesh and predicts parts, kinematic hierarchy, and joint constraints in one shot.
  • Native multi‑joint support, allowing objects with arbitrary numbers of moving links (e.g., a folding table with several hinges).
  • Large‑scale training on a curated collection of articulated assets from public datasets, plus a newly released benchmark for articulation estimation.
  • Real‑time inference: the whole pipeline runs in a few seconds on a single GPU, dramatically faster than prior optimization‑based methods.
  • Generalisation to AI‑generated 3D content, enabling a pipeline that goes from a single image → 3D mesh → articulated model using off‑the‑shelf image‑to‑3D generators.

Methodology

  1. Input preprocessing – The static mesh is uniformly sampled into a point cloud (≈10k points); a minimal sampling sketch appears after this list.
  2. Part Articulation Transformer – A hierarchical transformer processes the point cloud.
    • Local feature extraction via self‑attention on small neighborhoods captures fine geometry (e.g., a door edge).
    • Global reasoning aggregates these features to infer the overall kinematic graph (which part moves relative to which).
  3. Joint prediction heads – Separate MLP heads (see the head sketch after this list) output:
    • Part segmentation (per‑point label).
    • Parent‑child relationships (directed edges of the articulation tree).
    • Joint type & limits (revolute, prismatic, range of motion).
  4. Lifting to mesh – The predicted attributes are transferred back onto the original mesh, producing a rigged model ready for animation.
  5. Training – Supervised learning with cross‑entropy for segmentation, binary cross‑entropy for adjacency, and regression losses for joint parameters; a sketch of the combined objective appears below. The loss is weighted to balance part‑count variability across objects.
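
To make step 1 concrete, here is a minimal sketch of the sampling stage; the choice of trimesh, the normalization, and the exact point budget are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch of step 1: uniformly sample a static mesh into a point cloud.
# The use of trimesh and the normalization scheme are assumptions, not the paper's code.
import numpy as np
import trimesh

def mesh_to_point_cloud(mesh_path: str, num_points: int = 10_000) -> np.ndarray:
    mesh = trimesh.load(mesh_path, force="mesh")
    # Area-weighted uniform sampling over the surface.
    points, _face_ids = trimesh.sample.sample_surface(mesh, num_points)
    # Center and scale to a unit box so the network sees a consistent coordinate range.
    points = points - points.mean(axis=0)
    points = points / (np.abs(points).max() + 1e-8)
    return points.astype(np.float32)
```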
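
Step 3 can be pictured as a handful of lightweight heads on top of shared transformer features. The PyTorch layout below is an illustrative sketch under assumed feature dimensions, a fixed maximum part count, and a two-entry joint-type vocabulary; it is not the paper's architecture.

```python
# Illustrative sketch of step 3: separate heads over shared transformer features.
# Dimensions and the revolute/prismatic joint vocabulary are assumptions.
import torch
import torch.nn as nn

class ArticulationHeads(nn.Module):
    def __init__(self, feat_dim: int = 256, max_parts: int = 16):
        super().__init__()
        self.seg_head = nn.Linear(feat_dim, max_parts)         # per-point part logits
        self.joint_type_head = nn.Linear(feat_dim, 2)          # revolute vs. prismatic
        self.joint_param_head = nn.Linear(feat_dim, 8)         # axis (3) + origin (3) + limits (2)
        self.parent_head = nn.Bilinear(feat_dim, feat_dim, 1)  # scores directed part-part edges

    def forward(self, point_feats: torch.Tensor, part_feats: torch.Tensor):
        # point_feats: (N_points, feat_dim); part_feats: (N_parts, feat_dim)
        seg_logits = self.seg_head(point_feats)
        joint_type_logits = self.joint_type_head(part_feats)
        joint_params = self.joint_param_head(part_feats)
        # Pairwise parent-child scores: entry (i, j) scores "part i is the parent of part j".
        n = part_feats.shape[0]
        parent_scores = self.parent_head(
            part_feats.unsqueeze(1).expand(n, n, -1).reshape(n * n, -1),
            part_feats.unsqueeze(0).expand(n, n, -1).reshape(n * n, -1),
        ).reshape(n, n)
        return seg_logits, joint_type_logits, joint_params, parent_scores
```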

The whole network is differentiable and trained end‑to‑end, so it learns to jointly optimise part detection and kinematic inference rather than treating them as separate steps.
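
As a rough picture of how the supervised terms from step 5 could be combined into one end‑to‑end objective, here is a hedged sketch; the loss weights, tensor shapes, and the smooth‑L1 choice for the regression term are assumptions rather than values reported by the authors.

```python
# Sketch of the combined training objective described in step 5.
# Loss weights and shapes are illustrative assumptions.
import torch
import torch.nn.functional as F

def articulation_loss(seg_logits, seg_labels,        # (N_points, P), (N_points,) long
                      parent_scores, parent_labels,  # (P, P) logits, (P, P) float 0/1
                      joint_params, joint_targets,   # (P, 8), (P, 8)
                      w_seg=1.0, w_adj=1.0, w_joint=0.5):
    loss_seg = F.cross_entropy(seg_logits, seg_labels)                            # part segmentation
    loss_adj = F.binary_cross_entropy_with_logits(parent_scores, parent_labels)   # kinematic edges
    loss_joint = F.smooth_l1_loss(joint_params, joint_targets)                    # axes, origins, limits
    return w_seg * loss_seg + w_adj * loss_adj + w_joint * loss_joint
```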

Results & Findings

  • Quantitative boost: On the new benchmark (≈2k diverse objects), Particulate achieves a mean Intersection‑over‑Union (mIoU) of 84% for part segmentation and 71% accuracy in predicting the full kinematic tree, up to 20% higher than the previous state‑of‑the‑art method (ArticulationNet).
  • Speed: Average inference time is 1.8 s per object on an RTX 3090, compared to 30 s–5 min for optimization‑based baselines.
  • Robustness to noise: The model maintains >75% kinematic accuracy even when the input point cloud is perturbed with 5% Gaussian noise, showing resilience to imperfect scans.
  • AI‑generated assets: When paired with a diffusion‑based image‑to‑3D generator, Particulate correctly extracts articulation for 68% of objects created from a single RGB image, opening a path to “single‑image rigging.”

Qualitative examples (e.g., a kitchen cabinet, a folding chair, a robotic arm) demonstrate clean part separation and plausible joint limits that match human expectations.

Practical Implications

  • Game & AR/VR pipelines: Artists can import static meshes from libraries or procedural generators and instantly obtain rigged assets, cutting down manual rigging time dramatically.
  • Robotics simulation: Engineers can feed CAD models of household items into simulators (e.g., Isaac Gym, PyBullet) and obtain accurate joint models for manipulation planning without hand‑crafting URDF files (a minimal URDF‑export sketch follows this list).
  • 3D content creation for e‑commerce: Retailers can upload a single product scan and automatically generate interactive 3D demos where customers can open drawers or rotate hinges.
  • Data augmentation for learning: Synthetic articulated models can be generated on‑the‑fly for training downstream tasks such as pose estimation or grasp synthesis.
  • Integration with image‑to‑3D tools: By chaining a diffusion‑based generator with Particulate, developers can build end‑to‑end “single‑photo to animated 3D” services, useful for rapid prototyping or virtual try‑ons.
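
To illustrate the robotics use case, the sketch below turns a list of predicted parts and joints into a minimal URDF that simulators such as PyBullet can load; the `Joint` structure and its field names are hypothetical placeholders for Particulate's output, not an official exporter.

```python
# Hedged sketch: convert predicted parts/joints into a minimal URDF for simulators.
# The Joint dataclass and mesh-naming convention are assumptions.
from dataclasses import dataclass

@dataclass
class Joint:
    parent: str         # name of the part this joint is attached to
    child: str          # name of the moving part
    joint_type: str     # "revolute" or "prismatic"
    axis: tuple         # (x, y, z) unit motion axis
    origin: tuple       # (x, y, z) joint position
    limits: tuple       # (lower, upper) in radians or meters

def write_urdf(robot_name: str, base_link: str, joints: list[Joint], path: str) -> None:
    links = {base_link} | {j.parent for j in joints} | {j.child for j in joints}
    lines = [f'<robot name="{robot_name}">']
    for link in sorted(links):
        # Each part is exported as a link referencing its own mesh file.
        lines.append(f'  <link name="{link}">')
        lines.append(f'    <visual><geometry><mesh filename="{link}.obj"/></geometry></visual>')
        lines.append('  </link>')
    for i, j in enumerate(joints):
        lines.append(f'  <joint name="joint_{i}" type="{j.joint_type}">')
        lines.append(f'    <parent link="{j.parent}"/>')
        lines.append(f'    <child link="{j.child}"/>')
        lines.append(f'    <origin xyz="{j.origin[0]} {j.origin[1]} {j.origin[2]}"/>')
        lines.append(f'    <axis xyz="{j.axis[0]} {j.axis[1]} {j.axis[2]}"/>')
        lines.append(f'    <limit lower="{j.limits[0]}" upper="{j.limits[1]}" effort="10" velocity="1"/>')
        lines.append('  </joint>')
    lines.append('</robot>')
    with open(path, "w") as f:
        f.write("\n".join(lines))
```

A file written this way can then be loaded with, for example, pybullet.loadURDF for manipulation-planning experiments.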

Limitations & Future Work

  • Complex joint types: The current taxonomy covers revolute and prismatic joints; more exotic constraints (e.g., spherical or custom cams) are not yet supported.
  • Sparse topology errors: In objects with many tiny parts (e.g., a mechanical watch), the network sometimes merges adjacent components, leading to under‑segmented rigs.
  • Dependence on training distribution: Performance drops for highly stylised or non‑manifold meshes that are far from the public datasets used for training.
  • Future directions suggested by the authors include extending the joint vocabulary, incorporating physics‑based priors to refine motion limits, and exploring self‑supervised training on large unlabelled 3D repositories to improve generalisation to novel styles.

Authors

  • Ruining Li
  • Yuxin Yao
  • Chuanxia Zheng
  • Christian Rupprecht
  • Joan Lasenby
  • Shangzhe Wu
  • Andrea Vedaldi

Paper Information

  • arXiv ID: 2512.11798v1
  • Categories: cs.CV, cs.AI, cs.GR
  • Published: December 12, 2025