[Paper] E-M3RF: An Equivariant Multimodal 3D Re-assembly Framework

Published: November 26, 2025 at 09:12 AM EST
4 min read
Source: arXiv - 2511.21422v1

Overview

The paper presents E‑M3RF, a deep‑learning framework that can automatically re‑assemble fractured 3D objects from raw point‑cloud scans. By jointly exploiting geometry and surface color, and by enforcing rotation‑equivariance, the system predicts the SE(3) transformations that bring each fragment back into its correct pose—something that purely geometric methods struggle with, especially on ambiguous or symmetric pieces.

Key Contributions

  • Multimodal fragment encoding – combines rotation‑consistent geometric features (via an equivariant encoder) with color‑aware embeddings (via a transformer) to capture both shape and appearance.
  • SE(3) flow‑matching re‑assembly – predicts a full 6‑DoF rigid transformation for each fragment in a single forward pass, avoiding iterative optimization.
  • Physical plausibility – the model is trained to respect non‑overlap constraints, reducing physically impossible assemblies.
  • Extensive evaluation – benchmarks on four datasets (two synthetic, two cultural‑heritage collections) show consistent gains over state‑of‑the‑art baselines.
  • Open‑source implementation – code and pretrained weights are released, facilitating reproducibility and downstream adoption.

Methodology

  1. Input preprocessing – each fragment is represented as a colored point cloud (XYZ + RGB).
  2. Geometric branch – a rotation‑equivariant neural network (e.g., an SE(3)‑Transformer or equivariant CNN) extracts features that transform predictably with the fragment’s orientation, so the model can reason about shape consistently regardless of how the piece is rotated.
  3. Color branch – a standard transformer processes the RGB values attached to each point, learning contextual color patterns that help disambiguate symmetric geometry (e.g., a red stripe on one side).
  4. Fusion – the two feature streams are concatenated and passed through a lightweight MLP to obtain a multimodal fragment descriptor.
  5. SE(3) flow prediction – a set‑to‑set matching module predicts a dense flow field that aligns each fragment’s points to a canonical assembly space. The flow is then converted into a rigid transformation (rotation + translation) per fragment; a sketch of this conversion follows the list.
  6. Losses – the training objective combines (i) Chamfer Distance between assembled and ground‑truth point clouds, (ii) rotation/translation regression losses, and (iii) a penalty for overlapping fragments, encouraging physically valid assemblies (see the loss sketch below).
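
The paper’s flow‑matching formulation is not reproduced in this summary, but the flow‑to‑rigid‑transform conversion in step 5 is typically done with the Kabsch / orthogonal‑Procrustes solution. Below is a minimal PyTorch sketch under that assumption; the function name, tensor shapes, and single‑fragment treatment are illustrative, not the authors’ API.

```python
import torch

def rigid_from_flow(points: torch.Tensor, flow: torch.Tensor):
    """Fit the rigid SE(3) transform (R, t) that best explains a predicted
    per-point flow, via the Kabsch / orthogonal-Procrustes solution.

    points: (N, 3) fragment points in their scanned pose.
    flow:   (N, 3) predicted displacements toward the canonical assembly space.
    """
    targets = points + flow                          # predicted canonical positions
    src = points - points.mean(0, keepdim=True)      # center both point sets
    tgt = targets - targets.mean(0, keepdim=True)
    H = src.T @ tgt                                  # 3x3 cross-covariance matrix
    U, _, Vt = torch.linalg.svd(H)
    d = torch.sign(torch.linalg.det(Vt.T @ U.T))     # guard against reflections
    D = torch.diag(torch.stack([torch.ones_like(d), torch.ones_like(d), d]))
    R = Vt.T @ D @ U.T                               # optimal rotation, det(R) = +1
    t = targets.mean(0) - R @ points.mean(0)         # optimal translation
    return R, t

# Usage: assembled = points @ R.T + t places the fragment in the shared frame.
```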

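The exact loss implementations and weights are not given in this summary; the following sketch shows how the three terms in step 6 could be combined. The helper names, the soft non‑overlap radius, and the weights `w_cd`, `w_pose`, `w_ov` are assumptions for illustration.

```python
import torch

def chamfer(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Symmetric Chamfer Distance between point sets a: (N, 3) and b: (M, 3)."""
    d = torch.cdist(a, b)                                # (N, M) pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

def overlap_penalty(frags, radius=0.01):
    """Soft non-overlap term: penalize points of different assembled fragments
    that come closer than `radius` (an assumed threshold)."""
    loss = frags[0].new_zeros(())
    for i in range(len(frags)):
        for j in range(i + 1, len(frags)):
            d = torch.cdist(frags[i], frags[j])
            loss = loss + torch.relu(radius - d).mean()  # nonzero only inside radius
    return loss

def total_loss(assembled, gt, frags, R_pred, R_gt, t_pred, t_gt,
               w_cd=1.0, w_pose=1.0, w_ov=0.1):          # weights are assumptions
    """Combine (i) Chamfer, (ii) rotation/translation regression, (iii) overlap."""
    pose = torch.linalg.matrix_norm(R_pred - R_gt).mean() + \
           (t_pred - t_gt).norm(dim=-1).mean()
    return w_cd * chamfer(assembled, gt) + w_pose * pose + w_ov * overlap_penalty(frags)
```
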
Results & Findings

All values are relative improvements over the best baseline (lower raw error is better, so larger reductions are better):

| Dataset | Rotation error ↓ | Translation error ↓ | Chamfer Distance ↓ |
| --- | --- | --- | --- |
| RePAIR (real heritage) | 23.1 % | 13.2 % | 18.4 % |
| Fantastic Breaks (synthetic) | 19 % | 12 % | 15 % |
| Breaking Bad (synthetic) | 21 % | 11 % | 14 % |
| Presious (real heritage) | 20 % | 10 % | 13 % |
  • Adding color consistently reduced errors on symmetric or heavily eroded fragments where geometry alone was ambiguous.
  • The equivariant encoder prevented the model from “forgetting” orientation, leading to smoother convergence and better generalisation across unseen rotations.
  • Overlap penalties cut physically impossible intersections by ~30 % compared with prior methods.

Practical Implications

  • Cultural heritage restoration – conservators can quickly generate plausible reconstructions of shattered artifacts from inexpensive 3D scans, accelerating documentation and preservation workflows.
  • Robotics & manufacturing – assembly robots can infer correct part poses from partial, noisy sensor data without hand‑crafted fitting pipelines, useful for bin‑picking or on‑site repairs.
  • AR/VR content creation – fragmented 3D assets (e.g., scanned ruins, broken props) can be auto‑repaired before being imported into virtual environments, saving artists hours of manual retopology.
  • Quality control – manufacturers can detect mis‑aligned or missing components in assembled products by comparing the predicted SE(3) layout against design specifications.

Because the model runs in a single forward pass (≈0.1 s per fragment on a modern GPU) and does not require iterative ICP, it fits well into real‑time pipelines.

Limitations & Future Work

  • Dependence on color quality – heavily weathered or monochrome surfaces still challenge the color branch; the authors suggest integrating texture or material descriptors.
  • Scalability to very large assemblies – the current set‑to‑set matching scales quadratically with fragment count; future work could explore hierarchical grouping or sparse attention.
  • Physical simulation – while overlap penalties help, the framework does not enforce full contact mechanics; coupling with a physics engine could yield even more realistic assemblies.
  • Generalisation to non‑rigid parts – the method assumes rigid fragments; extending to deformable objects (e.g., broken pottery that can be glued) is an open research direction.

Authors

  • Adeela Islam
  • Stefano Fiorini
  • Manuel Lecha
  • Theodore Tsesmelis
  • Stuart James
  • Pietro Morerio
  • Alessio Del Bue

Paper Information

  • arXiv ID: 2511.21422v1
  • Categories: cs.CV
  • Published: November 26, 2025