[Paper] Neu-PiG: Neural Preconditioned Grids for Fast Dynamic Surface Reconstruction on Long Sequences
Source: arXiv - 2602.22212v1
Overview
Neu‑PiG introduces a neural preconditioned grid that can reconstruct temporally consistent 3‑D surfaces from long, unstructured point‑cloud sequences in a matter of seconds. By combining a multi‑resolution latent grid with a lightweight MLP, the method sidesteps costly correspondence searches and category‑specific training while still delivering drift‑free, high‑fidelity results.
Key Contributions
- Preconditioned latent‑grid encoding that stores spatial features on a reference surface using both position and normal direction.
- Multi‑resolution representation that captures coarse‑to‑fine deformations across the entire sequence in a single latent space.
- Sobolev‑based gradient preconditioning that stabilizes training, eliminates drift, and removes the need for explicit correspondences.
- Fast, training‑free pipeline: reconstruction of long sequences (thousands of frames) runs > 60× faster than prior optimization‑only methods.
- Inference speed comparable to heavyweight pretrained models, while remaining category‑agnostic (no per‑class training required).
Methodology
- Keyframe selection – A single frame is chosen as a geometric reference; its surface provides a coordinate system of positions and normals.
- Latent grid construction – A 3‑D grid is laid over the reference surface. Each cell stores a feature vector that is preconditioned using Sobolev norms, encouraging smooth spatial variation.
- Time modulation – A low‑dimensional time code is concatenated to the spatial features, allowing the same grid to represent deformations at any frame.
- Decoding – A tiny MLP (≈2–3 hidden layers) maps the combined spatial‑temporal feature to a 6‑DoF rigid transform for each point, effectively “warping” the reference surface into the target frame.
- Optimization – Gradient descent minimizes a point‑to‑point distance loss between the warped reference and the raw input point clouds. Sobolev preconditioning reshapes the loss landscape, making convergence fast and stable without any correspondence priors.
The whole pipeline is end‑to‑end differentiable and runs on a single GPU.
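The forward pass described above (grid lookup → time modulation → tiny MLP → rigid warp) can be sketched as follows. All sizes (grid resolution, feature and time-code dimensions, MLP width) and the randomly initialized weights are placeholder assumptions for illustration, not the paper's hyper-parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: grid resolution, feature dim, time-code dim, hidden width.
RES, FEAT, TDIM, HID = 8, 16, 4, 32

grid = 0.01 * rng.standard_normal((RES, RES, RES, FEAT))  # latent feature grid

def trilinear(grid, p):
    """Trilinearly interpolate grid features at a point p in [0, 1]^3."""
    q = np.clip(p, 0.0, 1.0) * (RES - 1)
    i0 = np.floor(q).astype(int)
    i1 = np.minimum(i0 + 1, RES - 1)
    w = q - i0
    f = np.zeros(FEAT)
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                wx = w[0] if dx else 1 - w[0]
                wy = w[1] if dy else 1 - w[1]
                wz = w[2] if dz else 1 - w[2]
                idx = (i1[0] if dx else i0[0],
                       i1[1] if dy else i0[1],
                       i1[2] if dz else i0[2])
                f += wx * wy * wz * grid[idx]
    return f

# Tiny two-layer MLP mapping [feature | time code] -> 6-DoF (axis-angle + translation).
W1 = 0.1 * rng.standard_normal((HID, FEAT + TDIM)); b1 = np.zeros(HID)
W2 = 0.1 * rng.standard_normal((6, HID));           b2 = np.zeros(6)

def decode(feat, tcode):
    h = np.tanh(W1 @ np.concatenate([feat, tcode]) + b1)
    return W2 @ h + b2  # (omega, t): axis-angle rotation plus translation

def warp(p, dof):
    """Apply the decoded 6-DoF rigid transform to a point (Rodrigues' formula)."""
    omega, t = dof[:3], dof[3:]
    theta = np.linalg.norm(omega)
    if theta < 1e-12:
        return p + t
    k = omega / theta
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    R = np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)
    return R @ p + t

p_ref = np.array([0.5, 0.5, 0.5])           # a point on the reference surface
tcode = 0.01 * rng.standard_normal(TDIM)    # low-dimensional code for the target frame
p_warped = warp(p_ref, decode(trilinear(grid, p_ref), tcode))
```

In the actual method, every step here is differentiable, so the loss gradient flows back through the warp and MLP into the grid features, where the Sobolev preconditioner is applied.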
Results & Findings
- Accuracy: Neu‑PiG achieves lower Chamfer and Hausdorff distances than the current state‑of‑the‑art (e.g., Neural Dynamic Surfaces, DynamicFusion) on human motion capture and animal locomotion datasets.
- Speed: Reconstruction of a 2 000‑frame sequence finishes in ≈30 s, a > 60× speed‑up over the best training‑free baseline (which needs > 30 min).
- Scalability: The multi‑resolution grid handles sequences up to 10 000 frames without memory blow‑up, thanks to the hierarchical feature layout.
- Robustness: No drift is observed over long runs, even when the input clouds are noisy or partially occluded, because the Sobolev preconditioner enforces smoothness across time.
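For reference, the two accuracy metrics reported above can be computed between point clouds as follows. This is a brute-force O(N·M) sketch on synthetic stand-in data (real evaluation pipelines typically use KD-trees for the nearest-neighbour queries):

```python
import numpy as np

def chamfer_distance(A, B):
    """Symmetric Chamfer distance: mean nearest-neighbour distance, both directions."""
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)  # pairwise distances
    return D.min(axis=1).mean() + D.min(axis=0).mean()

def hausdorff_distance(A, B):
    """Symmetric Hausdorff distance: worst-case nearest-neighbour distance."""
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return max(D.min(axis=1).max(), D.min(axis=0).max())

rng = np.random.default_rng(0)
gt = rng.random((200, 3))                          # stand-in "ground-truth" cloud
pred = gt + 0.01 * rng.standard_normal(gt.shape)   # slightly perturbed reconstruction
cd = chamfer_distance(gt, pred)
hd = hausdorff_distance(gt, pred)
```

Chamfer averages errors and so rewards overall fit, while Hausdorff reports the single worst point, making it sensitive to outliers and missed regions.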
Practical Implications
- Real‑time capture pipelines – Developers can integrate Neu‑PiG into live scanning setups (e.g., AR/VR body tracking) where fast, drift‑free surface updates are critical.
- Animation & VFX – Artists can generate high‑quality dynamic meshes from raw sensor data without hand‑crafting correspondences or training per‑character models.
- Robotics & Motion Analysis – Fast reconstruction enables on‑the‑fly shape understanding for manipulation or gait analysis, reducing latency in feedback loops.
- Cloud‑based services – The lightweight MLP and grid representation are cheap to serialize and transmit, making it feasible to run the optimization server‑side and stream results to thin clients.
Limitations & Future Work
- Reference dependence – The quality of the reconstruction hinges on the chosen keyframe; a poorly captured reference can limit deformation expressiveness.
- Memory scaling of the grid – Although hierarchical, extremely high‑resolution grids may still strain GPU memory for ultra‑dense point clouds.
- Non‑rigid local deformations – The current decoder outputs only global 6‑DoF transforms per point; modeling fine‑grained non‑rigid deformations (e.g., skin wrinkles) would require an extended decoder.
- Future directions – The authors suggest adaptive keyframe selection, learned hierarchical decoders for local elasticity, and integration with differentiable rendering pipelines to close the loop with visual feedback.
Authors
- Julian Kaltheuner
- Hannah Dröge
- Markus Plack
- Patrick Stotko
- Reinhard Klein
Paper Information
- arXiv ID: 2602.22212v1
- Categories: cs.CV
- Published: February 25, 2026