[Paper] GeoRect4D: Geometry-Compatible Generative Rectification for Dynamic Sparse-View 3D Reconstruction

Published: April 22, 2026 at 01:12 PM EDT
5 min read
Source: arXiv - 2604.20784v1

Overview

The paper GeoRect4D tackles one of the toughest problems in computer vision: rebuilding a moving 3‑D scene from only a handful of video cameras. Traditional pipelines either collapse the geometry or produce “floating” artifacts when the views are sparse. GeoRect4D bridges the gap between deterministic 3‑D reconstruction and stochastic generative models, delivering high‑fidelity, temporally stable reconstructions even when data is limited.

Key Contributions

  • Geometry‑compatible generative rectification: A closed‑loop system that feeds a diffusion‑based image generator back into an explicit 3‑D representation without breaking spatial consistency.
  • Degradation‑aware feedback: Introduces an anchor‑based dynamic 3‑D Gaussian Splatting (3DGS) substrate that guides the diffusion model to focus on missing details while respecting the underlying geometry.
  • Structural locking & spatiotemporal coordinated attention: Novel mechanisms that lock the generated content to the current 3‑D geometry, preventing drift across frames and preserving physical plausibility.
  • Progressive optimization pipeline: Combines stochastic geometric purification (to remove floaters) with generative distillation (to inject realistic textures) in a multi‑stage refinement loop.
  • State‑of‑the‑art results: Demonstrates superior reconstruction fidelity, perceptual quality, and temporal consistency on several benchmark dynamic‑scene datasets.

Methodology

  1. Base 3‑DGS Substrate – The system starts with a lightweight, anchor‑based dynamic 3‑D Gaussian Splatting representation built from the sparse multi‑view video. This provides a coarse but geometrically sound scaffold.

  2. Single‑step Diffusion Rectifier – A pretrained diffusion model (trained on large‑scale image data) is invoked to hallucinate missing high‑frequency details. Instead of feeding it raw camera frames, the model receives degraded renderings generated from the current 3‑DGS, which act as a “prompt” that tells the generator what is already known.

  3. Degradation‑aware Feedback Loop – The diffusion output is compared against the degraded input, and the difference is used to update the 3‑DGS anchors. A structural locking module keeps any new texture or geometry aligned with the existing Gaussian scaffold, preventing the “drift” that typically occurs when a stochastic generator is applied naively.

  4. Spatiotemporal Coordinated Attention – Attention maps are computed jointly over space (the 3‑D points) and time (adjacent frames). This lets the rectifier enforce consistency across the video sequence, so a generated detail in frame t will appear in the same physical location in frame t+1.

  5. Progressive Optimization – The pipeline iterates through two phases:

    • Geometric purification: Random perturbations are injected and then filtered to eliminate floating points that have no support in the underlying geometry.
    • Generative distillation: The refined textures from the diffusion model are distilled back into the 3‑DGS representation, effectively “baking” the high‑quality appearance into the explicit model.
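To make step 4 concrete, here is a minimal toy sketch of joint space–time attention: every 3‑D point in every frame attends over all (frame, point) pairs, so a generated detail can stay tied to the same physical location across frames. All names and shapes here are our own illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def spatiotemporal_attention(feats, d_k=16, seed=0):
    """Toy joint attention over space and time.

    feats: array of shape (T, N, D) -- T frames, N 3-D points, D feature dims.
    Flattening space and time into one token axis lets each point attend to
    every other (frame, point) pair, enforcing cross-frame consistency.
    """
    T, N, D = feats.shape
    rng = np.random.default_rng(seed)
    Wq = rng.standard_normal((D, d_k)) / np.sqrt(D)  # query projection
    Wk = rng.standard_normal((D, d_k)) / np.sqrt(D)  # key projection
    Wv = rng.standard_normal((D, d_k)) / np.sqrt(D)  # value projection

    tokens = feats.reshape(T * N, D)              # flatten space-time axis
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = q @ k.T / np.sqrt(d_k)               # (T*N, T*N) joint scores
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over frames+points
    out = weights @ v
    return out.reshape(T, N, d_k), weights.reshape(T, N, T, N)
```

In a real system the projections would be learned and the attention would be masked or windowed to adjacent frames for efficiency; the point here is only the joint space–time softmax.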

The whole process runs in a closed loop until convergence, yielding a dense, temporally coherent 4‑D reconstruction.
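The closed loop described above can be sketched in a few lines. Every function body below is a deliberately simplified stand-in (the names, the convergence test, and the clipping-based "purification" are our assumptions, not the authors' implementation); only the loop structure mirrors the paper's pipeline.

```python
import numpy as np

def render_degraded(gaussians):
    # Stand-in for rendering the current 3DGS state into images.
    return gaussians.copy()

def diffusion_rectify(degraded, target):
    # Stand-in for the single-step diffusion rectifier: nudge the
    # degraded rendering toward plausible high-frequency detail.
    return degraded + 0.5 * (target - degraded)

def purify(gaussians, threshold=2.0):
    # Stand-in for stochastic geometric purification: suppress
    # outlier "floaters" with no support in the scene.
    return np.clip(gaussians, -threshold, threshold)

def georect4d_loop(gaussians, target, iters=50, lr=0.5):
    for _ in range(iters):
        degraded = render_degraded(gaussians)            # render current state
        rectified = diffusion_rectify(degraded, target)  # hallucinate detail
        residual = rectified - degraded                  # degradation-aware feedback
        gaussians = purify(gaussians + lr * residual)    # distill + purify
        if np.abs(residual).max() < 1e-6:                # run until convergence
            break
    return gaussians
```

The key design point this captures is that the generator never overwrites the representation directly: its output enters only as a residual against the degraded rendering, which is what keeps the update geometry-compatible.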

Results & Findings

  • Quantitative gains: GeoRect4D improves PSNR/SSIM by 15‑20 % over the previous best sparse‑view dynamic reconstruction methods on the DynamicScenes and NeRF‑Dynamic benchmarks.
  • Perceptual quality: LPIPS scores drop dramatically, indicating that the generated textures look far more realistic to human observers.
  • Temporal stability: Measured drift (average vertex displacement across consecutive frames) is reduced by more than 50 % compared to baseline diffusion‑augmented pipelines.
  • Artifact removal: The stochastic purification step eliminates floating specks that plagued earlier approaches, leading to cleaner silhouettes and smoother motion.
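The drift metric mentioned above (average vertex displacement across consecutive frames) could be computed as follows; this is our minimal reading of the metric, not the paper's evaluation code.

```python
import numpy as np

def temporal_drift(tracks):
    """Average per-vertex displacement between consecutive frames.

    tracks: array of shape (T, V, 3) -- positions of V tracked vertices
    over T frames. Smaller values indicate a more temporally stable
    reconstruction.
    """
    deltas = np.linalg.norm(np.diff(tracks, axis=0), axis=-1)  # (T-1, V)
    return float(deltas.mean())
```

A perfectly static scene scores 0, and a scene whose vertices jump 1 unit per frame scores 1.0, so the "50 % reduction" claim corresponds to halving this average.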

Qualitative visualizations show crisp facial details, realistic hair strands, and consistent lighting across time, even when only 3‑4 camera views are available.

Practical Implications

  • AR/VR content creation: Developers can now generate high‑quality dynamic avatars or environments from a few handheld recordings, cutting down on capture hardware and post‑processing time.
  • Film & game VFX: Artists can use GeoRect4D to reconstruct stunt or motion‑capture scenes where camera coverage is limited, automatically filling in occluded geometry with plausible details.
  • Robotics & autonomous systems: Sparse multi‑camera rigs on drones or mobile robots can build reliable 4‑D maps of moving obstacles, improving navigation in dynamic environments.
  • Telepresence: Real‑time streaming of a person’s 3‑D presence becomes feasible with fewer cameras, as the generative rectifier can hallucinate missing view‑angles on‑the‑fly while keeping the motion stable.

Because the framework works as a plug‑in on top of existing 3‑DGS pipelines, integration into current production tools (e.g., Unity, Unreal, Blender) should be straightforward.

Limitations & Future Work

  • Computation cost: The diffusion rectifier and iterative purification steps add noticeable runtime overhead, making real‑time deployment still challenging.
  • Dependency on pre‑trained diffusion models: Quality hinges on the diversity of the image dataset used to train the generator; domain‑specific scenes (e.g., medical imaging) may require fine‑tuning.
  • Sparse‑view threshold: While the method tolerates very few cameras, performance degrades sharply when the input views drop below three or when motion is extremely fast.
  • Future directions: The authors suggest exploring lightweight diffusion alternatives, adaptive view‑selection strategies, and tighter integration with neural radiance fields to further boost speed and handle extreme motion.

Authors

  • Zhenlong Wu
  • Zihan Zheng
  • Xuanxuan Wang
  • Qianhe Wang
  • Hua Yang
  • Xiaoyun Zhang
  • Qiang Hu
  • Wenjun Zhang

Paper Information

  • arXiv ID: 2604.20784v1
  • Categories: cs.CV
  • Published: April 22, 2026