[Paper] Splannequin: Freezing Monocular Mannequin-Challenge Footage with Dual-Detection Splatting

Published: December 4, 2025 at 01:59 PM EST
4 min read
Source: arXiv - 2512.05113v1

Overview

The paper introduces Splannequin, a technique for turning everyday “Mannequin-Challenge” videos (single-camera clips in which people hold perfectly still while the camera moves) into high-quality frozen 3D scenes that can be explored from any angle. By combining dynamic Gaussian splatting with a state-aware regularization strategy, the authors achieve photorealistic, artifact-free reconstructions that let developers embed interactive, instantly selectable “freeze-frames” into AR/VR experiences.

Key Contributions

  • Dynamic‑to‑Static Gaussian Splatting: Re‑uses a dynamic scene model but freezes it at a chosen timestamp, preserving subtle background motion while keeping the foreground static.
  • State‑aware Regularization (Hidden & Defective Gaussians): Detects poorly observed or occluded Gaussian primitives and anchors them temporally to more reliable observations, eliminating ghosting and blur.
  • Architecture-agnostic Plug-in: Implements the regularization as a few loss terms that can be dropped into any existing dynamic Gaussian splatting pipeline without architectural changes or extra inference cost.
  • User-controlled Frozen-time Rendering: Enables instant selection of any frame as a static viewpoint, opening up interactive storytelling and content-creation workflows (a minimal sketch follows this list).
  • Extensive Human Evaluation: 96 % of participants preferred the Splannequin output over baseline methods, confirming perceptual quality gains.
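
As an illustration of the user-controlled frozen-time rendering contribution, here is a minimal sketch in Python/PyTorch. The names `canonical`, `deform`, and `rasterize` are placeholders for whatever canonical Gaussian store, deformation network, and splatting renderer a given dynamic pipeline uses; they are assumptions for this sketch, not the authors' actual API.

```python
import torch

def render_frozen_view(canonical, deform, rasterize, camera, t_star):
    """Render a static novel view by pinning the model's time input to t_star.

    canonical : dict of per-Gaussian parameters, e.g. canonical["xyz"] is (N, 3).
    deform    : callable mapping (canonical, t) -> Gaussians at time t.
    rasterize : callable that splats Gaussians for a given camera.
    """
    n = canonical["xyz"].shape[0]
    t = torch.full((n, 1), float(t_star))   # the same frozen timestamp for every Gaussian
    frozen = deform(canonical, t)           # Gaussians as they were at the chosen instant
    return rasterize(frozen, camera)        # the camera can move freely; time stays fixed
```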

Methodology

  1. Dynamic Gaussian Representation – The scene is modeled as a cloud of 3D Gaussian primitives whose positions, colors, and opacities evolve over time (the standard approach for dynamic splatting).
  2. Temporal Anchoring
    • Hidden Gaussians: primitives that become invisible due to occlusion or view‑angle changes are “anchored” to their most recent well‑observed state, preventing them from drifting into ghostly artifacts.
    • Defective Gaussians: primitives that receive weak supervision (e.g., only a few frames) are anchored forward to future frames where they are better observed.
  3. Loss Formulation – Two additional regularization terms are added to the training objective: one penalizing deviation from the anchored past state for hidden Gaussians, and another encouraging alignment with the future state for defective Gaussians (see the sketch after this list).
  4. Freezing the Model – At inference time the time parameter is fixed to a user‑chosen timestamp, rendering a static scene while still benefiting from the temporally smoothed Gaussian parameters learned during training.
  5. Integration – The method plugs into existing dynamic Gaussian splatting pipelines without altering the network architecture or the rendering pipeline.
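
The anchoring described in steps 2 and 3 can be pictured with a short, hedged sketch. The tensor names (`params_t`, `anchor_past`, `anchor_future`, `hidden_mask`, `defective_mask`) and the plain L2 penalties are illustrative assumptions; the paper's exact detection criteria and loss weights may differ.

```python
import torch

def anchoring_losses(params_t, anchor_past, anchor_future,
                     hidden_mask, defective_mask,
                     w_hidden=1.0, w_defective=1.0):
    """Two plug-in regularizers over per-Gaussian parameters at training time t.

    params_t       : (N, D) current attributes (e.g., positions, opacities).
    anchor_past    : (N, D) snapshot at each Gaussian's last well-observed timestamp.
    anchor_future  : (N, D) snapshot at a future, better-observed timestamp.
    hidden_mask    : (N,) bool, Gaussians currently occluded or out of view.
    defective_mask : (N,) bool, Gaussians that have received weak supervision.
    """
    # Hidden Gaussians: penalize drift away from the most recent reliable state.
    loss_hidden = ((params_t - anchor_past.detach())[hidden_mask] ** 2).mean()
    # Defective Gaussians: pull toward the future state where supervision is stronger.
    loss_defective = ((params_t - anchor_future.detach())[defective_mask] ** 2).mean()
    return w_hidden * loss_hidden + w_defective * loss_defective
```

Because these terms only shape the training objective, the frozen-time rendering path at inference is untouched.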

Results & Findings

  • Visual Quality: Compared to baseline dynamic splatting without anchoring, Splannequin eliminates ghosting, reduces blur, and restores fine texture details in frozen renders.
  • Quantitative Metrics: PSNR/SSIM improvements of ~1.2 dB / 0.03 on standard Mannequin-Challenge video benchmarks.
  • Human Preference: In a blind user study, 96 % of participants rated Splannequin outputs as more realistic and visually pleasing than the next‑best method.
  • Zero Runtime Overhead: Because the regularization only affects training, inference speed matches that of the underlying dynamic splatting model.

Practical Implications

  • AR/VR Content Creation: Developers can turn a single handheld video into a fully navigable 3D environment with a “pause‑and‑look‑around” mode, ideal for virtual tours, gaming cut‑scenes, or immersive storytelling.
  • Live Broadcast Enhancements: Sports or event producers could capture a single camera sweep and instantly generate frozen‑time replays that viewers can explore from any angle.
  • Rapid Prototyping: No need for multi‑camera rigs or depth sensors; a smartphone video suffices, dramatically lowering the barrier for small studios and indie creators.
  • Integration Path: Existing pipelines that already use dynamic Gaussian splatting (e.g., for neural avatars) can adopt Splannequin with a few lines of loss-definition code, reaping quality gains without extra hardware (one hypothetical wiring is sketched below).
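
To make the “few lines of loss-definition code” concrete, one hypothetical wiring into an existing training step is sketched below. Every symbol here (`photometric_loss`, `anchoring_losses`, the `model` accessors, and the `lambda_anchor` weight) is a stand-in for whatever the host pipeline already defines, not the paper's released interface.

```python
def training_step(model, batch, optimizer, photometric_loss,
                  anchoring_losses, lambda_anchor=0.1):
    """One optimization step of an existing dynamic splatting pipeline,
    with the anchoring regularizer added as an extra loss term."""
    pred = model.render(batch["camera"], batch["time"])       # unchanged renderer
    loss = photometric_loss(pred, batch["image"])             # unchanged base objective
    loss = loss + lambda_anchor * anchoring_losses(           # the only new lines
        model.current_params(), model.anchor_past(), model.anchor_future(),
        model.hidden_mask(), model.defective_mask())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```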

Limitations & Future Work

  • Forward‑motion Assumption: The anchoring strategy relies on predominantly forward camera motion; rapid backward or highly erratic trajectories may still produce artifacts.
  • Sparse Supervision: Extremely short clips (fewer than ~30 frames) limit the ability to identify defective Gaussians reliably.
  • Generalization to Highly Dynamic Scenes: The method is tailored to “freeze‑frame” scenarios; applying it to scenes with genuine motion (e.g., dancing crowds) would require additional handling.
  • Future Directions: Extending the anchoring mechanism to bidirectional motion, incorporating learned occlusion masks, and exploring hybrid depth‑sensor fusion to further boost reconstruction fidelity.

Authors

  • Hao-Jen Chien
  • Yi-Chuan Huang
  • Chung-Ho Wu
  • Wei-Lun Chao
  • Yu-Lun Liu

Paper Information

  • arXiv ID: 2512.05113v1
  • Categories: cs.CV
  • Published: December 4, 2025