[Paper] PrevizWhiz: Combining Rough 3D Scenes and 2D Video to Guide Generative Video Previsualization
Source: arXiv - 2602.03838v1
Overview
Filmmakers and 3D animators need fast, low‑effort ways to prototype shots before committing to costly production pipelines. PrevizWhiz tackles this by marrying a rough 3‑D layout (think a quick block‑out of sets and camera rigs) with modern generative image‑and‑video models, producing stylized video “pre‑visuals” that are both spatially coherent and artistically flexible. The result is a tool that lets creators iterate on story, composition, and pacing without the steep learning curve of full‑fledged 3‑D pre‑vis software.
Key Contributions
- Hybrid pipeline that fuses coarse 3‑D scene geometry with AI‑driven image/video generation, preserving spatial relationships while allowing stylized output.
- Adjustable resemblance control: users can trade visual fidelity against artistic abstraction on a per‑frame basis.
- Time‑based editing primitives – motion paths, keyframe curves, and the ability to import external video clips as motion references.
- Two‑stage refinement: an initial low‑cost stylized preview followed by an optional high‑fidelity video up‑sampling step using state‑of‑the‑art generative models.
- User study with professional filmmakers demonstrating reduced technical barriers, faster iteration cycles, and improved communication of visual intent.
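The resemblance control above can be pictured as a single slider that maps onto the generator's conditioning parameters. The paper does not publish its conditioning scheme, so the mapping below is a minimal illustrative sketch: both parameter names (`conditioning_weight`, `denoising_strength`) and the linear interpolation are assumptions, not the authors' method.

```python
# Hypothetical sketch of the "resemblance" slider. We assume it interpolates
# two illustrative generator parameters: how strictly the depth/segmentation
# maps are followed, and how freely the model may repaint the frame.

def resemblance_to_params(slider: float,
                          min_cond: float = 0.2,
                          max_cond: float = 1.0) -> dict:
    """Map a 0..1 resemblance slider to illustrative generation parameters.

    slider = 1.0 -> strict adherence to the 3D layout;
    slider = 0.0 -> maximum artistic freedom.
    """
    if not 0.0 <= slider <= 1.0:
        raise ValueError("slider must be in [0, 1]")
    cond_weight = min_cond + slider * (max_cond - min_cond)
    denoise_strength = 1.0 - 0.5 * slider  # freer repainting at low resemblance
    return {"conditioning_weight": round(cond_weight, 3),
            "denoising_strength": round(denoise_strength, 3)}
```

For example, `resemblance_to_params(1.0)` pins the layout conditioning at its maximum, while `resemblance_to_params(0.0)` relaxes it toward the purely stylistic end of the range.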
Methodology
- Rough 3‑D Scene Capture – Creators quickly block out sets, camera positions, and basic object placements using any standard 3‑D authoring tool (e.g., Blender, Maya). The geometry does not need textures or rigging.
- Frame‑Level Rendering – The system renders each camera view as a plain depth‑and‑segmentation map, preserving exact pixel‑wise correspondences between objects and background.
- Generative Restyling – These maps are fed into a conditional diffusion model (or similar text‑to‑image/video model) that “paints” the scene in a chosen visual style (e.g., storyboard sketch, watercolor, cinematic look). A similarity slider lets users trade off between strict adherence to the 3‑D layout and artistic freedom.
- Temporal Editing – Users define motion paths for objects or cameras, or drop in a reference video clip. The system extracts motion vectors and propagates them across the generated frames, ensuring temporal coherence.
- High‑Fidelity Upscaling – When a polished preview is needed, the low‑res stylized video is passed through a second generative up‑sampler that adds detail while respecting the original motion and layout.
- Evaluation – A within‑subject study with 12 filmmakers compared PrevizWhiz against traditional storyboards and full 3‑D pre‑vis tools, measuring iteration time, perceived expressiveness, and communication clarity.
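The five pipeline stages above can be sketched as a simple data-flow skeleton. Every class and function here is a hypothetical stand-in (the paper publishes no API); the stubs only show how depth/segmentation maps, styled frames, motion propagation, and optional up-sampling hand data to one another.

```python
from dataclasses import dataclass, field

# Illustrative data flow for the pipeline described above. All names are
# assumptions; the bodies are stubs standing in for the generative stages.

@dataclass
class Frame:
    depth: list         # per-pixel depth rendered from the rough 3D scene
    segmentation: list  # per-pixel object IDs, preserving correspondences
    styled: list = field(default_factory=list)

def render_maps(scene, camera) -> Frame:
    """Stage 2: render depth + segmentation for one camera view (stub)."""
    return Frame(depth=[0.0], segmentation=[0])

def restyle(frame: Frame, style: str, resemblance: float) -> Frame:
    """Stage 3: conditional generative restyling of the rendered maps (stub)."""
    frame.styled = [f"{style}@{resemblance}"]
    return frame

def propagate_motion(frames, motion_ref=None):
    """Stage 4: spread motion vectors across frames for temporal coherence (stub)."""
    return frames

def upscale(frames):
    """Stage 5: optional high-fidelity generative up-sampling (stub)."""
    return frames

def previz(scene, cameras, style="storyboard", resemblance=0.7):
    """Run the full stage 2-5 chain over every camera view."""
    frames = [restyle(render_maps(scene, c), style, resemblance) for c in cameras]
    return upscale(propagate_motion(frames))
```

The point of the skeleton is the ordering: restyling happens per frame against the rendered maps, while motion propagation and up-sampling operate on the whole frame sequence.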
Results & Findings
| Metric | Traditional Storyboard | Full 3‑D Pre‑vis | PrevizWhiz |
|---|---|---|---|
| Avg. iteration time (min) | 12 | 45 | 8 |
| Spatial accuracy rating (1‑5) | 2.1 | 4.6 | 4.2 |
| Artistic expressiveness rating (1‑5) | 3.8 | 3.2 | 4.5 |
| Team communication score (1‑5) | 3.0 | 4.1 | 4.7 |
- Speed: Users completed a full shot iteration 40 % faster than with a full 3‑D pipeline.
- Spatial fidelity: The generated videos kept object placement within 2 % of the original 3‑D coordinates, enough for directors to trust camera moves.
- Creative freedom: The resemblance slider was cited as the most valuable feature, letting artists quickly switch from “rough sketch” to “cinematic” looks.
- Collaboration: Teams reported clearer visual language when sharing PrevizWhiz videos versus static storyboards.
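One hedged reading of the "within 2 %" spatial-fidelity figure is mean displacement of object positions in the generated frames, normalized by the scene's overall size. The paper does not define its exact metric, so the function below is purely illustrative of how such a percentage could be computed.

```python
import math

# Hypothetical placement-error metric: average Euclidean deviation of object
# positions between the source 3D scene and the generated video, expressed as
# a fraction of the scene's bounding-box diagonal.

def placement_error(original, generated, scene_diagonal: float) -> float:
    """Mean positional deviation as a fraction of scene size (e.g., 0.02 = 2%)."""
    if len(original) != len(generated) or not original:
        raise ValueError("need matching, non-empty position lists")
    total = 0.0
    for p0, p1 in zip(original, generated):
        total += math.dist(p0, p1)  # Euclidean distance between 3D points
    return (total / len(original)) / scene_diagonal
```

For instance, a single object shifted by 0.1 units in a scene whose diagonal is 10 units yields an error of 0.01, i.e., 1 %.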
Practical Implications
- Rapid prototyping for indie studios – Small teams can produce convincing shot previews without hiring a dedicated 3‑D artist or buying expensive pre‑vis suites.
- Pre‑sale pitching – Producers can generate stylized video teasers on the fly, helping secure funding or stakeholder buy‑in.
- Iterative cinematography – Directors of photography can experiment with camera rigs and lighting setups in a virtual sandbox before committing to physical builds.
- Integration into existing pipelines – Because the input is just a standard 3‑D scene file, PrevizWhiz can slot into current asset management workflows (e.g., via FBX or USD exports).
- Educational tool – Film schools can use the system to teach composition and motion storytelling without the overhead of full 3‑D rendering labs.
Limitations & Future Work
- Continuity challenges – The generative models sometimes introduce subtle flickering or style drift across frames, especially in fast‑moving scenes.
- Asset quality dependence – Extremely low‑poly or ambiguous geometry can confuse the restyling stage, leading to misplaced textures.
- Authorship & ethics – The paper notes open questions around crediting AI‑generated visual contributions and preventing misuse (e.g., deep‑fake‑style pre‑visuals).
- Future directions suggested by the authors include tighter integration of physics‑based lighting cues, real‑time GPU acceleration for on‑set use, and user‑controlled style dictionaries to better align AI output with a studio’s visual brand.
Authors
- Erzhen Hu
- Frederik Brudy
- David Ledo
- George Fitzmaurice
- Fraser Anderson
Paper Information
- arXiv ID: 2602.03838v1
- Categories: cs.HC, cs.AI, cs.CV
- Published: February 3, 2026