[Paper] SketchPlay: Intuitive Creation of Physically Realistic VR Content with Gesture-Driven Sketching
Source: arXiv - 2512.22016v1
Overview
SketchPlay introduces a VR interaction framework that lets users “draw” 3D scenes in mid‑air and instantly turn those sketches into physically realistic simulations. By pairing simple air‑drawn outlines with expressive hand gestures, the system bridges the gap between creative ideation and complex physics‑based content creation, making VR authoring accessible to developers, designers, and educators without deep modeling expertise.
Key Contributions
- Gesture‑driven sketching pipeline that maps 2‑D air sketches to 3‑D object geometry and spatial layout.
- Physical cue encoding via hand gestures (velocity, direction, force) that drive rigid‑body, elastic, and cloth dynamics automatically.
- Unified authoring interface that combines structural (shape) and dynamic (behavior) inputs in a single, intuitive VR workflow.
- Quantitative user study showing higher expressiveness and satisfaction compared with text‑based VR content creation tools.
- Open‑source prototype (Unity + Oculus Quest) released for the community to extend and integrate into existing VR pipelines.
Methodology
- Air Sketch Capture – The system tracks the user's controller trajectory to generate a polyline, which is projected onto a virtual plane and then lifted into 3‑D space using depth cues (e.g., distance from the user's head); a minimal geometric sketch of this step is shown after this list.
- Shape Inference – A lightweight neural network (trained on a synthetic dataset of sketch ↔ 3‑D primitive pairs) predicts the underlying geometry (boxes, cylinders, cloth sheets, etc.) and places the primitives according to the sketch's topology (see the second sketch below).
- Gesture Extraction – While sketching, the user performs secondary gestures (swipe, flick, squeeze). The system extracts gesture vectors (direction, speed, pressure) and maps them to physics parameters such as initial velocity, impulse magnitude, and material stiffness (a mapping sketch follows the list).
- Physics Integration – The inferred objects are instantiated in Unity's built‑in PhysX engine (or NVIDIA Flex for soft bodies). The extracted parameters are fed directly into the simulation, producing immediate, realistic motion.
- Iterative Refinement – Users can edit sketches or replay gestures, and the system updates the simulation in real time, enabling a “play‑and‑tweak” loop.
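To make the capture step concrete, the snippet below gives a minimal NumPy sketch of the geometric idea: controller samples are projected onto a sketch plane, and the planar polyline is then lifted back out along the plane normal using head distance as a depth cue. The function name, the depth heuristic, and the `depth_scale` parameter are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def project_and_lift(samples, plane_point, plane_normal, head_pos, depth_scale=1.0):
    """Project controller samples onto a sketch plane, then lift the
    resulting polyline into 3-D using head distance as a depth cue."""
    n = plane_normal / np.linalg.norm(plane_normal)
    signed_dist = (samples - plane_point) @ n           # distance of each sample from the plane
    projected = samples - np.outer(signed_dist, n)      # planar footprint of the stroke
    depth = depth_scale * np.linalg.norm(samples - head_pos, axis=1)
    lifted = projected + np.outer(depth - depth.mean(), n)  # re-introduce depth along the normal
    return projected, lifted

# A short mid-air stroke in front of the user (metres, y-up).
stroke = np.array([[0.0, 1.20, 0.50],
                   [0.1, 1.30, 0.55],
                   [0.2, 1.35, 0.60]])
flat, lifted = project_and_lift(stroke,
                                plane_point=np.array([0.0, 1.2, 0.5]),
                                plane_normal=np.array([0.0, 0.0, 1.0]),
                                head_pos=np.array([0.0, 1.6, 0.0]))
```

The shape-inference step can likewise be sketched as a small classifier over a resampled stroke. The PyTorch model below is only a stand-in for the paper's "lightweight neural network": the label set, layer sizes, and input encoding (a fixed number of resampled 3‑D points, flattened) are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical primitive classes; the paper's actual label set may differ.
PRIMITIVES = ["box", "cylinder", "sphere", "cloth_sheet"]

class SketchShapeNet(nn.Module):
    """Minimal MLP over a resampled polyline (n_points 3-D points,
    flattened) that predicts a primitive class."""
    def __init__(self, n_points=32, hidden=128, n_classes=len(PRIMITIVES)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_points * 3, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, polyline):            # polyline: (batch, n_points, 3)
        return self.net(polyline.flatten(start_dim=1))

model = SketchShapeNet()
logits = model(torch.randn(1, 32, 3))       # one dummy resampled sketch
predicted = PRIMITIVES[logits.argmax(dim=1).item()]
```

Finally, the gesture-to-physics mapping can be illustrated with a short function that turns tracked gesture samples into simulation parameters. The gain constants and the squeeze-to-stiffness mapping are illustrative assumptions; the paper does not specify these values.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class PhysicsParams:
    initial_velocity: np.ndarray   # m/s, applied to the spawned rigid body
    impulse_magnitude: float       # N*s, for flick-style hits
    stiffness: float               # 0..1, passed to the soft-body solver

def gesture_to_physics(positions, timestamps, squeeze,
                       velocity_gain=1.0, impulse_gain=0.5):
    """Map a tracked gesture (positions over time plus trigger squeeze)
    to simulation parameters (illustrative gains, not the paper's)."""
    deltas = np.diff(positions, axis=0)
    dts = np.diff(timestamps)
    velocities = deltas / dts[:, None]
    peak = velocities[np.argmax(np.linalg.norm(velocities, axis=1))]
    speed = np.linalg.norm(peak)
    direction = peak / speed if speed > 0 else np.zeros(3)
    return PhysicsParams(
        initial_velocity=velocity_gain * speed * direction,
        impulse_magnitude=impulse_gain * speed,
        stiffness=float(np.clip(squeeze, 0.0, 1.0)),
    )

params = gesture_to_physics(
    positions=np.array([[0.0, 1.2, 0.5], [0.05, 1.2, 0.6], [0.15, 1.2, 0.8]]),
    timestamps=np.array([0.00, 0.02, 0.04]),
    squeeze=0.7,
)
```

In a Unity-style integration, the returned velocity and impulse would be applied to the instantiated rigid body, and the stiffness handed to the soft-body solver, which is what allows the sketched object to start moving the moment it appears.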
Results & Findings
- Expressiveness: Participants generated 2.8× more distinct physical scenarios (e.g., bouncing balls, waving flags, collapsing structures) than with a baseline text‑command system.
- Creation Speed: Average time to produce a functional scene dropped from 4.2 min (traditional tools) to 1.1 min with SketchPlay.
- User Satisfaction: SUS (System Usability Scale) scores averaged 86/100, indicating high perceived ease of use and enjoyment.
- Physical Accuracy: Simulated dynamics matched ground‑truth physics benchmarks within 5% error for velocity and deformation, confirming that gesture‑derived parameters are realistic enough for most interactive applications.
Practical Implications
- Rapid Prototyping: Game developers can sketch level elements and instantly test physics interactions without writing code or importing assets, accelerating iteration cycles.
- Education & Training: Teachers can demonstrate physics concepts (projectile motion, elasticity) by letting students draw scenarios that immediately come to life, fostering experiential learning.
- Creative Storytelling: Artists and narrative designers can craft immersive, physics‑driven scenes on the fly, enabling dynamic storytelling where the audience’s gestures shape the environment.
- Cross‑Platform Integration: Because the pipeline relies on standard VR SDKs and open‑source inference models, it can be embedded into existing Unity or Unreal projects, extending the reach of low‑code VR authoring tools.
Limitations & Future Work
- Shape Diversity: Current inference covers a limited set of primitive geometries; complex organic shapes still require manual modeling.
- Gesture Ambiguity: Overlapping gestures (e.g., a fast swipe vs. a flick) can lead to misinterpreted physics parameters, necessitating more robust disambiguation or multimodal cues (voice, haptics).
- Scalability: Real‑time simulation of large numbers of soft bodies strains mobile VR hardware; future work will explore adaptive LOD and GPU‑accelerated solvers.
- User Study Scope: The evaluation involved a relatively small, tech‑savvy cohort; broader studies with novices and domain experts will help validate generalizability.
SketchPlay points toward a future where VR content creation feels as natural as drawing on a whiteboard—opening the door for a wider audience to build physically rich virtual worlds.
Authors
- Xiangwen Zhang
- Xiaowei Dai
- Runnan Chen
- Xiaoming Chen
- Zeke Zexi Hu
Paper Information
- arXiv ID: 2512.22016v1
- Categories: cs.HC, cs.CV
- Published: December 26, 2025