[Paper] SketchPlay: Intuitive Creation of Physically Realistic VR Content with Gesture-Driven Sketching
Source: arXiv - 2512.22016v1
Overview
SketchPlay introduces a VR interaction framework that lets users “draw” 3D scenes in mid‑air and instantly turn those sketches into physically realistic simulations. By pairing simple air‑drawn outlines with expressive hand gestures, the system bridges the gap between creative ideation and complex physics‑based content creation, making VR authoring accessible to developers, designers, and educators without deep modeling expertise.
Key Contributions
- Gesture‑driven sketching pipeline that maps 2‑D air sketches to 3‑D object geometry and spatial layout.
- Physical cue encoding via hand gestures (velocity, direction, force) that drive rigid‑body, elastic, and cloth dynamics automatically.
- Unified authoring interface that combines structural (shape) and dynamic (behavior) inputs in a single, intuitive VR workflow.
- Quantitative user study showing higher expressiveness and satisfaction compared with text‑based VR content creation tools.
- Open‑source prototype (Unity + Oculus Quest) released for the community to extend and integrate into existing VR pipelines.
Methodology
- Air Sketch Capture – The system tracks the user's controller trajectory to generate a polyline, which is projected onto a virtual plane and then lifted into 3‑D space using depth cues (e.g., distance from the user's head); a minimal geometric sketch of this step is shown after this list.
- Shape Inference – A lightweight neural network (trained on a synthetic dataset of sketch ↔ 3‑D primitive pairs) predicts the underlying geometry (boxes, cylinders, cloth sheets, etc.) and places the primitives according to the sketch's topology (see the second sketch below).
- Gesture Extraction – While sketching, the user performs secondary gestures (swipe, flick, squeeze). The system extracts gesture vectors (direction, speed, pressure) and maps them to physics parameters such as initial velocity, impulse magnitude, and material stiffness (a mapping sketch follows the list).
- Physics Integration – The inferred objects are instantiated in Unity's built‑in PhysX engine (or NVIDIA Flex for soft bodies). The extracted parameters are fed directly into the simulation, producing immediate, realistic motion.
- Iterative Refinement – Users can edit sketches or replay gestures, and the system updates the simulation in real time, enabling a “play‑and‑tweak” loop.
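To make the capture step concrete, the snippet below gives a minimal NumPy sketch of the geometric idea: controller samples are projected onto a sketch plane, and the planar polyline is then lifted back out along the plane normal using head distance as a depth cue. The function name, the depth heuristic, and the `depth_scale` parameter are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def project_and_lift(samples, plane_point, plane_normal, head_pos, depth_scale=1.0):
    """Project controller samples onto a sketch plane, then lift the
    resulting polyline into 3-D using head distance as a depth cue."""
    n = plane_normal / np.linalg.norm(plane_normal)
    signed_dist = (samples - plane_point) @ n           # distance of each sample from the plane
    projected = samples - np.outer(signed_dist, n)      # planar footprint of the stroke
    depth = depth_scale * np.linalg.norm(samples - head_pos, axis=1)
    lifted = projected + np.outer(depth - depth.mean(), n)  # re-introduce depth along the normal
    return projected, lifted

# A short mid-air stroke in front of the user (metres, y-up).
stroke = np.array([[0.0, 1.20, 0.50],
                   [0.1, 1.30, 0.55],
                   [0.2, 1.35, 0.60]])
flat, lifted = project_and_lift(stroke,
                                plane_point=np.array([0.0, 1.2, 0.5]),
                                plane_normal=np.array([0.0, 0.0, 1.0]),
                                head_pos=np.array([0.0, 1.6, 0.0]))
```

The shape-inference step can likewise be sketched as a small classifier over a resampled stroke. The PyTorch model below is only a stand-in for the paper's "lightweight neural network": the label set, layer sizes, and input encoding (a fixed number of resampled 3‑D points, flattened) are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical primitive classes; the paper's actual label set may differ.
PRIMITIVES = ["box", "cylinder", "sphere", "cloth_sheet"]

class SketchShapeNet(nn.Module):
    """Minimal MLP over a resampled polyline (n_points 3-D points,
    flattened) that predicts a primitive class."""
    def __init__(self, n_points=32, hidden=128, n_classes=len(PRIMITIVES)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_points * 3, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, polyline):            # polyline: (batch, n_points, 3)
        return self.net(polyline.flatten(start_dim=1))

model = SketchShapeNet()
logits = model(torch.randn(1, 32, 3))       # one dummy resampled sketch
predicted = PRIMITIVES[logits.argmax(dim=1).item()]
```

Finally, the gesture-to-physics mapping can be illustrated with a short function that turns tracked gesture samples into simulation parameters. The gain constants and the squeeze-to-stiffness mapping are illustrative assumptions; the paper does not specify these values.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class PhysicsParams:
    initial_velocity: np.ndarray   # m/s, applied to the spawned rigid body
    impulse_magnitude: float       # N*s, for flick-style hits
    stiffness: float               # 0..1, passed to the soft-body solver

def gesture_to_physics(positions, timestamps, squeeze,
                       velocity_gain=1.0, impulse_gain=0.5):
    """Map a tracked gesture (positions over time plus trigger squeeze)
    to simulation parameters (illustrative gains, not the paper's)."""
    deltas = np.diff(positions, axis=0)
    dts = np.diff(timestamps)
    velocities = deltas / dts[:, None]
    peak = velocities[np.argmax(np.linalg.norm(velocities, axis=1))]
    speed = np.linalg.norm(peak)
    direction = peak / speed if speed > 0 else np.zeros(3)
    return PhysicsParams(
        initial_velocity=velocity_gain * speed * direction,
        impulse_magnitude=impulse_gain * speed,
        stiffness=float(np.clip(squeeze, 0.0, 1.0)),
    )

params = gesture_to_physics(
    positions=np.array([[0.0, 1.2, 0.5], [0.05, 1.2, 0.6], [0.15, 1.2, 0.8]]),
    timestamps=np.array([0.00, 0.02, 0.04]),
    squeeze=0.7,
)
```

In a Unity-style integration, the returned velocity and impulse would be applied to the instantiated rigid body, and the stiffness handed to the soft-body solver, which is what allows the sketched object to start moving the moment it appears.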
Results & Findings
- Expressiveness: Participants generated 2.8× more distinct physical scenarios (e.g., bouncing balls, waving flags, collapsing structures) than with a baseline text‑command system.
- Creation Speed: Average time to produce a functional scene dropped from 4.2 min (traditional tools) to 1.1 min with SketchPlay.
- User Satisfaction: SUS (System Usability Scale) scores averaged 86/100, indicating high perceived ease of use and enjoyment.
- Physical Accuracy: Simulated dynamics matched ground‑truth physics benchmarks within 5% error for velocity and deformation, confirming that gesture‑derived parameters are realistic enough for most interactive applications.
Practical Implications
- Rapid Prototyping: Game developers can sketch level elements and instantly test physics interactions without writing code or importing assets, accelerating iteration cycles.
- Education & Training: Teachers can demonstrate physics concepts (projectile motion, elasticity) by letting students draw scenarios that immediately come to life, fostering experiential learning.
- Creative Storytelling: Artists and narrative designers can craft immersive, physics‑driven scenes on the fly, enabling dynamic storytelling where the audience’s gestures shape the environment.
- Cross‑Platform Integration: Because the pipeline relies on standard VR SDKs and open‑source inference models, it can be embedded into existing Unity or Unreal projects, extending the reach of low‑code VR authoring tools.
Limitations & Future Work
- Shape Diversity: Current inference covers a limited set of primitive geometries; complex organic shapes still require manual modeling.
- Gesture Ambiguity: Overlapping gestures (e.g., a fast swipe vs. a flick) can lead to misinterpreted physics parameters, necessitating more robust disambiguation or multimodal cues (voice, haptics).
- Scalability: Real‑time simulation of large numbers of soft bodies strains mobile VR hardware; future work will explore adaptive LOD and GPU‑accelerated solvers.
- User Study Scope: The evaluation involved a relatively small, tech‑savvy cohort; broader studies with novices and domain experts will help validate generalizability.
SketchPlay points toward a future where VR content creation feels as natural as drawing on a whiteboard—opening the door for a wider audience to build physically rich virtual worlds.
Authors
- Xiangwen Zhang
- Xiaowei Dai
- Runnan Chen
- Xiaoming Chen
- Zeke Zexi Hu
Paper Information
- arXiv ID: 2512.22016v1
- Categories: cs.HC, cs.CV
- Published: December 26, 2025