[Paper] HairWeaver: Few-Shot Photorealistic Hair Motion Synthesis with Sim-to-Real Guided Video Diffusion
Source: arXiv - 2602.11117v1
Overview
HairWeaver is a new diffusion‑based system that can take a single still photo of a person and generate a photorealistic video where the subject’s hair moves naturally with the body’s motion. By leveraging a pair of lightweight LoRA (Low‑Rank Adaptation) adapters that inject motion cues and bridge the gap between synthetic training data and real‑world imagery, the method produces hair dynamics that look far more convincing than previous pose‑only animation pipelines.
Key Contributions
- Two‑stage LoRA guidance:
  - Motion‑Context‑LoRA injects explicit body‑pose and motion signals into a pretrained video diffusion model.
  - Sim2Real‑Domain‑LoRA adapts the model to preserve the subject’s original photorealistic appearance when moving from the synthetic (CG) domain to real‑world imagery.
- Few‑shot capability: The system can synthesize realistic hair motion from a single reference image, eliminating the need for multi‑view capture or extensive subject‑specific training.
- Specialized CG dataset: A curated collection of high‑fidelity hair simulations generated with a physics‑based hair simulator provides dense motion supervision without costly real‑world motion‑capture rigs.
- State‑of‑the‑art results: Quantitative metrics (FID, LPIPS) and user studies show a noticeable jump in realism and motion fidelity over prior video‑diffusion and pose‑animation baselines.
Methodology
- Base video diffusion model – The authors start from a publicly available video diffusion backbone (e.g., Stable Video Diffusion) that generates temporally coherent frames from text or latent prompts.
- Motion‑Context‑LoRA – A low‑rank adapter is trained to condition the diffusion model on a sequence of 2‑D pose keypoints (including head orientation). This adapter learns how to steer the latent space so that the generated frames follow the supplied motion trajectory.
- Sim2Real‑Domain‑LoRA – Because the diffusion model is trained on real‑world video while the hair‑motion supervision comes from synthetic CG renders, a second adapter learns to map the CG‑style hair textures and dynamics back into the real‑image domain, preserving the subject’s skin tone, lighting, and background.
- Few‑shot inference – At test time, a single portrait is encoded, the two LoRAs are activated, and a user‑provided pose sequence drives the diffusion process, yielding a video where the hair reacts naturally to the motion.
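The paper does not describe its adapters at the code level, but the core LoRA mechanics can be illustrated. Below is a minimal NumPy sketch of how a frozen weight matrix is modified by low‑rank updates, and how two independently trained adapters (here hypothetically named for the motion and sim‑to‑real roles) compose additively at inference; the shapes, ranks, and `alpha` scaling are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def lora_delta(B, A, alpha):
    """Low-rank weight update (alpha / rank) * B @ A, as in the LoRA formulation."""
    r = A.shape[0]
    return (alpha / r) * (B @ A)

d_out, d_in, r = 8, 8, 2
W = rng.normal(size=(d_out, d_in))  # frozen base weight of one diffusion layer

# Hypothetical adapters: one for motion conditioning, one for sim-to-real.
# A is initialised to zero, so an untrained adapter leaves W unchanged.
B_motion, A_motion = rng.normal(size=(d_out, r)), np.zeros((r, d_in))
B_domain, A_domain = rng.normal(size=(d_out, r)), np.zeros((r, d_in))

# At inference both adapters are active and their updates simply add.
W_eff = W + lora_delta(B_motion, A_motion, alpha=4) \
          + lora_delta(B_domain, A_domain, alpha=4)
assert np.allclose(W_eff, W)  # zero-initialised adapters are a no-op
```

Because each update has rank at most `r`, the adapters add only a few percent of the base model's parameters, which is what makes the few-shot, single-GPU setup practical.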
The whole pipeline runs end‑to‑end on a single GPU (≈ 8 GB VRAM) and requires only a few seconds per second of output video, making it practical for interactive prototyping.
Results & Findings
- Visual quality: Side‑by‑side comparisons show HairWeaver’s hair retains fine strands, bounce, and secondary motion that other methods flatten or jitter.
- Quantitative gains:
  - Fréchet Inception Distance (FID) improves by ~30 % over the strongest baseline.
  - Learned Perceptual Image Patch Similarity (LPIPS) drops by ~0.07, indicating closer alignment with ground‑truth simulated hair motion.
- User study: 85 % of participants rated HairWeaver’s videos as “more realistic” than those of competing approaches.
- Generalization: The model works across diverse hair types (straight, curly, long, short) and lighting conditions, despite being trained primarily on synthetic data.
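For readers unfamiliar with the FID metric cited above: it fits a Gaussian to deep feature vectors from each set of images and measures the Fréchet distance between the two Gaussians. A self‑contained NumPy sketch of that formula follows (the feature vectors here are random stand‑ins; in practice they come from an Inception network, which is out of scope for this snippet).

```python
import numpy as np

def _psd_sqrt(M):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(M)
    return vecs @ np.diag(np.sqrt(np.clip(vals, 0, None))) @ vecs.T

def frechet_distance(feats_a, feats_b):
    """FID between two row-wise feature sets, each fitted as a Gaussian.

    Uses tr(sqrt(C1 C2)) = tr(sqrt(sqrt(C1) C2 sqrt(C1))), so the
    non-symmetric product never needs a matrix square root directly.
    """
    mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
    c_a = np.cov(feats_a, rowvar=False)
    c_b = np.cov(feats_b, rowvar=False)
    s = _psd_sqrt(c_a)
    covmean_tr = np.trace(_psd_sqrt(s @ c_b @ s))
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(c_a) + np.trace(c_b) - 2 * covmean_tr)

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 8))  # stand-in for Inception features
print(abs(frechet_distance(x, x)) < 1e-6)  # identical sets → True
```

Lower is better: a ~30 % FID improvement means the generated frames' feature statistics sit markedly closer to those of real video.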
Practical Implications
- Game & VR character animation: Developers can animate NPCs or avatars from a single portrait without building a full rig, saving time and reducing artist workload.
- Virtual production & VFX: Quick generation of hair‑aware performance footage enables directors to preview scenes with realistic hair dynamics before committing to costly motion‑capture rigs.
- AR/Live‑stream filters: Real‑time hair‑motion effects (e.g., wind, dance moves) can be added to video calls or streaming apps with minimal latency.
- E‑commerce & virtual try‑on: Brands can showcase how a hairstyle behaves when a model walks or turns, improving the shopper’s perception of fit and movement.
Limitations & Future Work
- Domain shift for extreme lighting: Very high‑contrast or colored lighting still leads to minor artifacts, suggesting the Sim2Real‑Domain‑LoRA could be further refined.
- Temporal length: The diffusion backbone currently handles clips up to ~5 seconds without noticeable drift; longer sequences may need hierarchical conditioning.
- Hair accessories: The system does not explicitly model accessories (clips, hats) that interact with hair, which can cause unrealistic clipping.
- Future directions: The authors plan to incorporate 3‑D pose conditioning, extend the synthetic dataset with more varied physics parameters, and explore real‑time inference optimizations (e.g., distillation to a lightweight transformer).
Authors
- Di Chang
- Ji Hou
- Aljaz Bozic
- Assaf Neuberger
- Felix Juefei-Xu
- Olivier Maury
- Gene Wei-Chin Lin
- Tuur Stuyck
- Doug Roble
- Mohammad Soleymani
- Stephane Grabli
Paper Information
- arXiv ID: 2602.11117v1
- Categories: cs.CV
- Published: February 11, 2026