[Paper] HairWeaver: Few-Shot Photorealistic Hair Motion Synthesis with Sim-to-Real Guided Video Diffusion
Source: arXiv - 2602.11117v1
Overview
HairWeaver is a new diffusion‑based system that can take a single still photo of a person and generate a photorealistic video where the subject’s hair moves naturally with the body’s motion. By leveraging a pair of lightweight LoRA (Low‑Rank Adaptation) adapters that inject motion cues and bridge the gap between synthetic training data and real‑world imagery, the method produces hair dynamics that look far more convincing than previous pose‑only animation pipelines.
Key Contributions
- Two‑stage LoRA guidance:
  - Motion‑Context‑LoRA injects explicit body‑pose and motion signals into a pretrained video diffusion model.
  - Sim2Real‑Domain‑LoRA adapts the model to preserve the subject’s original photorealistic appearance when moving from the synthetic (CG) domain to real‑world imagery.
- Few‑shot capability: The system can synthesize realistic hair motion from a single reference image, eliminating the need for multi‑view capture or extensive subject‑specific training.
- Specialized CG dataset: A curated collection of high‑fidelity hair simulations generated with a physics‑based hair simulator provides dense motion supervision without costly real‑world motion‑capture rigs.
- State‑of‑the‑art results: Quantitative metrics (FID, LPIPS) and user studies show a noticeable jump in realism and motion fidelity over prior video‑diffusion and pose‑animation baselines.
Methodology
- Base video diffusion model – The authors start from a publicly available video diffusion backbone (e.g., Stable Video Diffusion) that generates temporally coherent frames from text or latent prompts.
- Motion‑Context‑LoRA – A low‑rank adapter is trained to condition the diffusion model on a sequence of 2‑D pose keypoints (including head orientation). This adapter learns how to steer the latent space so that the generated frames follow the supplied motion trajectory.
- Sim2Real‑Domain‑LoRA – Because the diffusion model is trained on real‑world video while the hair‑motion supervision comes from synthetic CG renders, a second adapter learns to map the CG‑style hair textures and dynamics back into the real‑image domain, preserving the subject’s skin tone, lighting, and background.
- Few‑shot inference – At test time, a single portrait is encoded, the two LoRAs are activated, and a user‑provided pose sequence drives the diffusion process, yielding a video where the hair reacts naturally to the motion.
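The paper does not describe its adapters at the code level, but the core LoRA mechanics can be illustrated. Below is a minimal NumPy sketch of how a frozen weight matrix is modified by low‑rank updates, and how two independently trained adapters (here hypothetically named for the motion and sim‑to‑real roles) compose additively at inference; the shapes, ranks, and `alpha` scaling are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def lora_delta(B, A, alpha):
    """Low-rank weight update (alpha / rank) * B @ A, as in the LoRA formulation."""
    r = A.shape[0]
    return (alpha / r) * (B @ A)

d_out, d_in, r = 8, 8, 2
W = rng.normal(size=(d_out, d_in))  # frozen base weight of one diffusion layer

# Hypothetical adapters: one for motion conditioning, one for sim-to-real.
# A is initialised to zero, so an untrained adapter leaves W unchanged.
B_motion, A_motion = rng.normal(size=(d_out, r)), np.zeros((r, d_in))
B_domain, A_domain = rng.normal(size=(d_out, r)), np.zeros((r, d_in))

# At inference both adapters are active and their updates simply add.
W_eff = W + lora_delta(B_motion, A_motion, alpha=4) \
          + lora_delta(B_domain, A_domain, alpha=4)
assert np.allclose(W_eff, W)  # zero-initialised adapters are a no-op
```

Because each update has rank at most `r`, the adapters add only a few percent of the base model's parameters, which is what makes the few-shot, single-GPU setup practical.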
The whole pipeline runs end‑to‑end on a single GPU (≈ 8 GB VRAM) and requires only a few seconds per second of output video, making it practical for interactive prototyping.
Results & Findings
- Visual quality: Side‑by‑side comparisons show HairWeaver’s hair retains fine strands, bounce, and secondary motion that other methods flatten or jitter.
- Quantitative gains:
  - Fréchet Inception Distance (FID) improves by ~30 % over the strongest baseline.
  - Learned Perceptual Image Patch Similarity (LPIPS) drops by ~0.07, indicating closer alignment with ground‑truth simulated hair motion.
- User study: 85 % of participants rated HairWeaver’s videos as “more realistic” than those of competing approaches.
- Generalization: The model works across diverse hair types (straight, curly, long, short) and lighting conditions, despite being trained primarily on synthetic data.
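For readers unfamiliar with the FID metric cited above: it fits a Gaussian to deep feature vectors from each set of images and measures the Fréchet distance between the two Gaussians. A self‑contained NumPy sketch of that formula follows (the feature vectors here are random stand‑ins; in practice they come from an Inception network, which is out of scope for this snippet).

```python
import numpy as np

def _psd_sqrt(M):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(M)
    return vecs @ np.diag(np.sqrt(np.clip(vals, 0, None))) @ vecs.T

def frechet_distance(feats_a, feats_b):
    """FID between two row-wise feature sets, each fitted as a Gaussian.

    Uses tr(sqrt(C1 C2)) = tr(sqrt(sqrt(C1) C2 sqrt(C1))), so the
    non-symmetric product never needs a matrix square root directly.
    """
    mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
    c_a = np.cov(feats_a, rowvar=False)
    c_b = np.cov(feats_b, rowvar=False)
    s = _psd_sqrt(c_a)
    covmean_tr = np.trace(_psd_sqrt(s @ c_b @ s))
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(c_a) + np.trace(c_b) - 2 * covmean_tr)

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 8))  # stand-in for Inception features
print(abs(frechet_distance(x, x)) < 1e-6)  # identical sets → True
```

Lower is better: a ~30 % FID improvement means the generated frames' feature statistics sit markedly closer to those of real video.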
Practical Implications
- Game & VR character animation: Developers can animate NPCs or avatars from a single portrait without building a full rig, saving time and reducing artist workload.
- Virtual production & VFX: Quick generation of hair‑aware performance footage enables directors to preview scenes with realistic hair dynamics before committing to costly motion‑capture rigs.
- AR/Live‑stream filters: Real‑time hair‑motion effects (e.g., wind, dance moves) can be added to video calls or streaming apps with minimal latency.
- E‑commerce & virtual try‑on: Brands can showcase how a hairstyle behaves when a model walks or turns, improving the shopper’s perception of fit and movement.
Limitations & Future Work
- Domain shift for extreme lighting: Very high‑contrast or colored lighting still leads to minor artifacts, suggesting the Sim2Real‑Domain‑LoRA could be further refined.
- Temporal length: The diffusion backbone currently handles clips up to ~5 seconds without noticeable drift; longer sequences may need hierarchical conditioning.
- Hair accessories: The system does not explicitly model accessories (clips, hats) that interact with hair, which can cause unrealistic clipping.
- Future directions: The authors plan to incorporate 3‑D pose conditioning, extend the synthetic dataset with more varied physics parameters, and explore real‑time inference optimizations (e.g., distillation to a lightweight transformer).
Authors
- Di Chang
- Ji Hou
- Aljaz Bozic
- Assaf Neuberger
- Felix Juefei-Xu
- Olivier Maury
- Gene Wei-Chin Lin
- Tuur Stuyck
- Doug Roble
- Mohammad Soleymani
- Stephane Grabli
Paper Information
- arXiv ID: 2602.11117v1
- Categories: cs.CV
- Published: February 11, 2026