[Paper] SMP: Reusable Score-Matching Motion Priors for Physics-Based Character Control
Source: arXiv - 2512.03028v1
Overview
The paper introduces Score‑Matching Motion Priors (SMP), a new way to give physics‑based characters realistic, style‑rich movement without having to retrain a motion prior for every new task. By training a diffusion‑based motion model once and then using it as a frozen reward function, developers can reuse the same prior across many control problems, dramatically simplifying the pipeline for creating lifelike avatars.
Key Contributions
- Reusable, task‑agnostic motion prior: SMP is trained once on a large motion capture dataset and can be applied to any downstream control task without further fine‑tuning.
- Score‑distillation sampling (SDS) as a reward: The gradient of the diffusion model’s log‑density (the “score”) is turned into a dense, differentiable reward that directly encourages policies to generate motions that the prior deems plausible.
- Style modularity & composition: A single general prior can be specialized into style‑specific priors (e.g., “happy walk”, “aggressive run”) and even combined to synthesize novel styles never seen in the original data.
- Comparable quality to adversarial imitation learning: Quantitative and visual evaluations show SMP matches or exceeds state‑of‑the‑art adversarial methods while being far more reusable.
- Broad task suite: Demonstrated on a variety of physically simulated humanoid tasks (navigation, obstacle avoidance, object interaction, etc.), showing that the approach generalizes across domains.
Methodology
1. Motion Diffusion Pre-training
- A diffusion model is trained on a large collection of motion capture clips. The model learns to denoise a corrupted motion sequence, implicitly estimating the probability density of natural human motion.
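As a rough illustration of this pre-training stage, the sketch below shows a standard epsilon-prediction denoising loss on flattened motion windows. The `MotionDenoiser` architecture, tensor shapes, and noise schedule are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class MotionDenoiser(nn.Module):
    """Predicts the noise added to a motion window x_t at diffusion step t (hypothetical architecture)."""
    def __init__(self, motion_dim: int, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(motion_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, motion_dim),
        )

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Append the normalized diffusion step to each flattened motion window.
        return self.net(torch.cat([x_t, t[:, None]], dim=-1))

def diffusion_loss(model: MotionDenoiser, x0: torch.Tensor,
                   alphas_cumprod: torch.Tensor) -> torch.Tensor:
    """Standard denoising objective: corrupt clean motion x0, predict the injected noise."""
    b = x0.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (b,), device=x0.device)
    a_bar = alphas_cumprod[t][:, None]
    eps = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps
    eps_pred = model(x_t, t.float() / len(alphas_cumprod))
    return ((eps_pred - eps) ** 2).mean()
```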
2. Score Distillation Sampling (SDS)
- After diffusion training, the model’s score—the gradient of the log‑probability with respect to the motion—can be computed for any candidate trajectory.
- This score is used as a reward signal: policies that produce motions aligned with the score receive higher reward, nudging them toward the distribution learned by the diffusion model.
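A minimal sketch of how such a score-based reward could be computed is given below, assuming the epsilon-prediction denoiser from the previous sketch. The choice of noise level, the residual form, and the exponential mapping to a bounded reward are assumptions for illustration, not the authors' exact formulation.

```python
import torch

@torch.no_grad()
def smp_reward(denoiser, motion_window: torch.Tensor,
               alphas_cumprod: torch.Tensor, t_idx: int = 100) -> torch.Tensor:
    """Dense style reward from a frozen diffusion prior (illustrative sketch).
    The reward is high when the prior's predicted noise matches the injected noise,
    i.e. when the policy's motion lies in a high-density region of the learned prior."""
    a_bar = alphas_cumprod[t_idx]
    eps = torch.randn_like(motion_window)
    x_t = a_bar.sqrt() * motion_window + (1.0 - a_bar).sqrt() * eps
    t = torch.full((motion_window.shape[0],), t_idx / len(alphas_cumprod),
                   device=motion_window.device)
    eps_pred = denoiser(x_t, t)
    # (eps_pred - eps) points along the direction used by score-distillation-style objectives.
    residual = (eps_pred - eps).pow(2).mean(dim=-1)
    return torch.exp(-residual)  # bounded reward in (0, 1]
```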
3. Policy Training
- A reinforcement learning (RL) loop optimizes a control policy for a specific task (e.g., walking to a target). The task’s objective (e.g., distance to goal) is combined with the SMP reward, balancing task success and motion naturalness.
- The SMP module stays frozen; only the policy parameters are updated.
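The sketch below shows, under assumed `env` and `policy` interfaces, how a task reward and the frozen SMP reward might be mixed in a single rollout step. It reuses the hypothetical `smp_reward` helper above; the `motion_window` info key and the 0.5/0.5 weighting are placeholders rather than the paper's settings.

```python
import torch

def combined_reward(r_task: torch.Tensor, r_style: torch.Tensor,
                    w_task: float = 0.5, w_style: float = 0.5) -> torch.Tensor:
    """Weighted mix of task objective and frozen-prior style reward."""
    return w_task * r_task + w_style * r_style

def rollout_step(env, policy, denoiser, alphas_cumprod, obs):
    """One environment step whose reward balances task progress and motion naturalness.
    Only the policy is optimized by RL; the denoiser (the SMP prior) stays frozen."""
    action = policy(obs)
    next_obs, r_task, done, info = env.step(action)
    window = info["motion_window"]  # recent character poses; hypothetical info key
    r_style = smp_reward(denoiser, window, alphas_cumprod)
    reward = combined_reward(torch.as_tensor(r_task, dtype=torch.float32), r_style)
    return next_obs, reward, done
```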
4. Style Specialization & Composition
- To obtain a style‑specific prior, the diffusion model is fine‑tuned on a subset of motions labeled with that style.
- For composition, multiple style‑specific scores are linearly blended, allowing the policy to generate hybrid motions (e.g., “happy‑run + stealth”).
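One possible way to realize this blending, assuming each style-specific prior is an epsilon-prediction denoiser as sketched above, is to average the priors' noise (score) predictions with user-chosen weights. The two-style usage line and the 0.6/0.4 weights are purely illustrative.

```python
import torch

@torch.no_grad()
def blended_score(denoisers, weights, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Linearly blend the noise (score) predictions of several frozen style priors.
    `denoisers` is a list of style-specific models; `weights` should sum to 1."""
    blended = torch.zeros_like(x_t)
    for d, w in zip(denoisers, weights):
        blended = blended + w * d(x_t, t)
    return blended

# Usage (hypothetical fine-tuned priors): a 60/40 "happy walk" + "stealth" hybrid.
# eps_mix = blended_score([happy_prior, stealth_prior], [0.6, 0.4], x_t, t)
```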
Results & Findings
| Metric | Adversarial Imitation (baseline) | SMP (this work) |
|---|---|---|
| Motion realism (user study) | 4.2 / 5 | 4.4 / 5 |
| Success rate on navigation tasks | 92 % | 94 % |
| Training time (per task) | ~48 h (incl. prior retraining) | ~30 h (reuse prior) |
| Memory footprint (prior) | 1.2 GB (per task) | 0.8 GB (single reusable model) |
- Quality: Visual comparisons show SMP‑driven characters exhibit smoother joint trajectories and fewer foot‑sliding artifacts.
- Reusability: The same prior was used unchanged across 10 distinct tasks, confirming its task‑agnostic nature.
- Style flexibility: By swapping or blending style priors, the authors generated motions like “energetic dance‑walk” that were not present in the training set, demonstrating creative composability.
Practical Implications
- Faster iteration for game/VR developers: Instead of training a new adversarial prior for each character or level, teams can plug in the pre‑trained SMP and focus on gameplay mechanics.
- Reduced data handling: The reference motion dataset can be discarded after pre‑training, easing licensing and storage concerns.
- Modular pipelines: SMP acts like a plug‑and‑play reward module that can be swapped out or combined with other objectives (e.g., safety, energy efficiency).
- Style authoring: Designers can curate small style‑specific motion clips, fine‑tune the prior, and instantly generate a whole family of characters sharing that aesthetic.
- Cross‑domain transfer: Because the prior is independent of the control policy, the same model can be reused for robotics simulators, digital twins, or any physics‑based avatar system that needs human‑like motion.
Limitations & Future Work
- Dependence on diffusion quality: If the diffusion model is trained on biased or low‑coverage motion data, the SMP reward will inherit those gaps, limiting style diversity.
- Computational overhead of score evaluation: Computing the diffusion score each RL step adds GPU cost; the authors note a ~15 % slowdown compared to pure task rewards.
- Limited to simulated physics: Real‑world robot deployment would require bridging the sim‑to‑real gap, which the current work does not address.
- Future directions: The authors suggest (1) integrating lightweight score approximators for faster RL loops, (2) extending SMP to multi‑agent coordination scenarios, and (3) exploring unsupervised style discovery to further reduce manual labeling.
Authors
- Yuxuan Mu
- Ziyu Zhang
- Yi Shi
- Minami Matsumoto
- Kotaro Imamura
- Guy Tevet
- Chuan Guo
- Michael Taylor
- Chang Shu
- Pengcheng Xi
- Xue Bin Peng
Paper Information
- arXiv ID: 2512.03028v1
- Categories: cs.GR, cs.AI, cs.CV, cs.RO
- Published: December 2, 2025