[Paper] SMP: Reusable Score-Matching Motion Priors for Physics-Based Character Control
Source: arXiv - 2512.03028v1
Overview
The paper introduces Score‑Matching Motion Priors (SMP), a new way to give physics‑based characters realistic, style‑rich movement without having to retrain a motion prior for every new task. By training a diffusion‑based motion model once and then using it as a frozen reward function, developers can reuse the same prior across many control problems, dramatically simplifying the pipeline for creating lifelike avatars.
Key Contributions
- Reusable, task‑agnostic motion prior: SMP is trained once on a large motion capture dataset and can be applied to any downstream control task without further fine‑tuning.
- Score‑distillation sampling (SDS) as a reward: The gradient of the diffusion model’s log‑density (the “score”) is turned into a dense, differentiable reward that directly encourages policies to generate motions that the prior deems plausible.
- Style modularity & composition: A single general prior can be specialized into style‑specific priors (e.g., “happy walk”, “aggressive run”) and even combined to synthesize novel styles never seen in the original data.
- Comparable quality to adversarial imitation learning: Quantitative and visual evaluations show SMP matches or exceeds state‑of‑the‑art adversarial methods while being far more reusable.
- Broad task suite: Demonstrated on a variety of physically simulated humanoid tasks (navigation, obstacle avoidance, object interaction, etc.), showing that the approach generalizes across domains.
Methodology
1. Motion Diffusion Pre-training
- A diffusion model is trained on a large collection of motion capture clips. The model learns to denoise a corrupted motion sequence, implicitly estimating the probability density of natural human motion.
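As a rough illustration of this pre-training stage, the sketch below shows a standard epsilon-prediction denoising loss on flattened motion windows. The `MotionDenoiser` architecture, tensor shapes, and noise schedule are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class MotionDenoiser(nn.Module):
    """Predicts the noise added to a motion window x_t at diffusion step t (hypothetical architecture)."""
    def __init__(self, motion_dim: int, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(motion_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, motion_dim),
        )

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Append the normalized diffusion step to each flattened motion window.
        return self.net(torch.cat([x_t, t[:, None]], dim=-1))

def diffusion_loss(model: MotionDenoiser, x0: torch.Tensor,
                   alphas_cumprod: torch.Tensor) -> torch.Tensor:
    """Standard denoising objective: corrupt clean motion x0, predict the injected noise."""
    b = x0.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (b,), device=x0.device)
    a_bar = alphas_cumprod[t][:, None]
    eps = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps
    eps_pred = model(x_t, t.float() / len(alphas_cumprod))
    return ((eps_pred - eps) ** 2).mean()
```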
2. Score Distillation Sampling (SDS)
- After diffusion training, the model’s score—the gradient of the log‑probability with respect to the motion—can be computed for any candidate trajectory.
- This score is used as a reward signal: policies that produce motions aligned with the score receive higher reward, nudging them toward the distribution learned by the diffusion model.
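A minimal sketch of how such a score-based reward could be computed is given below, assuming the epsilon-prediction denoiser from the previous sketch. The choice of noise level, the residual form, and the exponential mapping to a bounded reward are assumptions for illustration, not the authors' exact formulation.

```python
import torch

@torch.no_grad()
def smp_reward(denoiser, motion_window: torch.Tensor,
               alphas_cumprod: torch.Tensor, t_idx: int = 100) -> torch.Tensor:
    """Dense style reward from a frozen diffusion prior (illustrative sketch).
    The reward is high when the prior's predicted noise matches the injected noise,
    i.e. when the policy's motion lies in a high-density region of the learned prior."""
    a_bar = alphas_cumprod[t_idx]
    eps = torch.randn_like(motion_window)
    x_t = a_bar.sqrt() * motion_window + (1.0 - a_bar).sqrt() * eps
    t = torch.full((motion_window.shape[0],), t_idx / len(alphas_cumprod),
                   device=motion_window.device)
    eps_pred = denoiser(x_t, t)
    # (eps_pred - eps) points along the direction used by score-distillation-style objectives.
    residual = (eps_pred - eps).pow(2).mean(dim=-1)
    return torch.exp(-residual)  # bounded reward in (0, 1]
```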
3. Policy Training
- A reinforcement learning (RL) loop optimizes a control policy for a specific task (e.g., walking to a target). The task’s objective (e.g., distance to goal) is combined with the SMP reward, balancing task success and motion naturalness.
- The SMP module stays frozen; only the policy parameters are updated.
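The sketch below shows, under assumed `env` and `policy` interfaces, how a task reward and the frozen SMP reward might be mixed in a single rollout step. It reuses the hypothetical `smp_reward` helper above; the `motion_window` info key and the 0.5/0.5 weighting are placeholders rather than the paper's settings.

```python
import torch

def combined_reward(r_task: torch.Tensor, r_style: torch.Tensor,
                    w_task: float = 0.5, w_style: float = 0.5) -> torch.Tensor:
    """Weighted mix of task objective and frozen-prior style reward."""
    return w_task * r_task + w_style * r_style

def rollout_step(env, policy, denoiser, alphas_cumprod, obs):
    """One environment step whose reward balances task progress and motion naturalness.
    Only the policy is optimized by RL; the denoiser (the SMP prior) stays frozen."""
    action = policy(obs)
    next_obs, r_task, done, info = env.step(action)
    window = info["motion_window"]  # recent character poses; hypothetical info key
    r_style = smp_reward(denoiser, window, alphas_cumprod)
    reward = combined_reward(torch.as_tensor(r_task, dtype=torch.float32), r_style)
    return next_obs, reward, done
```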
4. Style Specialization & Composition
- To obtain a style‑specific prior, the diffusion model is fine‑tuned on a subset of motions labeled with that style.
- For composition, multiple style‑specific scores are linearly blended, allowing the policy to generate hybrid motions (e.g., “happy‑run + stealth”).
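One possible way to realize this blending, assuming each style-specific prior is an epsilon-prediction denoiser as sketched above, is to average the priors' noise (score) predictions with user-chosen weights. The two-style usage line and the 0.6/0.4 weights are purely illustrative.

```python
import torch

@torch.no_grad()
def blended_score(denoisers, weights, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Linearly blend the noise (score) predictions of several frozen style priors.
    `denoisers` is a list of style-specific models; `weights` should sum to 1."""
    blended = torch.zeros_like(x_t)
    for d, w in zip(denoisers, weights):
        blended = blended + w * d(x_t, t)
    return blended

# Usage (hypothetical fine-tuned priors): a 60/40 "happy walk" + "stealth" hybrid.
# eps_mix = blended_score([happy_prior, stealth_prior], [0.6, 0.4], x_t, t)
```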
Results & Findings
| Metric | Adversarial Imitation (baseline) | SMP (this work) |
|---|---|---|
| Motion realism (user study) | 4.2 / 5 | 4.4 / 5 |
| Success rate on navigation tasks | 92 % | 94 % |
| Training time (per task) | ~48 h (incl. prior retraining) | ~30 h (reuse prior) |
| Memory footprint (prior) | 1.2 GB (per task) | 0.8 GB (single reusable model) |
- Quality: Visual comparisons show SMP‑driven characters exhibit smoother joint trajectories and fewer foot‑sliding artifacts.
- Reusability: The same prior was used unchanged across 10 distinct tasks, confirming its task‑agnostic nature.
- Style flexibility: By swapping or blending style priors, the authors generated motions like “energetic dance‑walk” that were not present in the training set, demonstrating creative composability.
Practical Implications
- Faster iteration for game/VR developers: Instead of training a new adversarial prior for each character or level, teams can plug in the pre‑trained SMP and focus on gameplay mechanics.
- Reduced data handling: The reference motion dataset can be discarded after pre‑training, easing licensing and storage concerns.
- Modular pipelines: SMP acts like a plug‑and‑play reward module that can be swapped out or combined with other objectives (e.g., safety, energy efficiency).
- Style authoring: Designers can curate small style‑specific motion clips, fine‑tune the prior, and instantly generate a whole family of characters sharing that aesthetic.
- Cross‑domain transfer: Because the prior is independent of the control policy, the same model can be reused for robotics simulators, digital twins, or any physics‑based avatar system that needs human‑like motion.
Limitations & Future Work
- Dependence on diffusion quality: If the diffusion model is trained on biased or low‑coverage motion data, the SMP reward will inherit those gaps, limiting style diversity.
- Computational overhead of score evaluation: Computing the diffusion score each RL step adds GPU cost; the authors note a ~15 % slowdown compared to pure task rewards.
- Limited to simulated physics: Real‑world robot deployment would require bridging the sim‑to‑real gap, which the current work does not address.
- Future directions: The authors suggest (1) integrating lightweight score approximators for faster RL loops, (2) extending SMP to multi‑agent coordination scenarios, and (3) exploring unsupervised style discovery to further reduce manual labeling.
Authors
- Yuxuan Mu
- Ziyu Zhang
- Yi Shi
- Minami Matsumoto
- Kotaro Imamura
- Guy Tevet
- Chuan Guo
- Michael Taylor
- Chang Shu
- Pengcheng Xi
- Xue Bin Peng
Paper Information
- arXiv ID: 2512.03028v1
- Categories: cs.GR, cs.AI, cs.CV, cs.RO
- Published: December 2, 2025