[Paper] SOMA: Unifying Parametric Human Body Models

Published: March 17, 2026 at 01:58 PM EDT
4 min read
Source: arXiv


Overview

The paper introduces SOMA, a “unified body layer” that lets developers work with any of the major parametric human body models (SMPL, SMPL‑X, MHR, Anny, etc.) without having to write custom adapters for each pair. By abstracting mesh topology, skeletons, and pose representations, SOMA turns a combinatorial nightmare into a single, plug‑and‑play component that runs in real time on the GPU.

Key Contributions

  • Three‑level abstraction (mesh, skeletal, pose) that maps every supported model to a common canonical representation.
  • Constant‑time mesh topology conversion per vertex, eliminating the need for per‑model lookup tables.
  • Closed‑form skeletal recovery that produces identity‑adapted joint transforms from any shape or pose in a single pass—no iterative optimization or model‑specific training.
  • Pose inversion that extracts unified skeleton rotations directly from posed vertices, enabling seamless mixing of motion capture datasets across models.
  • Scalable connectivity: reduces adapter complexity from O(M²) pairwise converters to O(M), one connector per model.
  • Fully differentiable, GPU‑accelerated implementation built on NVIDIA Warp, making SOMA ready for deep‑learning pipelines.
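The scalability claim can be made concrete: with pairwise adapters, every ordered pair of models needs its own converter, whereas a canonical hub needs only one connector per model. A minimal sketch of the hub‑and‑spoke routing (the model names and the `to_canonical`/`from_canonical` hooks are illustrative, not from the paper):

```python
# Hub-and-spoke conversion: each model registers one connector to the
# canonical representation, so M models need M connectors instead of
# M*(M-1) ordered pairwise adapters.
models = ["SMPL", "SMPL-X", "MHR", "Anny"]

pairwise = len(models) * (len(models) - 1)  # O(M^2) ordered pairs
hub = len(models)                           # O(M) connectors

print(pairwise, hub)  # 12 vs 4 for four models

def convert(src, dst, mesh, to_canonical, from_canonical):
    """Route any source model through the shared canonical space."""
    canonical = to_canonical[src](mesh)
    return from_canonical[dst](canonical)
```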

Methodology

SOMA treats each parametric model as a view of a shared underlying human body:

  1. Mesh Topology Abstraction – A canonical mesh (the “SOMA mesh”) is defined once. For any source model, a pre‑computed per‑vertex mapping tells SOMA how to copy or blend vertex attributes, achieving a constant‑time conversion.
  2. Skeletal Abstraction – The canonical skeleton is expressed as a set of joint transforms that are identity‑aware. Using the model’s shape parameters, SOMA solves a linear system that yields the full joint hierarchy in a single closed‑form step, regardless of whether the input is in a rest pose or already posed.
  3. Pose Abstraction – By inverting the standard linear blend skinning (LBS) pipeline, SOMA recovers the rotation matrices that produced the given vertex positions. This works for any supported model because the skinning weights are also mapped to the canonical space.
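The mesh topology step above amounts to a precomputed sparse blend: each canonical vertex stores the indices and weights of the source vertices it draws from, so conversion costs a fixed number of multiply‑adds per vertex. A minimal NumPy sketch (the mapping arrays are toy values, not the paper's actual correspondence data):

```python
import numpy as np

# Toy source mesh: 5 vertices in R^3.
src_verts = np.arange(15, dtype=np.float64).reshape(5, 3)

# Precomputed per-vertex mapping: each canonical vertex blends up to
# K source vertices. idx[i] holds source indices, w[i] their weights
# (rows sum to 1), so the cost per canonical vertex is constant.
idx = np.array([[0, 1], [2, 3], [3, 4]])                # (N_canon, K)
w = np.array([[0.5, 0.5], [1.0, 0.0], [0.25, 0.75]])    # (N_canon, K)

# Gather-and-blend: weighted sum over the K source vertices.
canon_verts = (src_verts[idx] * w[..., None]).sum(axis=1)
print(canon_verts.shape)  # (3, 3)
```

Because the mapping is fixed per model, it can be baked once and reused for every conversion, which is what makes the per‑vertex cost constant.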

All three layers are chained together, producing a differentiable function

\[ \text{SOMA}(\text{model\_id}, \text{shape}, \text{pose}) \rightarrow (\text{canonical\_mesh}, \text{canonical\_joints}) \]

that can be inserted into training loops or inference pipelines without extra bookkeeping.
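In code, the chained layers reduce to a single callable behind a per‑model registry. A hypothetical interface sketch (the registry, function names, and connector signatures are assumptions for illustration; they are not the paper's API):

```python
from typing import Callable, Dict, List, Tuple

# Each connector maps (shape, pose) parameters for one model to the
# canonical mesh vertices and joints. Signatures here are assumptions.
Connector = Callable[[List[float], List[float]], Tuple[List[float], List[float]]]
REGISTRY: Dict[str, Connector] = {}

def register(model_id: str, fn: Connector) -> None:
    """Add a one-time connector for a new model (O(M) growth)."""
    REGISTRY[model_id] = fn

def soma(model_id: str, shape: List[float], pose: List[float]):
    """Unified entry point: route any supported model to canonical outputs."""
    return REGISTRY[model_id](shape, pose)

# Toy connector standing in for a real SMPL adapter.
register("SMPL", lambda shape, pose: (shape, pose))
mesh, joints = soma("SMPL", [0.1, 0.2], [0.0])
```

The point of the single entry signature is that downstream code (training loops, renderers) never branches on the model type.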

Results & Findings

  • Speed – Mesh conversion runs at ~0.5 µs per vertex; skeletal recovery and pose inversion each complete in <1 ms for a full body (≈10 k vertices) on an RTX 3080.
  • Accuracy – When converting SMPL‑X to SMPL via SOMA, the resulting mesh deviates by <0.3 mm (average Euclidean distance) from a ground‑truth direct conversion, confirming that the abstraction does not sacrifice geometric fidelity.
  • Scalability – Adding a new model only requires a one‑time topology/weight mapping; the overall system size grows linearly with the number of models, not quadratically.
  • Differentiability – End‑to‑end training of a pose‑estimation network that outputs SOMA‑compatible parameters converges 1.8× faster than using separate per‑model adapters, thanks to the smooth gradients through the abstraction layers.
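The accuracy figure above is a mean per‑vertex Euclidean distance, which is straightforward to compute when the two meshes share a topology. A minimal sketch with toy arrays (illustrative values, not the paper's evaluation data):

```python
import numpy as np

# Two toy meshes with identical topology: converted vs. ground truth,
# coordinates in meters (0.0003 m = 0.3 mm).
converted = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
reference = np.array([[0.0003, 0.0, 0.0], [1.0, 0.0003, 0.0]])

# Mean per-vertex Euclidean distance.
mean_err = np.linalg.norm(converted - reference, axis=1).mean()
print(mean_err)  # 0.0003
```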

Practical Implications

  • Cross‑dataset training – Researchers can now train a single model on motion capture data from SMPL‑X, MHR, and Anny without manual retargeting, dramatically expanding usable data.
  • Hybrid pipelines – Game studios can combine high‑fidelity facial rigs from SMPL‑X with lightweight body rigs from SMPL in the same character, swapping the best parts of each model on the fly.
  • Real‑time applications – Because the whole pipeline is GPU‑accelerated and differentiable, it can be embedded in AR/VR avatars, live‑streaming filters, or robotics simulators that need fast, consistent body representations.
  • Simplified tooling – SDKs and libraries only need to ship one “SOMA backend” instead of dozens of pairwise converters, reducing maintenance overhead and version‑compatibility headaches.

Limitations & Future Work

  • Model coverage – SOMA currently supports the most popular parametric models; exotic or proprietary rigs would still need a custom topology/weight map before they can be used.
  • Skinning assumptions – The pose inversion relies on linear blend skinning; models that use more complex deformation (e.g., corrective blend shapes) may lose some nuance.
  • Fine‑grained detail – While geometric error is low, subtle high‑frequency details (e.g., garment wrinkles) are not explicitly preserved across models.
  • Future directions – The authors plan to extend SOMA to handle non‑rigid accessories, integrate neural implicit representations for finer detail, and open‑source a conversion toolkit to accelerate community adoption.

Authors

  • Jun Saito
  • Jiefeng Li
  • Michael de Ruyter
  • Miguel Guerrero
  • Edy Lim
  • Ehsan Hassani
  • Roger Blanco Ribera
  • Hyejin Moon
  • Magdalena Dadela
  • Marco Di Lucca
  • Qiao Wang
  • Xueting Li
  • Jan Kautz
  • Simon Yuen
  • Umar Iqbal

Paper Information

  • arXiv ID: 2603.16858v1
  • Categories: cs.CV, cs.AI
  • Published: March 17, 2026
  • PDF: Download PDF