PERSONA (3D Avatar Framework)

One-line summary: 2025 hybrid 3D-Gaussian-Splatting + diffusion framework that creates a personalized whole-body 3D avatar from a single image with pose-driven deformations.

What it is

A two-step pipeline:

Use a diffusion model (MimicMotion) to synthesize pose-diverse training videos of the target subject from the single input image.
Optimize a 3D Gaussian Splatting representation on those synthetic videos, using SMPL-X parameters and MLP-predicted pose-driven offsets, so that what the diffusion model generated is baked into one 3D avatar.
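
The core of the second step, adding an MLP-predicted offset to each Gaussian mean, can be sketched as follows. This is a minimal pure-Python illustration, not the paper's implementation: the function names, layer sizes, ReLU activation, zero-initialized output layer, and the flat per-Gaussian feature vectors (standing in for triplane features) are all assumptions. Rendering and the way offsets combine with SMPL-X posing are left out.

```python
import random


def init_params(in_dim, hidden_dim, rng):
    """Hypothetical one-hidden-layer MLP. The output layer starts at zero so
    the avatar initially equals its canonical shape (a choice of this sketch)."""
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(hidden_dim)]
    b1 = [0.0] * hidden_dim
    w2 = [[0.0] * hidden_dim for _ in range(3)]
    b2 = [0.0] * 3
    return w1, b1, w2, b2


def mlp_offset(features, pose, params):
    """Map (per-Gaussian features, pose vector) to a 3D mean offset."""
    w1, b1, w2, b2 = params
    x = list(features) + list(pose)  # concatenate the two inputs
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)  # ReLU
              for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w2, b2)]


def deform_means(canonical_means, per_gaussian_features, pose, params):
    """Return pose-dependent means: canonical mean + predicted offset."""
    deformed = []
    for mean, feat in zip(canonical_means, per_gaussian_features):
        offset = mlp_offset(feat, pose, params)
        deformed.append([m + o for m, o in zip(mean, offset)])
    return deformed


# Toy data: two Gaussians, 3-dim features, 2-dim pose (all values made up).
means = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]]
feats = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
pose = [0.0, 1.0]
params = init_params(in_dim=5, hidden_dim=4, rng=random.Random(0))
posed_means = deform_means(means, feats, pose, params)
```

Because the same pose vector is fed to every Gaussian while the features differ per Gaussian, the network can learn offsets that vary across the body for a given pose, which is what lets the representation capture pose-dependent cloth and body deformation.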

arXiv:2508.09973.

Why it matters to ai-video-generation

PERSONA stands out among 2025 papers for bridging the diffusion-based and 3D-parametric branches of avatar work. It uses a diffusion model upstream as a synthetic-data generator and ends in a riggable 3D avatar, which makes it interesting for real-time-avatar-puppeting use cases where per-frame diffusion is too expensive. Source: 2026-05-07-ai-avatar-motion-mimicking-models-survey.

Key facts

Inputs: a single image.
Output: a riggable 3D Gaussian-Splatting avatar with SMPL-X pose control.
Quantitative evaluation: compared against AniGS, LHM, ExAvatar, MimicMotion, Champ, and StableAnimator on the NeuMan and X-Humans datasets, using PSNR, SSIM, and LPIPS.
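
For readers unfamiliar with the metrics, PSNR is the simplest of the three and can be sketched in a few lines. This is a generic illustration of the standard formula, not the paper's evaluation code. It assumes images are flattened lists of pixel values in [0, 1], and the function names are hypothetical.

```python
import math


def mean_squared_error(rendered, reference):
    """Mean squared error between two equally sized flat pixel lists."""
    if len(rendered) != len(reference) or not rendered:
        raise ValueError("images must be non-empty and the same size")
    return sum((a - b) ** 2 for a, b in zip(rendered, reference)) / len(rendered)


def psnr(rendered, reference, max_value=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    err = mean_squared_error(rendered, reference)
    if err == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / err)
```

Higher PSNR and SSIM are better and lower LPIPS is better. SSIM compares local structure rather than raw pixel error, and LPIPS compares deep-network features, so neither reduces to a one-line formula like this.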

Technical contributions

Diffusion-bootstrapped training data: MimicMotion generates pose-diverse synthetic videos of the target so the 3D optimization sees enough pose variety.
Identity-anchored shape: SMPL-X shape parameters are extracted from the input image and used to guide synthetic-video generation, preventing identity drift.
Pose-driven Gaussian offsets: MLPs take triplane features and 3D poses as input and predict an offset for each Gaussian's mean, modeling pose-dependent cloth and body deformation.
Balanced sampling: oversamples the input image during training to prevent diffusion-induced identity drift.
Albedo + seam-boundary detection: avoids embedding shadows and stitching artifacts into the canonical 3D representation.
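
The balanced-sampling idea can be illustrated with a weighted sampler in which the single real input image receives a fixed share of the draws, however many synthetic frames exist. This is a sketch under stated assumptions: the 0.5 share, the frame names, and the `build_sampler` helper are hypothetical, and the paper's actual ratio is not given here.

```python
import random

INPUT_IMAGE = "input"  # placeholder label for the one real image


def build_sampler(synthetic_frames, input_prob, seed=0):
    """Return a function that draws training frames.

    The input image is drawn with probability `input_prob`; the remaining
    probability mass is split evenly across the synthetic frames.
    """
    if not synthetic_frames:
        raise ValueError("need at least one synthetic frame")
    if not 0.0 <= input_prob <= 1.0:
        raise ValueError("input_prob must lie in [0, 1]")
    rng = random.Random(seed)
    population = [INPUT_IMAGE] + list(synthetic_frames)
    per_frame = (1.0 - input_prob) / len(synthetic_frames)
    weights = [input_prob] + [per_frame] * len(synthetic_frames)

    def sample(k):
        return rng.choices(population, weights=weights, k=k)

    return sample


# Hypothetical setup: 100 synthetic frames, input image gets half the draws.
sample = build_sampler([f"syn_{i}" for i in range(100)], input_prob=0.5)
batch = sample(10000)
input_fraction = batch.count(INPUT_IMAGE) / len(batch)
```

With uniform sampling the input image would appear in roughly 1 of every 101 draws in this setup. Reweighting keeps the only ground-truth view of the subject prominent in the optimization, which is the stated purpose of the technique.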

Strengths

Combines the identity-preservation strength of 3D parametric methods with diffusion's ability to capture pose-dependent cloth dynamics.
Bake once, animate cheaply: the diffusion cost is paid once up front, which is useful when per-frame diffusion is too expensive.

Weaknesses (per paper)

~1 hour to generate the synthetic training videos.
Cannot model motion-dependent dynamics (no velocity/acceleration).
Struggles with fine pose-dependent cloth wrinkles.
Blurry rendering for complex patterns in occluded regions (a downstream effect of diffusion-video inconsistency).
No relighting capability.

Sources

2026-05-07-ai-avatar-motion-mimicking-models-survey

Related

mimicmotion: used as the synthetic-video generator inside this pipeline.
real-time-avatar-puppeting
identity-preservation-video-diffusion
