PERSONA (3D Avatar Framework)
One-line summary: 2025 hybrid 3D-Gaussian-Splatting + diffusion framework that creates a personalized whole-body 3D avatar from a single image with pose-driven deformations.
What it is
A two-step pipeline:
- Use a diffusion model (MimicMotion) to synthesize pose-diverse training videos of the target subject from a single input image.
- Optimize a 3D Gaussian Splatting representation, driven by SMPL-X parameters and MLP-predicted pose-driven offsets, on those synthetic videos so that the pose variety they contain is baked into one 3D avatar.
arXiv:2508.09973.
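The data flow of the two steps above can be sketched as follows. This is only an illustration of the pipeline shape, not the paper's code: `stub_video_generator`, `stub_avatar_fit`, `TrainingFrame`, `Avatar`, and the pose labels are all hypothetical stand-ins, and the real steps are a diffusion model and a Gaussian Splatting optimization.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class TrainingFrame:
    pose: str        # label of the body pose shown in the frame
    synthetic: bool  # True if the frame came from the diffusion model


@dataclass
class Avatar:
    identity: str          # which subject the avatar represents
    poses_seen: List[str]  # pose variety available during optimization


def stub_video_generator(image: str, poses: Sequence[str]) -> List[TrainingFrame]:
    """Hypothetical stand-in for the pose-guided diffusion model (step 1)."""
    return [TrainingFrame(pose=p, synthetic=True) for p in poses]


def stub_avatar_fit(image: str, frames: Sequence[TrainingFrame]) -> Avatar:
    """Hypothetical stand-in for the 3D Gaussian Splatting optimization (step 2)."""
    return Avatar(identity=image, poses_seen=[f.pose for f in frames])


def build_avatar(
    image: str,
    driving_poses: Sequence[str],
    generate: Callable[[str, Sequence[str]], List[TrainingFrame]] = stub_video_generator,
    fit: Callable[[str, Sequence[TrainingFrame]], Avatar] = stub_avatar_fit,
) -> Avatar:
    # Step 1: synthesize pose-diverse frames of the subject from one image.
    synthetic_frames = generate(image, driving_poses)
    # The real input image stays in the training set next to the synthetic frames.
    frames = [TrainingFrame(pose="input", synthetic=False)] + synthetic_frames
    # Step 2: optimize a single avatar on all frames (run once, then reuse).
    return fit(image, frames)


avatar = build_avatar("subject.png", ["walk", "sit", "wave"])
print(avatar.poses_seen)  # ['input', 'walk', 'sit', 'wave']
```

The point of the structure is that the expensive generator runs once per subject; later animation only needs the fitted avatar.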
Why it matters to ai-video-generation
PERSONA stands out among 2025 papers for bridging the diffusion-based and 3D-parametric branches of avatar work: it uses a diffusion model upstream as a synthetic-data generator and ends in a riggable 3D avatar. That makes it relevant to real-time-avatar-puppeting use cases where per-frame diffusion is too expensive. Source: 2026-05-07-ai-avatar-motion-mimicking-models-survey.
Key facts
- Inputs: a single image.
- Output: a riggable 3D Gaussian-Splatting avatar with SMPL-X pose control.
- Quantitative baselines: AniGS, LHM, ExAvatar, MimicMotion, Champ, and StableAnimator, compared on the NeuMan and X-Humans datasets using PSNR, SSIM, and LPIPS.
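Of the three metrics named above, PSNR is simple enough to compute by hand. The sketch below uses the standard definition, 10 * log10(MAX^2 / MSE), on flat lists of pixel intensities in [0, 1]; the function name `psnr` and the sample values are illustrative and not taken from the paper. Higher PSNR and SSIM are better, while lower LPIPS is better. SSIM and LPIPS need windowed statistics and a learned network respectively, so they are not shown.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], rendered: Sequence[float],
         max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("images must be non-empty and have the same size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 4-pixel "images": two pixels are off by 0.1, so MSE is about 0.005.
ground_truth = [0.0, 0.5, 1.0, 0.25]
render = [0.1, 0.5, 0.9, 0.25]
print(round(psnr(ground_truth, render), 2))  # 23.01
```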
Technical contributions
- Diffusion-bootstrapped training data: MimicMotion generates pose-diverse synthetic videos of the target so the 3D optimization sees enough pose variety.
- Identity-anchored shape: SMPL-X shape parameters are extracted from the input image and used to guide synthetic-video generation, preventing identity drift.
- Pose-driven Gaussian offsets: per-Gaussian mean offsets are predicted by MLPs that take triplane features and 3D poses as input, modeling pose-dependent cloth and body deformation.
- Balanced sampling: oversamples the input image during training to prevent diffusion-induced identity drift.
- Albedo + seam-boundary detection: avoids embedding shadows and stitching artifacts into the canonical 3D representation.
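Two of the ideas above, pose-driven Gaussian offsets and balanced sampling, can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's implementation: the per-Gaussian feature vectors stand in for sampled triplane features, the pose is a flat vector, the network size and random weights are arbitrary, and `OffsetMLP`, `deform_means`, `pick_training_frame`, and `input_prob` are hypothetical names. The source does not give the actual oversampling ratio.

```python
import random
from typing import List, Sequence

INPUT_IMAGE = -1  # sentinel index for the real input image


def linear(x: Sequence[float], weights: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


class OffsetMLP:
    """Tiny two-layer MLP mapping (feature, pose) to a 3D mean offset."""

    def __init__(self, in_dim: int, hidden_dim: int, rng: random.Random) -> None:
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                   for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
                   for _ in range(3)]
        self.b2 = [0.0] * 3

    def __call__(self, feature: Sequence[float], pose: Sequence[float]) -> List[float]:
        # Concatenate the per-Gaussian feature with the shared pose vector.
        hidden = [max(0.0, h) for h in
                  linear(list(feature) + list(pose), self.w1, self.b1)]  # ReLU
        return linear(hidden, self.w2, self.b2)


def deform_means(means: Sequence[Sequence[float]],
                 features: Sequence[Sequence[float]],
                 pose: Sequence[float], mlp: OffsetMLP) -> List[List[float]]:
    """Add a pose-dependent offset to every Gaussian mean."""
    return [[m + d for m, d in zip(mean, mlp(feat, pose))]
            for mean, feat in zip(means, features)]


def pick_training_frame(num_synthetic: int, input_prob: float,
                        rng: random.Random) -> int:
    """Balanced sampling: draw the real input image with probability
    `input_prob` (an assumed knob), else a uniformly chosen synthetic frame."""
    if rng.random() < input_prob:
        return INPUT_IMAGE
    return rng.randrange(num_synthetic)


rng = random.Random(0)
mlp = OffsetMLP(in_dim=5, hidden_dim=8, rng=rng)  # 2 feature dims + 3 pose dims
means = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]]
features = [[0.1, 0.2], [0.3, 0.4]]
deformed = deform_means(means, features, pose=[0.0, 0.5, 1.0], mlp=mlp)
draws = [pick_training_frame(100, 0.5, rng) for _ in range(1000)]
```

With uniform sampling over 100 synthetic frames plus the input image, the input would appear in roughly 1% of draws; with `input_prob=0.5` it appears in about half of them, which is the oversampling effect the list item describes. Note also that the offset depends only on the current pose, so the same pose always yields the same deformation.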
Strengths
- Combines identity-preservation strength of 3D parametric methods with diffusion's ability to capture pose-dependent cloth dynamics.
- Bake-once, animate-cheaply — useful when per-frame diffusion is too expensive.
Weaknesses (per paper)
- ~1 hour to generate the synthetic training videos.
- Cannot model motion-dependent dynamics: deformation is driven by pose alone, with no velocity or acceleration input.
- Struggles with fine pose-dependent cloth wrinkles.
- Blurry rendering for complex patterns in occluded regions (a downstream effect of diffusion-video inconsistency).
- No relighting capability.
Sources
Related
- mimicmotion — used as the synthetic-video generator inside this pipeline.
- real-time-avatar-puppeting
- identity-preservation-video-diffusion