StableAnimator

One-line summary: CVPR 2025 pose-driven animation model, claimed by its authors to be the first end-to-end ID-preserving video diffusion framework.

What it is

A model that animates a reference image according to a pose sequence to produce video. Its central focus is keeping the subject's identity consistent without any post-processing face-swap step. arXiv:2411.17697.

Why it matters to ai-video-generation

Identity drift across long generations is the recurring hard problem in this whole space, and StableAnimator is the canonical paper attacking it end-to-end inside the diffusion pipeline. Later models, including wan-animate and persona-avatar-3d, frequently use it as a comparison baseline (per 2026-05-07-ai-avatar-motion-mimicking-models-survey).

Key facts

Venue: CVPR 2025.
Inputs: reference image + sequence of poses.
Available on GitHub (Francis-Rings/StableAnimator) and Hugging Face.
First arXiv release: November 2024.

Technical contributions

Global content-aware Face Encoder: face embeddings are refined by interaction with image embeddings, rather than extracted independently.
Distribution-aware ID-Adapter: a distribution alignment step that keeps interference from the temporal layers from corrupting face identity, "enabling seamless face embedding integration without video fidelity loss."
HJB-equation inference-time face optimization: a Hamilton-Jacobi-Bellman optimization runs in parallel with diffusion denoising during inference to enhance face quality, eliminating the need for third-party face-swap tools.
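The first two contributions both act on embeddings before they reach the denoiser. A minimal sketch, assuming (1) the Face Encoder refines face tokens with plain single-head cross-attention over reference-image tokens plus a residual connection, and (2) the ID-Adapter's alignment is AdaIN-style matching of mean and standard deviation. Both are illustrative assumptions, not the paper's exact layers, which are learned and operate on much higher-dimensional features. `refine_face_tokens` and `align_distribution` are hypothetical names, and the token values are made-up toy numbers.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def refine_face_tokens(face_tokens, image_tokens):
    """Hypothetical face-encoder step: each face token attends over the global
    image tokens (single-head cross-attention, no learned projections) and the
    attended context is added back as a residual."""
    scale = 1.0 / math.sqrt(len(face_tokens[0]))
    refined = []
    for q in face_tokens:
        weights = softmax([dot(q, k) * scale for k in image_tokens])
        context = [sum(w * k[i] for w, k in zip(weights, image_tokens))
                   for i in range(len(q))]
        refined.append([qi + ci for qi, ci in zip(q, context)])
    return refined


def mean_std(tokens):
    """Mean and population standard deviation over all entries of a token list."""
    flat = [x for t in tokens for x in t]
    mu = sum(flat) / len(flat)
    var = sum((x - mu) ** 2 for x in flat) / len(flat)
    return mu, math.sqrt(var)


def align_distribution(face_feats, image_feats, eps=1e-6):
    """Hypothetical ID-Adapter alignment: shift and rescale face features so
    their mean and std match the image features (AdaIN-style)."""
    mu_f, sd_f = mean_std(face_feats)
    mu_i, sd_i = mean_std(image_feats)
    return [[(x - mu_f) / (sd_f + eps) * sd_i + mu_i for x in t]
            for t in face_feats]


# Toy 4-dimensional tokens (made-up values).
face = [[0.2, -1.0, 0.5, 0.0], [1.5, 0.3, -0.2, 0.8]]
image = [[0.1, 0.4, -0.3, 0.9], [2.0, -0.5, 0.0, 0.3], [0.0, 0.0, 1.0, -1.0]]
aligned = align_distribution(refine_face_tokens(face, image), image)
print(mean_std(aligned), mean_std(image))
```

Matching first and second moments is one simple way to keep injected face features on the same scale as the features they are mixed with; whether StableAnimator aligns exactly these statistics should be checked against the paper.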
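StableAnimator derives its inference-time face optimization from a Hamilton-Jacobi-Bellman equation. The sketch below does not solve an HJB equation and is not the authors' code; it only illustrates the general pattern of a correction interleaved with each denoising step. At every step the toy sampler predicts the clean sample, takes a few gradient steps that pull its identity embedding toward a reference embedding, and continues sampling from the corrected estimate. Everything here is a hypothetical stand-in: the linear "face encoder" `W`, the Gaussian-prior denoiser `denoise_x0`, the noise schedule, and the step sizes.

```python
import random

# Hypothetical linear "face encoder": maps a 3-D toy sample to a 2-D
# identity embedding. A real system would use a face-recognition network.
W = [[1.0, 0.0, 0.5],
     [0.0, 1.0, -0.5]]


def face_embed(x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def id_loss(x, ref):
    """Squared distance between the sample's identity embedding and the reference."""
    return sum((e - r) ** 2 for e, r in zip(face_embed(x), ref))


def id_loss_grad(x, ref):
    """Gradient of ||W x - ref||^2 with respect to x, i.e. 2 W^T (W x - ref)."""
    resid = [e - r for e, r in zip(face_embed(x), ref)]
    return [2.0 * sum(W[j][i] * resid[j] for j in range(len(W)))
            for i in range(len(x))]


def denoise_x0(x_t, sigma):
    """Toy denoiser: E[x0 | x_t] when x0 ~ N(0, I) and x_t = x0 + sigma * noise."""
    return [xi / (1.0 + sigma ** 2) for xi in x_t]


def sample(ref, sigmas, inner_steps=5, lr=0.1, seed=0):
    """Deterministic DDIM-style sampler with an identity correction applied
    to the predicted clean sample at every step (hypothetical sketch)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, sigmas[0]) for _ in range(3)]
    for sigma, sigma_next in zip(sigmas, sigmas[1:] + [0.0]):
        x0 = denoise_x0(x, sigma)
        # Inference-time correction: a few gradient steps on the identity loss.
        for _ in range(inner_steps):
            g = id_loss_grad(x0, ref)
            x0 = [xi - lr * gi for xi, gi in zip(x0, g)]
        # Re-derive the noise estimate from the corrected x0, then step to sigma_next.
        eps = [(xi - x0i) / sigma for xi, x0i in zip(x, x0)]
        x = [x0i + sigma_next * ei for x0i, ei in zip(x0, eps)]
    return x


ref = [1.0, -1.0]
sigmas = [10.0, 5.0, 2.0, 1.0, 0.5, 0.1]
guided = sample(ref, sigmas)
plain = sample(ref, sigmas, inner_steps=0)
print(id_loss(guided, ref), id_loss(plain, ref))
```

Setting `inner_steps=0` recovers uncorrected sampling, which makes it easy to compare the identity loss with and without the correction on the same seed.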

Strengths

End-to-end ID preservation — no post-hoc face swap needed.
Handles long sequences and multi-person animation.
Open source.

Weaknesses

Per the survey, the abstract does not specify which benchmark datasets the evaluation used.
Compares unfavorably with newer DiT-based models (e.g. wan-animate) on author-reported benchmark numbers.

Open questions

See independent-avatar-benchmarks.

Sources

2026-05-07-ai-avatar-motion-mimicking-models-survey

Related

mimicmotion
animate-anyone-2
wan-animate
heygen-avatar-v — closed-source counterpart with a different identity-preservation strategy (full-token reference attention rather than face-encoder + ID-adapter).
identity-preservation-video-diffusion
