StableAnimator
One-line summary: CVPR 2025 pose-driven animation model claimed as the first end-to-end ID-preserving video diffusion framework.
What it is
A reference-image + pose-sequence to video model whose central focus is identity consistency without post-processing face-swap steps. arXiv:2411.17697.
Why it matters to ai-video-generation
Identity drift across long generations is the recurring hard problem in this space. StableAnimator is the canonical paper attacking it end-to-end inside the diffusion pipeline. It is frequently used as a comparison baseline by later models, including wan-animate and persona-avatar-3d. Source: 2026-05-07-ai-avatar-motion-mimicking-models-survey.
Key facts
- Venue: CVPR 2025.
- Inputs: reference image + sequence of poses.
- Code available on GitHub (Francis-Rings/StableAnimator) and Hugging Face.
- First arXiv release: November 2024.
Technical contributions
- Global content-aware Face Encoder: face embeddings are refined by interaction with image embeddings, rather than extracted independently.
- Distribution-aware ID-Adapter: alignment that prevents temporal-layer interference from corrupting face identity, "enabling seamless face embedding integration without video fidelity loss."
- HJB-equation inference-time face optimization: a Hamilton-Jacobi-Bellman optimization runs in parallel with diffusion denoising during inference to enhance face quality, eliminating the need for third-party face-swap tools.
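To make the last two bullets concrete, here is a toy sketch, not the paper's implementation. It assumes that "distribution-aware" alignment can be read as matching the mean and standard deviation of face features to those of image features, and it swaps in plain gradient ascent on cosine similarity for the HJB-based optimization, which is not reproduced here. All function names, the `content` target standing in for the denoiser's prediction, and the step sizes are hypothetical.

```python
import math


def mean_std(values):
    """Population mean and standard deviation of a flat list of floats."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)


def align_distribution(face_feats, image_feats, eps=1e-6):
    """Rescale face features so their mean/std match the image features.

    Assumption: alignment is modeled as first- and second-moment matching.
    """
    f_mean, f_std = mean_std(face_feats)
    i_mean, i_std = mean_std(image_feats)
    return [(v - f_mean) / (f_std + eps) * i_std + i_mean for v in face_feats]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_grad(a, b):
    """Gradient of cosine(a, b) with respect to a."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    c = cosine(a, b)
    return [y / (norm_a * norm_b) - c * x / (norm_a ** 2) for x, y in zip(a, b)]


def denoise_with_face_guidance(latent, content, ref_face,
                               steps=20, denoise_rate=0.1, guide_rate=0.5):
    """Interleave a stand-in denoising update with an identity-guidance step."""
    for _ in range(steps):
        # Hypothetical stand-in for the diffusion model: pull toward `content`.
        latent = [x + denoise_rate * (c - x) for x, c in zip(latent, content)]
        # Guidance: one gradient-ascent step on similarity to the reference face.
        grad = cosine_grad(latent, ref_face)
        latent = [x + guide_rate * g for x, g in zip(latent, grad)]
    return latent


# Usage: compare identity similarity with and without the guidance step.
ref_face = [0.0, 1.0]
guided = denoise_with_face_guidance([1.0, 0.0], [1.0, 0.0], ref_face)
unguided = denoise_with_face_guidance([1.0, 0.0], [1.0, 0.0], ref_face, guide_rate=0.0)
print(cosine(guided, ref_face), cosine(unguided, ref_face))
```

The point of the sketch is the control flow: the identity correction runs inside the denoising loop at every step, which is why no separate face-swap pass is needed afterwards. In the real model the similarity would presumably come from a face-recognition embedding rather than from the raw latent vector.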
Strengths
- End-to-end ID preservation — no post-hoc face swap needed.
- Handles long sequences and multi-person animation.
- Open source.
Weaknesses
- The abstract does not name the benchmark datasets used for evaluation (per the survey).
- Compares unfavorably with newer DiT-based models (e.g. wan-animate) on author-reported numbers.
Open questions
Sources
Related
- mimicmotion
- animate-anyone-2
- wan-animate
- heygen-avatar-v — closed-source counterpart with a different identity-preservation strategy (full-token reference attention rather than face-encoder + ID-adapter).
- identity-preservation-video-diffusion
Referenced by