entity · generic · ai-video-generation

StableAnimator

One-line summary: CVPR 2025 pose-driven animation model, claimed by its authors to be the first end-to-end ID-preserving video diffusion framework.

What it is

A model that turns a reference image plus a pose sequence into video. Its central focus is identity consistency without a post-processing face-swap step. arXiv:2411.17697.

Why it matters to ai-video-generation

Identity drift across long generations is the recurring hard problem in this whole space. StableAnimator is the canonical paper attacking it end-to-end inside the diffusion pipeline. Later models, including wan-animate and persona-avatar-3d, frequently use it as a comparison baseline. Source: 2026-05-07-ai-avatar-motion-mimicking-models-survey.
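
To make "identity drift" concrete, here is a minimal sketch of one common way to quantify it: compare a face embedding from each generated frame against the reference face embedding and track one minus the cosine similarity. This is an illustrative assumption, not a metric taken from the StableAnimator paper or the survey. The function name `identity_drift` and the two-dimensional toy embeddings are hypothetical; in practice the vectors would come from a face-recognition model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identity_drift(reference, frames):
    """Per-frame drift: 1 - cosine similarity to the reference embedding.

    0.0 means the frame's face embedding points the same way as the
    reference; larger values mean the identity has drifted further.
    """
    return [1.0 - cosine(reference, frame) for frame in frames]

# Hypothetical toy embeddings: the face slowly rotates away from the reference.
reference = [1.0, 0.0]
frames = [[1.0, 0.0], [1.0, 0.5], [1.0, 1.0]]
drift = identity_drift(reference, frames)
```

In this toy run the drift starts at zero and grows frame by frame, which is the pattern a long generation without identity control tends to show.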

Key facts

  • Venue: CVPR 2025.
  • Inputs: reference image + sequence of poses.
  • Available on GitHub (Francis-Rings/StableAnimator) and HuggingFace.
  • First arXiv release: November 2024.

Technical contributions

  • Global content-aware Face Encoder: face embeddings are refined by interaction with image embeddings, rather than extracted independently.
  • Distribution-aware ID-Adapter: alignment that prevents temporal-layer interference from corrupting face identity, "enabling seamless face embedding integration without video fidelity loss."
  • HJB-equation inference-time face optimization: a Hamilton-Jacobi-Bellman optimization runs in parallel with diffusion denoising during inference to enhance face quality, eliminating the need for third-party face-swap tools.
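
The first and third contributions above can be sketched in toy form. This is a hedged illustration under stated assumptions, not the authors' implementation. `refine_face_embedding` assumes the "interaction with image embeddings" is a single cross-attention step with a residual connection. `denoise_with_face_guidance` swaps the paper's HJB solution for plain gradient ascent on cosine similarity, interleaved with a stand-in denoising update (a simple shrink), and treats the latent itself as the face embedding. All function names, parameters, and vectors here are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def refine_face_embedding(face, image_tokens):
    """Assumed mechanism: the face vector is the query, image tokens are
    keys and values; the attended context is added back as a residual."""
    scale = math.sqrt(len(face))
    scores = [dot(face, tok) / scale for tok in image_tokens]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    context = [
        sum(w * tok[i] for w, tok in zip(weights, image_tokens))
        for i in range(len(face))
    ]
    return [f + c for f, c in zip(face, context)]

def cosine_grad(x, ref):
    """Analytic gradient of cosine(x, ref) with respect to x."""
    norm_x = math.sqrt(dot(x, x))
    norm_r = math.sqrt(dot(ref, ref))
    c = dot(x, ref) / (norm_x * norm_r)
    return [r / (norm_x * norm_r) - c * xi / (norm_x ** 2)
            for xi, r in zip(x, ref)]

def denoise_with_face_guidance(x, ref, steps=10, lr=0.1, shrink=0.95):
    """Toy loop: each iteration applies a stand-in denoising update, then
    one gradient-ascent step that pulls x toward the reference identity.
    NOT the paper's HJB-based update; a generic guidance illustration."""
    for _ in range(steps):
        x = [shrink * v for v in x]        # stand-in for one denoising update
        grad = cosine_grad(x, ref)         # direction that raises similarity
        x = [v + lr * g for v, g in zip(x, grad)]
    return x

# Hypothetical usage with two-dimensional toy vectors.
refined = refine_face_embedding([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
start, ref = [1.0, 0.0], [1.0, 1.0]
guided = denoise_with_face_guidance(start, ref)
```

The point of the second function is only the control flow: the identity objective is optimized inside the denoising loop rather than applied afterwards by a separate face-swap tool. With `lr=0.0` the loop reduces to the stand-in denoiser alone and the similarity to `ref` does not change.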

Strengths

  • End-to-end ID preservation — no post-hoc face swap needed.
  • Handles long sequences and multi-person animation.
  • Open source.

Weaknesses

  • The abstract does not say which benchmark datasets the evaluation used (per survey).
  • Comparisons with newer DiT-based models (e.g. wan-animate) are unfavorable to StableAnimator on author-reported numbers.

Open questions

Sources

Related

Referenced by