UniAnimate

One-line summary: Unified video diffusion model from Alibaba's ali-vilab for consistent human image animation, with a 2025 DiT-based successor (UniAnimate-DiT) built on the Wan 2.1 backbone.

What it is

Two-generation lineage:

UniAnimate (SCIS 2025): a video diffusion architecture with a unified noise input for pose-driven human animation. It replaces the compute-heavy temporal Transformer with a state-space-model (SSM) temporal module.
UniAnimate-DiT (April 2025): a successor built on the Wan 2.1-14B I2V Diffusion Transformer and fine-tuned with LoRA.
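The cost argument for swapping temporal attention for an SSM can be illustrated with a toy scalar recurrence. This is a minimal sketch, not UniAnimate's temporal module: the real block operates on learned, multi-channel features, and the function names and fixed coefficients `a`, `b`, `c` here are hypothetical.

```python
from typing import List


def ssm_temporal_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Toy scalar linear state-space recurrence over T frames.

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t

    Each step reads only the previous hidden state, so cost is O(T).
    """
    h = 0.0
    ys: List[float] = []
    for x_t in x:
        h = a * h + b * x_t  # carry state forward and mix in the current frame
        ys.append(c * h)     # read out this frame's output
    return ys


def attention_score_count(num_frames: int) -> int:
    """Frame-to-frame scores computed by full temporal self-attention: O(T^2)."""
    return num_frames * num_frames


def ssm_step_count(num_frames: int) -> int:
    """Recurrence steps performed by the scan above: O(T)."""
    return num_frames


if __name__ == "__main__":
    # One feature value at one spatial location, tracked over 4 frames.
    print(ssm_temporal_scan([1.0, 0.0, 0.0, 0.0], a=0.5, b=1.0, c=2.0))
    # -> [2.0, 1.0, 0.5, 0.25]
    for t in (16, 64, 256):
        print(t, attention_score_count(t), ssm_step_count(t))
```

Quadrupling the frame count multiplies the attention score count by 16 but the scan length only by 4, which is the gap that matters for long clips.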

Why it matters to ai-video-generation

UniAnimate's iterative first-frame-conditioned strategy, which conditions each new segment on the last frame of the previous one, lets it generate consistent one-minute videos; long generation has historically been a weak spot for this class of model. UniAnimate-DiT is one of the cleanest examples of the broader UNet→DiT migration in this space (see video-diffusion-unet-to-dit). Source: 2026-05-07-ai-avatar-motion-mimicking-models-survey.
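The iterative strategy can be sketched as a loop over pose segments. This is a minimal sketch under stated assumptions: `generate_segment` is a hypothetical stand-in for one sampling run of the diffusion model, and the one-pose overlap between segments (with the duplicated boundary frame dropped) is an illustrative choice, not a detail confirmed by the source.

```python
from typing import Callable, List, Optional

Frame = str  # hypothetical placeholder for a decoded frame or latent

# Hypothetical signature: (pose chunk, optional conditioning frame) -> frames for that chunk.
SegmentGenerator = Callable[[List[str], Optional[Frame]], List[Frame]]


def generate_long_video(
    poses: List[str],
    segment_len: int,
    generate_segment: SegmentGenerator,
) -> List[Frame]:
    """Generate a long video segment by segment.

    The first segment starts from random noise (first_frame=None); every later
    segment is conditioned on the last frame already generated.
    """
    if segment_len < 2:
        raise ValueError("segment_len must be at least 2 for the one-pose overlap")
    video: List[Frame] = []
    first_frame: Optional[Frame] = None
    start = 0
    while True:
        chunk = poses[start:start + segment_len]
        frames = generate_segment(chunk, first_frame)
        if first_frame is not None:
            frames = frames[1:]  # frame 0 re-creates the conditioning frame; drop the duplicate
        video.extend(frames)
        if start + segment_len >= len(poses):
            break
        first_frame = video[-1]
        # Overlap by one pose so the next segment's frame 0 matches its conditioning frame.
        start += segment_len - 1
    return video


def fake_generator(chunk: List[str], first_frame: Optional[Frame]) -> List[Frame]:
    """Toy stand-in that tags each pose with the input mode it was generated in."""
    tag = "cond" if first_frame is not None else "noise"
    return [f"{tag}:{p}" for p in chunk]


if __name__ == "__main__":
    poses = [f"p{i}" for i in range(10)]
    print(generate_long_video(poses, segment_len=4, generate_segment=fake_generator))
```

With 10 poses and 4-frame segments, the first segment covers p0 to p3 from noise, and the next two each start from the previous segment's last frame, giving exactly one output frame per pose.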

Key facts

Authors: Alibaba ali-vilab.
Repos: github.com/ali-vilab/UniAnimate and github.com/ali-vilab/UniAnimate-DiT.
UniAnimate-DiT uses LoRA for parameter-efficient fine-tuning and a lightweight 3D-conv pose encoder.
Venue (V1): SCIS 2025.
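LoRA-style fine-tuning keeps the pretrained weight frozen and learns only a low-rank update. The plain-Python sketch below shows the generic LoRA forward pass, y = W x + (alpha / r) B A x; it is not UniAnimate-DiT's training code, and the helper names and the 4096/64 sizes in the usage note are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Dense matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: List[float],
                 alpha: float, rank: int) -> List[float]:
    """y = W x + (alpha / rank) * B (A x).

    W (d_out x d_in) is frozen; only A (rank x d_in) and B (d_out x rank)
    would be trained. B is usually initialized to zero so training starts
    from the pretrained model's behavior.
    """
    base = matvec(W, x)                 # frozen pretrained path
    delta = matvec(B, matvec(A, x))     # low-rank trainable path
    scale = alpha / rank
    return [b + scale * d for b, d in zip(base, delta)]


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters in A and B for one adapted layer."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    A = [[1.0, 1.0]]
    print(lora_forward(W, A, [[0.0], [0.0]], [1.0, 2.0], alpha=1.0, rank=1))  # -> [1.0, 2.0]
    print(lora_forward(W, A, [[1.0], [0.0]], [1.0, 2.0], alpha=1.0, rank=1))  # -> [4.0, 2.0]
    print(4096 * 4096, lora_param_count(4096, 4096, 64))  # full layer vs. rank-64 adapter
```

For an illustrative 4096 x 4096 layer, a rank-64 adapter trains 524,288 parameters instead of 16,777,216 (about 3%), which is why the frozen base model still dominates output quality.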

Technical contributions

Unified noise input that accepts either random-noise input or first-frame-conditioned input. This enables long-video generation: each new chunk is conditioned on the last frame of the previous chunk.
State-space-model temporal module in place of a temporal Transformer. Self-attention over T frames scales as O(T²), while an SSM scan is linear in T, which is where the compute savings come from.
Lightweight 3D-conv pose encoder (DiT variant) that encodes motion information cheaply.
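One way a single network can serve both input modes is sketched below. This is a minimal sketch under a stated assumption: conditioning is modeled as overwriting frame 0's noise with the clean first-frame latent and flagging it in a per-frame mask. The function name, the mask, and the flat-list latents are hypothetical, and UniAnimate's exact mechanism may differ.

```python
import random
from typing import List, Optional, Tuple

Latent = List[float]  # one frame's latent, flattened; toy stand-in


def build_unified_input(
    num_frames: int,
    latent_dim: int,
    first_frame_latent: Optional[Latent] = None,
    seed: int = 0,
) -> Tuple[List[Latent], List[int]]:
    """Return (per-frame model input, per-frame condition mask).

    Random-noise mode: every frame is Gaussian noise, mask is all 0.
    First-frame mode: frame 0 is the clean first-frame latent (mask 1);
    the remaining frames stay noise.
    """
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    rng = random.Random(seed)
    frames = [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)] for _ in range(num_frames)]
    mask = [0] * num_frames
    if first_frame_latent is not None:
        if len(first_frame_latent) != latent_dim:
            raise ValueError("first_frame_latent has the wrong size")
        frames[0] = list(first_frame_latent)  # hypothetical: clean latent replaces noise
        mask[0] = 1                           # tells the model frame 0 is given, not denoised
    return frames, mask


if __name__ == "__main__":
    _, m = build_unified_input(4, 3)
    print(m)  # -> [0, 0, 0, 0]
    f, m = build_unified_input(4, 3, first_frame_latent=[1.0, 2.0, 3.0])
    print(f[0], m)  # -> [1.0, 2.0, 3.0] [1, 0, 0, 0]
```

Because both modes share the same tensor layout, the long-video loop can reuse one model: the first chunk uses the noise-only mode and every later chunk uses the first-frame mode.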

Strengths

One-minute consistent generation through first-frame conditioning.
The DiT variant aligns with where the field is heading (Wan family backbone).

Weaknesses

Because only LoRA adapters (plus the pose encoder) are trained, output quality is largely bounded by the underlying Wan 2.1-14B I2V foundation.

Sources

2026-05-07-ai-avatar-motion-mimicking-models-survey

Related

wan-animate: sibling on the Wan 2.2 backbone with a dedicated avatar focus.
mimicmotion
stableanimator
video-diffusion-unet-to-dit
