Wan 2.2
One-line summary: Alibaba's Apache-2.0 open-source frontier general-purpose video model family, and the parent foundation under which the dedicated avatar variant wan-animate ships.
What it is
A family of open-source, DiT-based video generation models supporting text-to-video, image-to-video, text-image-to-video, speech-to-video (with optional pose guidance), and dedicated character animation/replacement (Wan-Animate).
Why it matters to ai-video-generation
The non-avatar parts of Wan 2.2 still matter for this thread because (a) it sets the architectural standard for DiT-based open video generation that avatar work is moving onto, and (b) the S2V variant supports pose-driven generation synchronized with audio via a --pose_video parameter, which is itself an avatar-relevant primitive even outside Wan-Animate. Source: 2026-05-07-ai-avatar-motion-mimicking-models-survey.
Key facts
- Vendor: Alibaba (Wan-AI / Tongyi Lab).
- License: Apache 2.0.
- Repo: github.com/Wan-Video/Wan2.2.
Model variants
- T2V-A14B / I2V-A14B: Mixture-of-Experts text-to-video and image-to-video, 27B total / 14B active per step. Supports 480P and 720P.
- TI2V-5B: hybrid text+image-to-video, 5B params with high-compression VAE, 720P.
- S2V-14B: Speech-to-Video. Has a --pose_video parameter that enables pose-driven generation synchronized with audio input.
- Animate-14B: character animation/replacement specialist (see wan-animate).
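As a rough illustration of how the S2V pose option fits into an invocation, the sketch below assembles a command line in Python without running it. Only --pose_video comes from this note. The script name generate.py, the task string s2v-14B, and the --ckpt_dir, --image, --audio, and --prompt flags are assumptions recalled from the repository's README and should be checked against the current repo. All file paths are hypothetical placeholders.

```python
import shlex


def build_s2v_command(ckpt_dir, image, audio, prompt, pose_video=None):
    """Assemble (but do not run) a speech-to-video command line.

    Assumption: script name, task string, and every flag except
    --pose_video are recalled from the Wan2.2 README, not from this note.
    """
    cmd = [
        "python", "generate.py",
        "--task", "s2v-14B",       # assumed task identifier
        "--ckpt_dir", ckpt_dir,    # assumed flag: local weights directory
        "--image", image,          # assumed flag: reference image
        "--audio", audio,          # assumed flag: driving audio track
        "--prompt", prompt,        # assumed flag: text prompt
    ]
    # Pose guidance is optional: without it, generation is audio-driven only.
    if pose_video is not None:
        cmd += ["--pose_video", pose_video]
    return cmd


# Hypothetical paths, for illustration only.
cmd = build_s2v_command(
    ckpt_dir="./Wan2.2-S2V-14B",
    image="ref.png",
    audio="speech.wav",
    prompt="a person is talking",
    pose_video="pose.mp4",
)
print(shlex.join(cmd))
```

Leaving pose_video unset drops the flag entirely, which mirrors the "optional pose guidance" described above.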
Strengths
- Apache 2.0 with weights — fully open.
- Cinematic-level aesthetic control: lighting, color, composition.
- Pose-driven + audio-driven generation in the same family.
- DiT-based, in line with where the field is heading.
Weaknesses
- Compute-heavy: the A14B MoE models carry 27B total parameters, with 14B active per step.
- Closed competitors (Kling 3.0) reportedly score higher on raw visual fidelity benchmarks (vendor claims; not independently verified).
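To put the compute-heavy point in numbers, here is a back-of-the-envelope estimate of weight storage for the parameter counts listed under Model variants. It is a minimal sketch that counts weights only: activations, the text encoder, and the VAE are ignored, and the bytes-per-parameter table is a standard assumption about numeric formats rather than anything stated in this note.

```python
# Bytes needed to store one parameter in common numeric formats.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}


def weight_gb(params_billion, dtype="bf16"):
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes).

    params_billion * 1e9 params * bytes / 1e9 bytes-per-GB simplifies to
    params_billion * bytes.
    """
    return float(params_billion * BYTES_PER_PARAM[dtype])


moe_total = weight_gb(27)   # all A14B experts held in memory
moe_active = weight_gb(14)  # parameters active in a single step
ti2v = weight_gb(5)         # TI2V-5B

print(f"A14B total:  {moe_total:.0f} GB in bf16")
print(f"A14B active: {moe_active:.0f} GB in bf16")
print(f"TI2V-5B:     {ti2v:.0f} GB in bf16")
```

The gap between the total and active figures is the usual MoE trade-off: per-step compute scales with the 14B active parameters, but the full 27B still has to be stored somewhere, whether on the GPU or offloaded.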
Sources
Related
- wan-animate — the dedicated avatar variant in this family.
- unianimate — the UniAnimate-DiT successor is built on Wan 2.1 (predecessor of 2.2).
- video-diffusion-unet-to-dit