
Wan 2.2

One-line summary: Alibaba's open-source (Apache 2.0) frontier family of general-purpose video models, and the parent family under which the dedicated avatar variant wan-animate ships.

What it is

A family of open-source video generation models covering text-to-video, image-to-video, text-image-to-video, speech-to-video (with optional pose guidance), and dedicated character animation/replacement (Wan-Animate). The models are built on a Diffusion Transformer (DiT) architecture.
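To make the task coverage concrete, here is a minimal sketch of a lookup from task to the variant(s) listed under Model variants. The table, its task keys, and `variants_for` are hypothetical names for illustration only; TI2V-5B is mapped to text-to-video and image-to-video as well as the combined mode because it is described as a hybrid text+image-to-video model.

```python
from typing import Dict, List, Set

# Hypothetical task -> variant table, built from the Model variants list in this note.
WAN22_VARIANTS: Dict[str, Set[str]] = {
    "T2V-A14B": {"text-to-video"},
    "I2V-A14B": {"image-to-video"},
    # Hybrid model: conditions on text, an image, or both.
    "TI2V-5B": {"text-to-video", "image-to-video", "text-image-to-video"},
    "S2V-14B": {"speech-to-video"},  # optional pose guidance via --pose_video
    "Animate-14B": {"character-animation", "character-replacement"},
}


def variants_for(task: str) -> List[str]:
    """Return the names of variants that list `task`, sorted alphabetically."""
    return sorted(name for name, tasks in WAN22_VARIANTS.items() if task in tasks)


if __name__ == "__main__":
    print(variants_for("text-to-video"))    # ['T2V-A14B', 'TI2V-5B']
    print(variants_for("speech-to-video"))  # ['S2V-14B']
```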

Why it matters to ai-video-generation

Even the non-avatar parts of Wan 2.2 matter for this thread, for two reasons: (a) Wan 2.2 sets the architectural standard for DiT-based open video generation, the architecture avatar work is moving onto; and (b) the S2V variant supports pose-driven generation synchronized with audio via a --pose_video parameter, which is an avatar-relevant primitive even outside Wan-Animate. Source: 2026-05-07-ai-avatar-motion-mimicking-models-survey.
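A minimal sketch of how a pose-guided S2V call might be assembled, assuming the repository's `generate.py` entry point. Only `--pose_video` comes from this note; the other flag names (`--task s2v-14B`, `--ckpt_dir`, `--prompt`, `--image`, `--audio`) are recalled from the repo README and should be checked against `python generate.py --help` before use. The helper `build_s2v_command` and all file paths are hypothetical, and the function only builds the argument list; it does not launch generation.

```python
import shlex
from typing import List, Optional


def build_s2v_command(
    ckpt_dir: str,
    prompt: str,
    image: str,
    audio: str,
    pose_video: Optional[str] = None,
) -> List[str]:
    """Assemble (but do not run) a hypothetical Wan 2.2 S2V invocation.

    Flag names other than --pose_video are assumptions to verify against the repo.
    """
    cmd = [
        "python", "generate.py",
        "--task", "s2v-14B",
        "--ckpt_dir", ckpt_dir,
        "--prompt", prompt,
        "--image", image,  # reference image of the subject
        "--audio", audio,  # driving speech track
    ]
    if pose_video is not None:
        # Optional pose sequence for the motion to follow, synchronized with the audio.
        cmd += ["--pose_video", pose_video]
    return cmd


if __name__ == "__main__":
    argv = build_s2v_command(
        "./Wan2.2-S2V-14B/", "A person speaking to camera",
        "ref.png", "speech.wav", pose_video="pose.mp4",
    )
    print(shlex.join(argv))  # paste into a shell, or pass argv to subprocess.run
```

Keeping the pose video optional mirrors the note's description of pose guidance as an add-on to audio-driven generation rather than a required input.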

Key facts

  • Vendor: Alibaba (Wan-AI / Tongyi Lab).
  • License: Apache 2.0.
  • Repo: github.com/Wan-Video/Wan2.2.

Model variants

  • T2V-A14B / I2V-A14B: Mixture-of-Experts text-to-video and image-to-video, 27B total parameters / 14B active per denoising step. Supports 480P and 720P.
  • TI2V-5B: hybrid text+image-to-video, 5B params with high-compression VAE, 720P.
  • S2V-14B: speech-to-video. Its --pose_video parameter enables pose-driven generation synchronized with the audio input.
  • Animate-14B: character animation/replacement specialist — see wan-animate.
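The 27B-total / 14B-active split of the A14B models follows from their two-expert design: per the Wan 2.2 README, a high-noise expert handles the early, noisy denoising steps (overall layout) and a low-noise expert handles the later steps (detail), with only one expert running at any step. The sketch below illustrates that routing with toy stand-in experts. The README describes the switch point as SNR-based; modelling it as a fixed timestep `boundary` is a simplifying assumption, and all names here are hypothetical.

```python
from typing import Callable, List

# A denoiser maps (latent, timestep) -> updated latent. Latents are plain floats
# here so that only the routing logic is visible.
Denoiser = Callable[[float, float], float]


def two_expert_step(high_noise: Denoiser, low_noise: Denoiser, boundary: float) -> Denoiser:
    """Wrap two experts so that exactly one runs per denoising step.

    Timesteps go from 1.0 (pure noise) toward 0.0 (clean). Steps with
    t >= boundary use the high-noise expert; later steps use the low-noise one.
    """
    def step(x: float, t: float) -> float:
        expert = high_noise if t >= boundary else low_noise
        return expert(x, t)
    return step


def run(schedule: List[float], boundary: float) -> List[str]:
    """Run a toy sampling loop and record which expert handled each step."""
    trace: List[str] = []

    def high(x: float, t: float) -> float:
        trace.append("high")
        return x * 0.5  # placeholder for a real model call

    def low(x: float, t: float) -> float:
        trace.append("low")
        return x * 0.9  # placeholder for a real model call

    step = two_expert_step(high, low, boundary)
    x = 1.0
    for t in schedule:
        x = step(x, t)
    return trace


if __name__ == "__main__":
    schedule = [1.0 - i / 10 for i in range(10)]  # roughly 1.0, 0.9, ..., 0.1
    print(run(schedule, boundary=0.75))  # 3 x 'high', then 7 x 'low'
```

Because the experts are selected rather than blended, per-step compute matches a single 14B model while storage still covers both experts.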
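For TI2V-5B, the high-compression VAE is what keeps 720P generation tractable. The Wan 2.2 README reports a 4×16×16 (time × height × width) VAE compression, raised to 4×32×32 overall by an extra 2×2 patchification step. The arithmetic below is a sketch under two assumptions: causal temporal encoding in which the first frame is encoded on its own (so frame counts of the form 4k+1), and illustrative clip dimensions of 1280×704 at 121 frames. The function names are hypothetical.

```python
from typing import Tuple


def latent_shape(frames: int, height: int, width: int,
                 t_stride: int = 4, s_stride: int = 16) -> Tuple[int, int, int]:
    """(latent_frames, latent_h, latent_w) for an assumed causal video VAE.

    The first frame is encoded alone; every later group of `t_stride`
    frames becomes one latent frame.
    """
    if frames < 1 or (frames - 1) % t_stride:
        raise ValueError(f"frames must be 1 + {t_stride}*k, got {frames}")
    if height % s_stride or width % s_stride:
        raise ValueError(f"height and width must be multiples of {s_stride}")
    return 1 + (frames - 1) // t_stride, height // s_stride, width // s_stride


def dit_tokens(latent: Tuple[int, int, int], patch: int = 2) -> int:
    """Token count after an assumed patch x patch spatial patchification."""
    t, h, w = latent
    if h % patch or w % patch:
        raise ValueError("latent height/width must be divisible by the patch size")
    return t * (h // patch) * (w // patch)


if __name__ == "__main__":
    shape = latent_shape(121, 704, 1280)  # illustrative 720P clip
    print(shape, dit_tokens(shape))       # (31, 44, 80) 27280
```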

Strengths

  • Apache 2.0 license with publicly released weights: fully open and commercially usable.
  • Cinematic-level aesthetic control: lighting, color, composition.
  • Pose-driven + audio-driven generation in the same family.
  • DiT-based, in line with where the field is heading.

Weaknesses

  • Compute-heavy (MoE: 27B total / 14B active parameters).
  • Closed competitors (Kling 3.0) reportedly score higher on raw visual fidelity benchmarks (vendor claims; not independently verified).
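Why the MoE models stay heavy even though only 14B parameters are active: routing cuts per-step compute, but both experts still have to be stored, whether in GPU memory or offloaded. A back-of-the-envelope sketch, assuming 2 bytes per parameter (bf16/fp16) and counting weights only, not activations, the text encoder, or the VAE. `Footprint` is a hypothetical helper; the parameter counts come from Model variants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    total_params_b: float   # billions of parameters that must be stored
    active_params_b: float  # billions of parameters used in one denoising step

    def stored_gb(self, bytes_per_param: int = 2) -> float:
        # 1e9 params * bytes / 1e9 bytes-per-GB cancels to a simple product.
        return self.total_params_b * bytes_per_param

    def active_gb(self, bytes_per_param: int = 2) -> float:
        return self.active_params_b * bytes_per_param


A14B = Footprint(total_params_b=27, active_params_b=14)  # MoE
TI2V_5B = Footprint(total_params_b=5, active_params_b=5)  # dense

if __name__ == "__main__":
    for name, fp in [("A14B", A14B), ("TI2V-5B", TI2V_5B)]:
        print(f"{name}: ~{fp.stored_gb():.0f} GB stored, ~{fp.active_gb():.0f} GB active per step")
```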

Sources

  • 2026-05-07-ai-avatar-motion-mimicking-models-survey