OmniHuman 1.5

One-line summary: ByteDance's closed, API-only, audio-driven full-body avatar model (v1.5, August 2025) that bridges a multimodal LLM with a Diffusion Transformer in a "dual-system" cognitive design.

What it is

A model that turns a single image plus a voice track into video, with optional text prompts for refinement. Notably, it does not accept a motion video as a driving signal: it is image+audio only, deriving expression and gesture from the audio. Source: 2026-05-07-ai-avatar-motion-mimicking-models-survey.
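The input contract above (one image, one voice track, optional text, no motion video) can be sketched as a small validation layer. This is a minimal illustration only: `AvatarRequest`, `build_request`, and every field name are hypothetical and do not correspond to any BytePlus or partner-platform API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AvatarRequest:
    """Hypothetical request shape; field names are illustrative, not a vendor API."""
    image_path: str                    # single reference image (required)
    audio_path: str                    # voice track (required)
    text_prompt: Optional[str] = None  # optional refinement prompt


def build_request(image_path, audio_path, text_prompt=None, motion_video_path=None):
    """Validate inputs against the image+audio-only contract described in this note."""
    if motion_video_path is not None:
        # The note states that a motion video is not accepted as a driving signal.
        raise ValueError("motion video is not a supported driving signal")
    if not image_path or not audio_path:
        raise ValueError("both a reference image and a voice track are required")
    return AvatarRequest(image_path, audio_path, text_prompt)
```

Rejecting `motion_video_path` up front makes the "no puppeting" limitation explicit at the call site rather than leaving it to be discovered from a failed generation.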

Why it matters to ai-video-generation

OmniHuman is one of the few closed-source models that produce expressive full-body (not just talking-head) avatar video from audio alone. Its "dual-system" architecture (slow planning via an MLLM, fast reaction via diffusion) is one of the more distinctive designs in the audio-driven branch.

Key facts

  • Vendor: ByteDance.
  • Status: closed; available via BytePlus API and partner platforms (fal.ai, eachlabs, MindStudio).
  • Released: August 2025 (v1.5; succeeds v1).
  • Output: 1024×1024 at 30 fps, up to ~30 seconds; the project page also claims "videos over one minute" with dynamic motion, continuous camera movement, and multi-character interactions.
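To put the output figures in perspective, the arithmetic below works out the frame budget implied by 1024×1024 at 30 fps for ~30 seconds. It is a back-of-the-envelope sketch: the function names are hypothetical, the 3-bytes-per-pixel figure assumes uncompressed 8-bit RGB, and `clips_needed` assumes a caller who stays within the ~30 second figure by splitting the audio, which is not something the source describes.

```python
import math

WIDTH, HEIGHT = 1024, 1024  # output resolution listed in this note
FPS = 30                    # output frame rate listed in this note
MAX_CLIP_SECONDS = 30       # approximate; the note says "up to ~30 seconds"


def frame_count(seconds, fps=FPS):
    """Number of frames in a clip of the given duration."""
    return round(seconds * fps)


def raw_rgb_bytes(seconds):
    """Uncompressed size, assuming 3 bytes per pixel (8-bit RGB)."""
    return frame_count(seconds) * WIDTH * HEIGHT * 3


def clips_needed(audio_seconds, max_clip_seconds=MAX_CLIP_SECONDS):
    """How many clips a track would need if each clip is capped at max_clip_seconds."""
    return math.ceil(audio_seconds / max_clip_seconds)


print(frame_count(30))    # 900 frames in a full-length clip
print(raw_rgb_bytes(30))  # 2831155200 bytes, roughly 2.6 GiB uncompressed
print(clips_needed(75))   # 3 clips for a 75-second track under the assumed cap
```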

Technical contributions

  • Dual-system architecture: bridges a Multimodal Large Language Model (slow, deliberate planning) with a Diffusion Transformer (fast, intuitive reaction). The project frames this as cognitive simulation.
  • Handles multi-character interactions and non-photographic inputs such as anime, stylized art, and anthropomorphic figures; it is not limited to realistic photographs.
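The slow-planner / fast-generator split in the dual-system bullet can be pictured with a toy control-flow sketch. Everything here is an assumption made for illustration: the model is closed, so `slow_planner`, `fast_generator`, and the `Plan` type are hypothetical stand-ins and say nothing about how OmniHuman 1.5 is actually implemented. The only point shown is that a deliberate pass runs once over the whole utterance, and a cheap per-frame step is conditioned on its result.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Plan:
    """Coarse, utterance-level guidance from the slow system (toy stand-in)."""
    gesture_hint: str


def slow_planner(transcript: str) -> Plan:
    """Stand-in for the MLLM: one deliberate pass over the whole utterance."""
    hint = "emphatic" if transcript.strip().endswith("!") else "neutral"
    return Plan(gesture_hint=hint)


def fast_generator(plan: Plan, audio_energy: List[float]) -> List[str]:
    """Stand-in for the Diffusion Transformer: reacts per audio frame, conditioned on the plan."""
    return [
        f"{plan.gesture_hint}:{'open' if energy > 0.5 else 'closed'}"
        for energy in audio_energy
    ]


def generate(transcript: str, audio_energy: List[float]) -> List[str]:
    plan = slow_planner(transcript)            # slow path: runs once per utterance
    return fast_generator(plan, audio_energy)  # fast path: runs for every frame


print(generate("Hello!", [0.2, 0.9]))  # ['emphatic:closed', 'emphatic:open']
```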

Capabilities

  • Lip-sync coherent with the rhythm, prosody, and semantic content of the audio.
  • Musical performances with rich expression beyond mere lip-sync.
  • Context-aware gesturing tied to speech content.

Strengths

  • Full-body, audio-driven generation in a single closed model, a rare combination.
  • Stylistic range (real, anime, anthropomorphic).

Weaknesses

  • Closed source and API-gated.
  • No motion-video driving signal, so it can't be used for "puppeting" workflows.
  • Benchmarks are vendor-reported only; no independent comparison has surfaced.
