October 30, 2024 · 5 minute read
Top open-source text-to-video AI models
Yiren Lu (@YirenLu)
Solutions Engineer

Updated: 2025-02-28

Open-source text-to-video AI models are rapidly approaching the quality of leading closed-source models like Kling or OpenAI’s Sora.

Here’s a quick look at some of the top currently trending (as of this writing) open-source text-to-video models:

Model                   | Parameters | Created by | Released
------------------------|------------|------------|-------------
HunyuanVideo            | 13 billion | Tencent    | Dec 3, 2024
Mochi (deploy on Modal) | 10 billion | Genmo      | Oct 22, 2024
Wan2.1                  | 14 billion | Alibaba    | Feb 25, 2025

And here’s a comparison of their video generation quality for the sample prompt: “A white dove is flapping its wings, flying freely in the sky, in anime style.” (prompt taken from Penguin Video Benchmark)

[Video comparison: HunyuanVideo | Mochi | Wan2.1]

HunyuanVideo

  • Released: Dec 3, 2024
  • Creator: Tencent

Hunyuan (roughly pronounced “hwen-yoo-en” in English) is the leading open-source text-to-video AI model. It is consistently at or near the top of HuggingFace’s trending models and by far the most discussed model in our community Slack.

Key features:

  • Over 13 billion parameters
  • Diffusers integration (sketched below)
  • FP8 model weights to save GPU memory
  • Several popular fine-tunes, e.g. SkyReels V1, which is fine-tuned on tens of millions of human-centric film and television clips
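
To make the Diffusers integration concrete, here’s a minimal sketch, assuming a recent diffusers release with HunyuanVideo support and a sufficiently large GPU. The community checkpoint name and the small resolution/frame-count settings are illustrative choices, not official recommendations:

```python
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video

# Community mirror of the HunyuanVideo weights in diffusers format
# (assumption: check the Hugging Face Hub for the current canonical repo).
model_id = "hunyuanvideo-community/HunyuanVideo"

transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoPipeline.from_pretrained(
    model_id, transformer=transformer, torch_dtype=torch.float16
)
pipe.vae.enable_tiling()  # decode the video in tiles to lower peak VRAM
pipe.to("cuda")

frames = pipe(
    prompt="A white dove is flapping its wings, flying freely in the sky, in anime style.",
    height=320,   # small, illustrative settings; raise for higher quality
    width=512,
    num_frames=61,
    num_inference_steps=30,
).frames[0]
export_to_video(frames, "dove.mp4", fps=15)
```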

Example videos:

  • “Ultra-realistic, intricate textures. Panspermia Extraterrestrial life, Fermi paradox, swirling dust particles, Unreal engine 5 render”
  • “An astronaut flying in space by Hokusai, in the style of Ukiyochi”
  • “A few golden retrievers playing in the snow”

These videos demonstrate Hunyuan’s high quality and realistic generation capabilities, though the astronaut video does not really adhere to the style prompt.

Mochi

  • Released: Oct 22, 2024
  • Creator: Genmo

Mochi is a popular, high-quality text-to-video model that ranks similarly to Hunyuan on crowd-sourced leaderboards.

Key features:

  • 10 billion parameters
  • Apache 2.0 license
  • Diffusers integration (sketched below)
  • End-to-end deployment example on Modal
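
As a rough sketch of what the Diffusers integration looks like, assuming a diffusers release that includes MochiPipeline and the bf16 variant of the preview checkpoint:

```python
import torch
from diffusers import MochiPipeline
from diffusers.utils import export_to_video

pipe = MochiPipeline.from_pretrained(
    "genmo/mochi-1-preview", variant="bf16", torch_dtype=torch.bfloat16
)
# Both calls trade some speed for a much smaller VRAM footprint.
pipe.enable_model_cpu_offload()
pipe.enable_vae_tiling()

frames = pipe(
    prompt="A few golden retrievers playing in the snow",
    num_frames=84,  # illustrative; shorter clips generate faster
).frames[0]
export_to_video(frames, "mochi.mp4", fps=30)
```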

Examples:

  • “Ultra-realistic, intricate textures. Panspermia Extraterrestrial life, Fermi paradox, swirling dust particles, Unreal engine 5 render”
  • “An astronaut flying in space by Hokusai, in the style of Ukiyochi”
  • “A few golden retrievers playing in the snow”

In these examples, Mochi’s quality is generally a little worse compared to Hunyuan, though the first example is arguably my favorite in the entire series of videos in this article.

Wan2.1

  • Released: Feb 25, 2025
  • Creator: Alibaba

Wan2.1 is the most recent model in this series and positions itself as the new state of the art.

Key features:

  • 14 billion parameters
  • Smaller 1.3 billion parameter version also available
  • ComfyUI integration
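
If you prefer scripting to ComfyUI’s node graph, diffusers also added Wan2.1 support shortly after release. A minimal sketch, assuming a diffusers build that includes WanPipeline and the Diffusers-format repo for the smaller 1.3B checkpoint:

```python
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

# Assumption: check the Hugging Face Hub for the current repo name.
model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

frames = pipe(
    prompt="A white dove is flapping its wings, flying freely in the sky, in anime style.",
    height=480,
    width=832,
    num_frames=81,
    guidance_scale=5.0,
).frames[0]
export_to_video(frames, "wan.mp4", fps=16)
```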

Example videos:

  • “Ultra-realistic, intricate textures. Panspermia Extraterrestrial life, Fermi paradox, swirling dust particles, Unreal engine 5 render”
  • “An astronaut flying in space by Hokusai, in the style of Ukiyochi”
  • “A few golden retrievers playing in the snow”

The overall quality of Wan2.1 is maybe slightly worse than Hunyuan, but it does the best job adhering to the style instructions of the astronaut prompt.

Notable mentions:

  • Step-Video-T2V: relatively unknown AI startup Stepfun’s 30-billion(!)-parameter model.
  • AnimateDiff-Lightning: ByteDance’s faster version of the popular AnimateDiff (especially among ComfyUI users). This isn’t a standalone text-to-video model, but rather a video adapter for an existing text-to-image base model like Stable Diffusion (sketched below).
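
Because AnimateDiff-Lightning is an adapter, using it means pairing a motion module with a Stable Diffusion text-to-image checkpoint. A minimal sketch following the pattern from the model card, where the epiCRealism base model is just one popular community choice:

```python
import torch
from diffusers import AnimateDiffPipeline, EulerDiscreteScheduler, MotionAdapter
from diffusers.utils import export_to_gif
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

device, dtype = "cuda", torch.float16
step = 4  # the adapter ships in 1-, 2-, 4-, and 8-step distilled variants
ckpt = f"animatediff_lightning_{step}step_diffusers.safetensors"
base = "emilianJR/epiCRealism"  # any SD 1.5 text-to-image checkpoint works

# Load the Lightning motion adapter, then wrap the T2I base model with it.
adapter = MotionAdapter().to(device, dtype)
adapter.load_state_dict(load_file(hf_hub_download("ByteDance/AnimateDiff-Lightning", ckpt), device=device))
pipe = AnimateDiffPipeline.from_pretrained(base, motion_adapter=adapter, torch_dtype=dtype).to(device)
pipe.scheduler = EulerDiscreteScheduler.from_config(
    pipe.scheduler.config, timestep_spacing="trailing", beta_schedule="linear"
)

output = pipe(prompt="A few golden retrievers playing in the snow", guidance_scale=1.0, num_inference_steps=step)
export_to_gif(output.frames[0], "animation.gif")
```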

Running Text-to-Video AI Models

Text-to-video AI models are inherently difficult to run due to their large parameter counts and the complexity of video generation tasks.

A good rule of thumb for selecting a model is to pick one with a diffusers or ComfyUI integration. This is a good indicator of model maturity and will make your deployment process much easier.
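
One quick way to apply this rule of thumb, as a sketch: if a checkpoint loads through diffusers’ generic DiffusionPipeline entry point, a maintained model-specific pipeline exists behind it (Mochi’s repo is used here purely as an example):

```python
import torch
from diffusers import DiffusionPipeline

# DiffusionPipeline reads the repo's model_index.json and dispatches to the
# model-specific pipeline class; if this succeeds, the model has a
# maintained diffusers integration.
pipe = DiffusionPipeline.from_pretrained(
    "genmo/mochi-1-preview", variant="bf16", torch_dtype=torch.bfloat16
)
print(type(pipe).__name__)  # e.g. "MochiPipeline"
```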

Conclusion

The text-to-video space is moving at a fast clip, with new models claiming state-of-the-art performance released every few weeks.

As GPUs become easier and cheaper to access, deploying open-source models like Hunyuan, Mochi, and Wan2.1 is becoming an even more attractive option. At Modal, this is as simple as running our end-to-end Mochi example, but you can run any code on Modal in a cost-effective and developer-friendly way.
