Top open-source text-to-video AI models
Updated: 2025-10-29
Open-source text-to-video AI models are rapidly approaching the quality of leading closed-source models like Kling or OpenAI’s Sora.
In this article we’ll compare the following popular open-source text-to-video models:
| Model | Parameters | Created by | Released |
|---|---|---|---|
| HunyuanVideo | 13B+ | Tencent | Dec 2024 |
| Mochi (deploy on Modal) | 10B | Genmo | Oct 2024 |
| Wan2.2 | 5B and 14B | Alibaba | Jul 2025 |
Text-to-video AI models build on text-to-image foundations but add a more difficult dimension: time. Every frame must not only look convincing on its own but also stay coherent across seconds of motion. This shift introduces new failure modes:
- Small artifacts that flicker or accumulate across frames
- Motion that feels jittery or unnatural
- Styles that fade or drift as a clip unfolds
Beyond model design, running these systems is considerably more demanding. Video synthesis consumes far more GPU memory than still-image generation and requires careful batching just to generate a few seconds of footage.
Closed-source systems like OpenAI’s Sora and Kuaishou’s Kling have proved that high quality, long-form video generation is possible. But they remain inaccessible for most developers. That leaves open-source alternatives as the most practical path to experiment, fine-tune, and deploy video generation pipelines without depending on proprietary APIs.
This article focuses on some of the most widely adopted open-weight models and examines their capabilities through prompt-level examples, deployment considerations, and trade-offs. The goal is not to declare a “best” model, but to map out what each system does well and its limits.
To ground the comparison, let’s start with a side-by-side example prompt: “A white dove is flapping its wings, flying freely in the sky, in anime style.” (prompt taken from Penguin Video Benchmark)
Now, let’s dive deeper into each of these models.
HunyuanVideo
- Released: Dec 3, 2024
- Creator: Tencent
Hunyuan (roughly pronounced “hwen-yoo-en” in English) was one of the first large-scale systems to demonstrate that open-source approaches could begin matching the temporal consistency of closed platforms. It is consistently at or near the top of Hugging Face’s trending models and by far the most discussed model in our community Slack.
Key Features
- Over 13 billion parameters
- Diffusers integration for plug-and-play workflows
- FP8 model weights to reduce GPU memory usage
- Official ComfyUI nodes for quick prototyping
- Prompt rewriting utility to improve alignment with user instructions
- Several popular fine-tunes, e.g. SkyReels V1, which is fine-tuned on tens of millions of human-centric film and television clips
Example Videos
These videos demonstrate Hunyuan’s high quality and realistic generation capabilities, though the astronaut video does not really adhere to the style prompt.
Operational Footprint
Running Hunyuan requires substantial resources. Even relatively short clips at moderate resolution push beyond the capacity of most consumer GPUs, placing the model firmly in the datacenter-class hardware category.
Tencent does provide options for multi-GPU sequence parallelism (xDiT), which helps distribute workloads, and FP8 quantization, which reduces memory pressure. Even with these optimizations, however, Hunyuan is still impractical for most consumer setups.
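As a rough sketch of what the Diffusers integration looks like in practice, the snippet below loads HunyuanVideo with reduced-precision weights, CPU offloading, and VAE tiling to relieve memory pressure. It assumes the community `hunyuanvideo-community/HunyuanVideo` checkpoint and the `HunyuanVideoPipeline` class from Diffusers; check the model card for current checkpoint ids and recommended settings.

```python
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video

# Assumed community Diffusers checkpoint; see the model card for the exact id.
model_id = "hunyuanvideo-community/HunyuanVideo"

# Load the transformer in bfloat16 and the rest of the pipeline in float16
# to trim memory without changing the sampling logic.
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoPipeline.from_pretrained(
    model_id, transformer=transformer, torch_dtype=torch.float16
)

# Memory-pressure relief: stream weights onto the GPU on demand and decode
# the video latents in tiles rather than all at once.
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()

frames = pipe(
    prompt="A white dove is flapping its wings, flying freely in the sky, in anime style.",
    height=320,
    width=512,
    num_frames=61,       # roughly 2.5 seconds at 24 fps
    num_inference_steps=30,
).frames[0]

export_to_video(frames, "dove.mp4", fps=24)
```

Even with offloading and tiling, settings beyond this modest resolution and clip length will push you toward datacenter-class hardware.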
Strengths
Hunyuan is strong in motion consistency and texture realism. As shown in the example videos, backgrounds remain coherent across frames, and fine details (e.g., the snowflake pattern) hold together better than in many smaller models. Its ecosystem maturity (through Diffusers and ComfyUI support) also makes it easier to integrate into existing workflows.
Weaknesses
For prompts requesting artistic rendering (e.g., the ukiyo-e astronaut), Hunyuan often defaults toward photorealism, which undercuts stylistic control. Its high VRAM demand also places it out of reach for consumer-grade GPUs, limiting accessibility for most developers.
Mochi
- Released: Oct 22, 2024
- Creator: Genmo
Mochi was one of the first Apache-2.0-licensed video models released with training code, making it an attractive option for both experimentation and downstream fine-tuning. It ranks similarly to Hunyuan on crowd-sourced leaderboards.
Key Features
- 10 billion parameters
- Apache-2.0 license for open research and commercial use
- LoRA trainer support for lightweight fine-tuning
- Native ComfyUI integration
- AsymmDiT backbone optimized for video synthesis
- Easy Deployment on Modal
Example Videos
In these examples, Mochi’s quality is generally a little worse than Hunyuan’s, though the first example is arguably my favorite of all the videos in this article.
Operational Footprint
Running Mochi is resource-intensive given its size. At default settings, it requires more GPU memory than most consumer cards can provide, putting it in the same class of hardware demand as larger open-weight models like Hunyuan. ComfyUI optimizations can lower the memory footprint, but they come with trade-offs in generation speed and clip length.
Modal’s deployment estimates the cost at around $0.33 per short clip on H100-class hardware, which positions Mochi as relatively efficient for cloud runs, but still impractical for most local GPUs.
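For a sense of what a cloud deployment involves, here is a minimal sketch that wraps Mochi in a Modal function on an H100 using the Diffusers `MochiPipeline` and the `genmo/mochi-1-preview` checkpoint. The package pins, generation settings, and file handling are illustrative assumptions; our end-to-end Mochi example covers a production-ready setup.

```python
import modal

# Assumed dependencies; adjust versions to whatever the Mochi/Diffusers docs recommend.
image = modal.Image.debian_slim().pip_install(
    "torch", "diffusers", "transformers", "accelerate", "sentencepiece", "imageio[ffmpeg]"
)
app = modal.App("mochi-t2v", image=image)


@app.function(gpu="H100", timeout=20 * 60)
def generate(prompt: str) -> bytes:
    import torch
    from diffusers import MochiPipeline
    from diffusers.utils import export_to_video

    # bfloat16 weights plus offloading and VAE tiling keep the 10B model within a single H100.
    pipe = MochiPipeline.from_pretrained("genmo/mochi-1-preview", torch_dtype=torch.bfloat16)
    pipe.enable_model_cpu_offload()
    pipe.enable_vae_tiling()

    frames = pipe(prompt, num_frames=85, num_inference_steps=50).frames[0]
    export_to_video(frames, "/tmp/mochi.mp4", fps=30)
    return open("/tmp/mochi.mp4", "rb").read()


@app.local_entrypoint()
def main():
    video = generate.remote("A white dove is flapping its wings, flying freely in the sky.")
    open("mochi.mp4", "wb").write(video)
```

Loading the pipeline inside the function keeps the sketch simple; for sustained workloads you would typically load weights once per container rather than once per call.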
Strengths
Mochi delivers strong photorealistic rendering and flexible fine-tuning. Its support for LoRA adapters lets teams specialize the model quickly on custom data, and the Apache-2.0 license makes it one of the most permissively usable models.
Weaknesses
Stylized outputs, especially animated sequences, are weaker. The authors note that Mochi is primarily optimized for photorealism. Also, its relatively high memory demand for its parameter count raises operational cost.
Wan2.2
- Released: July 28, 2025
- Creator: Alibaba
Wan2.2 is the latest open-weight model from Alibaba’s Wan series. It builds on Wan 2.1 but introduces substantial architectural upgrades. The release emphasizes stylization control, motion fidelity, and computational efficiency, while preserving open tooling support (Diffusers, ComfyUI).
Key Features
- Hybrid TI2V-5B model combining both text-to-video and image-to-video capabilities
- Apache-2.0 license
- Mixture-of-Experts (MoE) backbone, with two specialized experts (high-noise/low-noise), allowing for efficient capacity scaling
- Cinematic aesthetic controls: lighting, composition, contrast, color tone labels, etc.
Example Videos
The overall quality of Wan2.2 is perhaps slightly below Hunyuan’s, but it does the best job of adhering to the style instructions in the astronaut prompt.
Operational Footprint
The TI2V-5B variant is optimized for 720p/24 fps and is reported to run on high-end consumer GPUs. The A14B MoE variants (T2V-A14B, I2V-A14B) require more resources and target higher fidelity use cases.
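As a rough sketch of what running the TI2V-5B variant might look like through Diffusers, the snippet below targets 720p at 24 fps and folds the cinematic aesthetic labels into the prompt. The `WanPipeline` and `AutoencoderKLWan` classes follow the Wan 2.x Diffusers integration; the `Wan-AI/Wan2.2-TI2V-5B-Diffusers` checkpoint id, resolution, and frame count are assumptions to verify against the model card.

```python
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

# Assumed Diffusers-format checkpoint id; confirm against the official model card.
model_id = "Wan-AI/Wan2.2-TI2V-5B-Diffusers"

# Keep the Wan VAE in float32 for decode quality; run the rest in bfloat16.
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # helps fit on high-end consumer GPUs

# Cinematic aesthetic labels ride along in the prompt itself.
prompt = (
    "A white dove is flapping its wings, flying freely in the sky, in anime style. "
    "Soft backlighting, warm color tone, wide composition."
)

frames = pipe(
    prompt=prompt,
    height=704,
    width=1280,
    num_frames=121,      # roughly 5 seconds at 24 fps
    guidance_scale=5.0,
).frames[0]

export_to_video(frames, "wan_dove.mp4", fps=24)
```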
Strengths
Wan2.2 stands out for its balance between stylization control and accessibility. The model maintains readable on-screen text and performs well across Chinese and English prompts. Also, its resource efficiency makes it one of the most approachable models for developers testing video workflows on local GPUs.
Weaknesses
While Wan2.2 narrows the realism gap, its rendering of fine detail such as texture and lighting still trails larger models like Hunyuan in complex scenes. Some users also report longer inference times per frame under high-motion prompts due to the MoE routing overhead. Performance on multi-GPU scale-outs also remains less documented than for other models.
Note: Wan2.2 replaces the concept of a “lightweight variant” from 2.1 (like the 1.3B model). In 2.2, the TI2V-5B model serves as the efficiency tier, and the A14B MoE models handle specialization.
These comparisons highlight model capabilities, but developers must also weigh the practicalities of running them, which the next section addresses.
Running Text-to-Video Models
Text-to-video models introduce their own set of operational challenges, so choosing a model isn’t as simple as picking the one that looks best. We have to find a model that fits into our existing workflow and hardware budget.
Things to Think About When Selecting a Model
- Choose one with a Diffusers or ComfyUI integration. This saves setup time and gives you standardized preprocessing, inference, and visualization pipelines out of the box. Models without official integrations usually require more custom code or community wrappers.
- Match model size to your GPU. Larger architectures often require datacenter GPUs to run at full resolution. Smaller or quantized variants can run on consumer hardware but typically produce shorter or lower-quality clips.
- Prototype small, scale later. Start with low-resolution, short clips to confirm your pipeline works (see the sketch after this list). Once the workflow is stable, you can increase resolution and clip length without wasting GPU hours on debugging.
- Consider latency and cost as part of quality. Generating a few seconds of video can take several minutes on high-end hardware. Faster models or shorter clips might deliver better iteration speed, even if the visual quality is slightly lower.
- Use optimization tools deliberately. Quantization, offloading, and multi-GPU parallelism can stretch hardware capacity, but each comes with trade-offs in speed, fidelity, or complexity. Apply them only where they make sense.
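To make the “prototype small, scale later” pattern concrete, here is a hedged sketch built on Diffusers’ generic pipeline loader. The profile values, the `generate` helper, and the model id placeholder are all illustrative, not recommendations for any specific model.

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

# Illustrative settings only: tune these for your chosen model and hardware budget.
PROFILES = {
    "preview": dict(height=256, width=384, num_frames=25, num_inference_steps=20),
    "final":   dict(height=480, width=832, num_frames=81, num_inference_steps=50),
}

def generate(model_id: str, prompt: str, profile: str = "preview") -> str:
    # DiffusionPipeline resolves the correct pipeline class from the checkpoint.
    pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    pipe.enable_model_cpu_offload()  # trade some speed for memory headroom

    frames = pipe(prompt=prompt, **PROFILES[profile]).frames[0]
    path = f"clip_{profile}.mp4"
    export_to_video(frames, path, fps=16)
    return path

# Iterate cheaply on prompts first, then re-render the keepers at full quality:
# generate("your-chosen-model-id", "A white dove flying in the sky, anime style")
# generate("your-chosen-model-id", "A white dove flying in the sky, anime style", "final")
```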
Closing Thoughts
The text-to-video space is moving at a fast clip, with new models claiming state-of-the-art results released every few weeks. The common thread across these models is that there is no single “best” option, only a growing set of trade-offs.
Some models prioritize realism while others are better at stylization or text rendering. Larger architectures allow for longer, higher-resolution clips, but require datacenter-class GPUs. Smaller variants reduce hardware requirements, making them more accessible, at the cost of fidelity. Ultimately, gains in one area lead to trade-offs in another.
As GPUs become easier and cheaper to access, deploying open-source models like Hunyuan, Mochi, and Wan2.2 becomes an even more attractive option. At Modal, this is as simple as running our end-to-end Mochi example, but you can run any code on Modal in a cost-effective and developer-friendly way.