Generate videos from prompts with Lightricks LTX-Video

This example demonstrates how to run the LTX-Video video generation model by Lightricks on Modal.

LTX-Video is fast! Generating a twenty-second 480p video at moderate quality takes as little as two seconds on a warm container.

Here’s one that we generated:

Setup 

We start by importing dependencies we need locally, defining a Modal App, and defining the container Image that our video model will run in.
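Here is a minimal sketch of that setup. The app name, Python version, and the particular set of pinned packages are illustrative rather than the exact ones the example uses:

```python
from pathlib import Path

import modal

app = modal.App("example-ltx-video")  # the app name here is illustrative

# Container image with the libraries the model needs at inference time.
image = modal.Image.debian_slim(python_version="3.12").pip_install(
    "torch",
    "diffusers",
    "transformers",
    "accelerate",
    "sentencepiece",
    "imageio[ffmpeg]",  # used by diffusers to write .mp4 files
)
```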

Storing data on Modal Volumes 

On Modal, we save large or expensive-to-compute data to distributed Volumes that are accessible both locally and remotely.

We’ll store the LTX-Video model’s weights and the outputs we generate on Modal Volumes.

We store the outputs on a Modal Volume so that clients don’t need to sit around waiting for the video to be generated.

We store the weights on a Modal Volume so that we don’t have to fetch them from the Hugging Face Hub every time a container boots. This download takes about two minutes, depending on traffic and network speed.

We don’t have to change any of the Hugging Face code to do this — we just set the location of Hugging Face’s cache to be on a Volume using the HF_HOME environment variable.
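A sketch of that Volume setup, building on the image above. The Volume names and mount paths (ltx-model-cache, ltx-outputs, /models, /outputs) are assumptions for illustration:

```python
MODEL_PATH = "/models"  # where the Hugging Face cache will live
OUTPUTS_PATH = "/outputs"  # where generated videos will be written

model_volume = modal.Volume.from_name("ltx-model-cache", create_if_missing=True)
outputs_volume = modal.Volume.from_name("ltx-outputs", create_if_missing=True)

# Point the Hugging Face cache at the model Volume so weights are fetched
# from the Hub once and reused on subsequent container boots.
image = image.env({"HF_HOME": MODEL_PATH})
```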

For more on storing model weights on Modal, see this guide.

Setting up our LTX class 

We use the @cls decorator to specify the infrastructure our inference function needs, as defined above.

That decorator also gives us control over the lifecycle of our cloud container.

Specifically, we use the enter method to load the model into GPU memory (from the Volume if it’s present or the Hub if it’s not) before the container is marked ready for inputs.

This helps reduce tail latencies caused by cold starts. For details and more tips, see this guide.

The actual inference code is in a modal.method of the class.
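Putting those pieces together, the class might look like the following sketch. The GPU type, the use of diffusers' LTXPipeline, and the generation parameters are assumptions, not necessarily what the example ships with:

```python
@app.cls(
    image=image,
    gpu="H100",  # GPU choice is an assumption
    volumes={MODEL_PATH: model_volume, OUTPUTS_PATH: outputs_volume},
    timeout=10 * 60,
)
class LTX:
    @modal.enter()
    def load_model(self):
        # Runs once when the container starts, before it is marked ready for
        # inputs: weights come from the Volume-backed cache, or the Hub on first boot.
        import torch
        from diffusers import LTXPipeline

        self.pipe = LTXPipeline.from_pretrained(
            "Lightricks/LTX-Video", torch_dtype=torch.bfloat16
        ).to("cuda")

    @modal.method()
    def generate(self, prompt: str, num_frames: int = 161) -> bytes:
        from diffusers.utils import export_to_video

        frames = self.pipe(prompt=prompt, num_frames=num_frames).frames[0]
        out_path = Path(OUTPUTS_PATH) / "video.mp4"
        export_to_video(frames, str(out_path), fps=24)
        outputs_volume.commit()  # make the new file visible outside this container
        return out_path.read_bytes()
```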

Generate videos from the command line 

We trigger LTX-Video inference from our local machine by running the code in the local entrypoint below with modal run.

It will spin up a new replica to generate a video. Then it will, by default, generate a second video to demonstrate the lower latency when hitting a warm container.
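A sketch of that entrypoint, assuming the LTX class above; the default prompt and the local output filenames are placeholders:

```python
@app.local_entrypoint()
def main(
    prompt: str = "A red hot air balloon drifting over a glacier at dawn",
    twice: bool = True,
):
    ltx = LTX()

    # First call: a fresh replica spins up, so this includes cold-start time.
    Path("ltx-sample-1.mp4").write_bytes(ltx.generate.remote(prompt))

    if twice:
        # Second call: the container is warm now, so latency is much lower.
        Path("ltx-sample-2.mp4").write_bytes(ltx.generate.remote(prompt))
```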

You can trigger inference with:
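```bash
# "ltx.py" stands in for whatever filename you saved this example under
modal run ltx.py
```

Here ltx.py is just a stand-in for the name you gave the file containing the example code.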

All outputs are saved both locally and on a Modal Volume. You can explore the contents of Modal Volumes from your Modal Dashboard or from the command line with the modal volume command.

See modal volume --help for details.
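For example, assuming the outputs Volume is named ltx-outputs as in the sketch above, you could browse and download results like this:

```bash
modal volume ls ltx-outputs             # list generated videos on the Volume
modal volume get ltx-outputs video.mp4  # download one to your local machine
```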

Optional command line flags for the script can be viewed with:
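```bash
modal run ltx.py --help
```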

Using these flags, you can tweak your generation from the command line:
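The available flags correspond to the parameters of the local entrypoint; with the sketch above, that would be a --prompt flag, for example:

```bash
modal run ltx.py --prompt "a galaxy of colorful candies swirling slowly against black velvet"
```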

Addenda 

The remainder of the code in this file is utility code.