Image, video & 3D inference

Turn your media generation pipeline into a service that scales to infinity with Modal.

Get started
Scale to infinity, and to zero

Autoscale to hundreds of GPUs as needed, and then back down to 0 when idle. No more configuring autoscalers or paying for idle clusters!
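As a sketch of what this looks like in code, here is a minimal Modal function with autoscaling limits. The app name, model call, and the specific parameter values are illustrative; `max_containers` and `scaledown_window` are assumed to match the current Modal SDK's autoscaler settings.

```python
import modal

app = modal.App("image-inference")

@app.function(
    gpu="A100",
    max_containers=100,   # scale out to up to 100 GPU containers under load
    scaledown_window=60,  # idle containers shut down after ~60 seconds
)
def generate(prompt: str) -> bytes:
    # run your image/video/3D model here (illustrative placeholder)
    ...
```

When traffic stops, the pool drains to zero on its own; there is no cluster to tear down and nothing billed while idle.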

GPU Containers
Blazing fast cold-starts

Spin up containers with large media models in a few seconds, allowing for responsive scaling while keeping fewer GPUs idle.
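One way to cut cold-start time further is Modal's memory snapshots: the model is loaded once, the container's memory is snapshotted, and later cold starts restore from that snapshot instead of re-loading weights. A minimal sketch, assuming the `enable_memory_snapshot` flag and `@modal.enter(snap=True)` hook from the Modal SDK; `load_pipeline` is a hypothetical loader standing in for your model setup:

```python
import modal

app = modal.App("fast-boot")

@app.cls(gpu="A10G", enable_memory_snapshot=True)
class Model:
    @modal.enter(snap=True)
    def load(self):
        # runs once at snapshot time; later cold starts restore this state
        self.pipe = load_pipeline()  # hypothetical: load weights into memory

    @modal.method()
    def infer(self, prompt: str):
        return self.pipe(prompt)
```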

From function to pipeline

Modal allows you to chain together functions that have disparate hardware requirements and image definitions. Go from running a GPU endpoint to an entire pipeline with pre- and post-processing that runs on CPU-only containers.
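A sketch of such a pipeline, with CPU-only pre- and post-processing around a GPU inference step. The image contents, function bodies, and package lists are illustrative:

```python
import modal

app = modal.App("media-pipeline")

cpu_image = modal.Image.debian_slim().pip_install("pillow")
gpu_image = modal.Image.debian_slim().pip_install("torch", "diffusers")

@app.function(image=cpu_image)  # CPU-only container
def preprocess(raw: bytes) -> bytes:
    ...

@app.function(image=gpu_image, gpu="A100")  # GPU container
def infer(prepped: bytes) -> bytes:
    ...

@app.function(image=cpu_image)  # CPU-only container
def postprocess(result: bytes) -> bytes:
    ...

@app.function(image=cpu_image)
def pipeline(raw: bytes) -> bytes:
    # each .remote() call runs in its own container,
    # with its own image and hardware
    return postprocess.remote(infer.remote(preprocess.remote(raw)))
```

Each stage scales independently, so a burst of cheap CPU work never forces you to hold extra GPUs.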

LoRA made easy

Low-rank adaptation (LoRA) is a technique that makes it possible to create fine-tuned models in the form of small adapters that can be applied to the original model.

Modal’s parametrized functions make it trivial to build applications that serve inference for a dynamic set of LoRA adapters. Fine-tune your models on demand, store the adapters in Volumes, and have them immediately ready for inference.
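A minimal sketch of the pattern, using Modal's parametrized classes and a Volume of adapters. The Volume name, `load_base_model` helper, and the `load_lora_weights` call (a diffusers-style API) are illustrative assumptions:

```python
import modal

app = modal.App("lora-inference")
adapters = modal.Volume.from_name("lora-adapters", create_if_missing=True)

@app.cls(gpu="A10G", volumes={"/adapters": adapters})
class LoraModel:
    adapter_name: str = modal.parameter()

    @modal.enter()
    def load(self):
        self.pipe = load_base_model()  # hypothetical base-model loader
        # apply the adapter selected by this instance's parameter
        self.pipe.load_lora_weights(f"/adapters/{self.adapter_name}")

    @modal.method()
    def generate(self, prompt: str):
        return self.pipe(prompt)
```

Each distinct `adapter_name` gets its own pool of containers, e.g. `LoraModel(adapter_name="watercolor").generate.remote("a castle at dusk")`, so new adapters are served without redeploying.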

Try it out

Ship your first app in minutes

with $30 / month free compute