Turn your media generation pipeline into a service that scales to infinity with Modal.
Autoscale to hundreds of GPUs as needed, then back down to zero when idle. No more configuring autoscalers or paying for idle clusters!
Spin up containers with large media models in a few seconds, enabling responsive scaling while keeping fewer GPUs idle.
Modal allows you to chain together functions that have disparate hardware requirements and image definitions. Go from running a GPU endpoint to an entire pipeline with pre- and post-processing that runs on CPU-only containers.
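Such a pipeline might look like the following sketch. The app name, image contents, and function bodies are illustrative placeholders; only the pattern of mixing CPU-only and GPU functions in one app is the point.

```python
import modal

app = modal.App("media-pipeline")  # illustrative app name

# Each function can use its own image definition and hardware.
gpu_image = modal.Image.debian_slim().pip_install("torch", "diffusers")
cpu_image = modal.Image.debian_slim().pip_install("pillow")

@app.function(image=cpu_image)  # CPU-only container
def preprocess(prompt: str) -> str:
    return prompt.strip().lower()

@app.function(image=gpu_image, gpu="A100")  # GPU container
def generate(prompt: str) -> bytes:
    ...  # run the media model here
    return b"<image bytes>"

@app.function(image=cpu_image)  # CPU-only container
def postprocess(image_bytes: bytes) -> bytes:
    ...  # e.g. resize or watermark with Pillow
    return image_bytes

@app.local_entrypoint()
def main(prompt: str = "a lighthouse at dusk"):
    # Each .remote() call runs in its own container, with its own
    # image and hardware, and each stage autoscales independently.
    cleaned = preprocess.remote(prompt)
    image = generate.remote(cleaned)
    postprocess.remote(image)
```

Because each stage is a separate function, the GPU stage scales on its own and the cheap CPU stages never occupy GPU containers.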
Low-rank adaptation (LoRA) is a technique that makes it possible to create fine-tuned models in the form of small adapters that can be applied to the original model.
Modal’s parametrized functions make it trivial to build applications that perform inference for a dynamic set of LoRA adapters. Now you can fine-tune your models on demand, store the adapters in Volumes, and have them immediately ready for inference.
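A minimal sketch of this pattern, assuming a Volume named "lora-adapters" and a hypothetical adapter layout under /adapters; the model-loading and inference bodies are placeholders.

```python
import modal

app = modal.App("lora-inference")  # illustrative app name
image = modal.Image.debian_slim().pip_install("torch", "diffusers", "peft")
adapters = modal.Volume.from_name("lora-adapters", create_if_missing=True)

@app.cls(image=image, gpu="A10G", volumes={"/adapters": adapters})
class Model:
    # Parametrized class: Modal pools containers per distinct adapter_name,
    # so each adapter gets its own warm, independently scaled instances.
    adapter_name: str = modal.parameter()

    @modal.enter()
    def load(self):
        ...  # load the base model once, then apply the adapter
        # weights from /adapters/{self.adapter_name} (illustrative path)

    @modal.method()
    def generate(self, prompt: str) -> bytes:
        ...  # run inference with the adapter applied
        return b"<image bytes>"

# Usage: pick an adapter at call time, no redeploy needed.
# Model(adapter_name="watercolor-v2").generate.remote("a lighthouse at dusk")
```

New adapters written to the Volume by a fine-tuning job become usable as soon as a call references their name.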