import modal

MODEL_NAME = "black-forest-labs/FLUX.1-schnell"

app = modal.App("flux-with-lora")

image = (
    modal.Image.from_registry("nvidia/cuda:12.4.0-devel-ubuntu22.04", add_python="3.11")
    .pip_install("torch", "transformers", "diffusers")  # ...plus your remaining dependencies
)

volume = modal.Volume.from_name("flux-lora-models", create_if_missing=True)

@app.cls(gpu="H100", image=image, volumes={"/loras": volume})
class FluxWithLoRA:
    @modal.enter()
    def setup(self):
        # Runs once per container: load the base model, then fuse LoRA
        # weights from the mounted Volume into the base weights.
        from diffusers import FluxPipeline

        self.pipeline = FluxPipeline.from_pretrained(MODEL_NAME).to("cuda")
        self.pipeline.load_lora_weights("/loras")  # directory on the Volume holding your LoRA
        self.pipeline.fuse_lora()

    @modal.method()
    def generate_image(self, prompt: str):
        return self.pipeline(prompt).images[0]

@app.local_entrypoint()
def main():
    flux = FluxWithLoRA()
    flux.generate_image.remote("")  # pass your prompt
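To try it, run the entrypoint with `modal run flux_lora.py` (the filename is illustrative), or ship it with `modal deploy`; Modal builds the image, mounts the Volume, and attaches the H100 on demand.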
“As a startup, you need to iterate on things quickly. So it’s really helpful when the development speed is suddenly 10x. It’s a lot easier to deploy a ComfyUI workflow because Modal is serverless, so it auto-scales really well.”
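For context on the workflow in that quote, here is a minimal sketch of what a ComfyUI deployment on Modal can look like. It assumes the comfy-cli installer, and the GPU type and port are illustrative; none of these specifics come from the quote itself.

import subprocess
import modal

comfy_image = (
    modal.Image.debian_slim(python_version="3.11")
    .apt_install("git")
    .pip_install("comfy-cli")
    .run_commands("comfy --skip-prompt install --nvidia")  # installs ComfyUI into the image
)

comfy_app = modal.App("comfyui", image=comfy_image)

@comfy_app.function(gpu="L40S")
@modal.web_server(8000, startup_timeout=60)
def ui():
    # Start ComfyUI; Modal proxies port 8000 to a public, autoscaled URL.
    subprocess.Popen("comfy launch -- --listen 0.0.0.0 --port 8000", shell=True)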
Modal’s Rust-based container stack spins up GPU containers in under a second.
Modal autoscales containers up and down for maximum cost efficiency.
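Autoscaling behavior is tunable per function or class. A hedged sketch, reusing the FluxWithLoRA example above; the parameter names (min_containers, max_containers, scaledown_window) follow recent Modal SDK releases and may differ in older versions:

@app.cls(
    gpu="H100",
    image=image,
    volumes={"/loras": volume},
    min_containers=0,     # scale to zero when idle
    max_containers=20,    # cap spend under bursty traffic
    scaledown_window=60,  # idle seconds before a container is reclaimed
)
class FluxWithLoRA:
    ...  # same body as above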
Modal’s proprietary cloud capacity orchestrator guarantees high GPU availability.
Serve interactive experiences anywhere with our global GPU fleet.
Reduce cold starts by 10x for models and custom ComfyUI nodes with GPU memory snapshotting.
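A minimal sketch of what snapshotting looks like in code, reusing names from the snippet above. enable_memory_snapshot is a standard Modal flag; the enable_gpu_snapshot experimental option may vary by SDK version:

@app.cls(
    gpu="H100",
    image=image,
    enable_memory_snapshot=True,                         # snapshot container state after setup
    experimental_options={"enable_gpu_snapshot": True},  # include GPU memory in the snapshot
)
class SnapshottedFlux:
    @modal.enter(snap=True)
    def setup(self):
        # Runs once at snapshot time; later cold starts restore this state
        # (weights already in GPU memory) instead of re-running the load.
        from diffusers import FluxPipeline
        self.pipeline = FluxPipeline.from_pretrained(MODEL_NAME).to("cuda")

    @modal.method()
    def generate_image(self, prompt: str):
        return self.pipeline(prompt).images[0]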
Achieve 20ms networking latency for video streams using WebRTC on Modal.
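A sketch of the signaling half of such a setup, assuming aiortc on the Python side (Modal does not mandate a particular WebRTC stack): the browser POSTs an SDP offer to an ASGI endpoint, the container answers, and media then flows over the resulting peer connection.

import modal

rtc_image = modal.Image.debian_slim(python_version="3.11").pip_install(
    "fastapi[standard]", "aiortc"
)

rtc_app = modal.App("webrtc-signaling")

@rtc_app.function(image=rtc_image, gpu="A10G")  # the GPU would run your media pipeline
@modal.asgi_app()
def signaling():
    from aiortc import RTCPeerConnection, RTCSessionDescription
    from fastapi import FastAPI

    api = FastAPI()

    @api.post("/offer")
    async def offer(payload: dict):
        pc = RTCPeerConnection()
        # Attach outgoing video tracks here (e.g. frames from a generation
        # pipeline) before answering the browser's offer.
        await pc.setRemoteDescription(
            RTCSessionDescription(sdp=payload["sdp"], type=payload["type"])
        )
        answer = await pc.createAnswer()
        await pc.setLocalDescription(answer)
        return {"sdp": pc.localDescription.sdp, "type": pc.localDescription.type}

    return api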