Run Stable Diffusion 3.5 Large Turbo as a CLI, API, and web UI
This example shows how to run Stable Diffusion 3.5 Large Turbo on Modal to generate images from your local command line, via an API, and as a web UI.
Inference takes about one minute to cold start, at which point images are generated at a rate of one image every 1-2 seconds for batch sizes between one and 16.
Below are four images produced by the prompt “A princess riding on a pony”.
Basic setup
All Modal programs need an App — an object that acts as a recipe for
the application. Let’s give it a friendly name.
Configuring dependencies
The model runs remotely inside a container. That means we need to install the necessary dependencies in that container’s image.
Below, we start from a lightweight base Linux image
and then install our Python dependencies, like Hugging Face’s diffusers library and torch.
Implementing SD3.5 Large Turbo inference on Modal
We wrap inference in a Modal Cls that ensures models are loaded and then moved to the GPU once when a new container starts, before the container picks up any work.
The run function just wraps a diffusers pipeline.
It sends the output image back to the client as bytes.
We also include a web wrapper that makes it possible to trigger inference via an API call.
For details, see the /docs route of the URL ending in inference-web.modal.run that appears when you deploy the app.
Generating Stable Diffusion images from the command line
This is the command we’ll use to generate images. It takes a text prompt,
a batch_size that determines the number of images to generate per prompt,
and the number of times to run image generation (samples).
You can also provide a seed to make sampling more deterministic.
Run it with
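Assuming the example lives in a file named `stable_diffusion_cli.py` (a placeholder, substitute the real script name; the flag names are likewise illustrative):

```shell
modal run stable_diffusion_cli.py --prompt "A princess riding on a pony" --batch-size 4
```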
and pass --help to see more options.
Generating Stable Diffusion images via an API
The Modal Cls above also includes a fastapi_endpoint,
which adds a simple web API to the inference method.
To try it out, run
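Again assuming a placeholder filename, serving the app during development might look like:

```shell
modal serve stable_diffusion_cli.py
```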
copy the printed URL ending in inference-web.modal.run,
and add /docs to the end. This will bring up the interactive
Swagger/OpenAPI docs for the endpoint.
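The endpoint can also be called programmatically. This sketch assumes the endpoint accepts a `prompt` query parameter and returns PNG bytes; substitute your own deployed URL:

```python
import urllib.parse
import urllib.request


def fetch_image(base_url: str, prompt: str) -> bytes:
    """Request one generated image and return its raw PNG bytes."""
    query = urllib.parse.urlencode({"prompt": prompt})
    with urllib.request.urlopen(f"{base_url}?{query}") as response:
        return response.read()


# Example usage (placeholder URL):
# png = fetch_image(
#     "https://your-workspace--inference-web.modal.run",
#     "A princess riding on a pony",
# )
# pathlib.Path("output.png").write_bytes(png)
```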
Generating Stable Diffusion images in a web UI
Lastly, we add a simple front-end web UI (written in Alpine.js) for our image generation backend.
This is also deployed by running
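The deployment command, again with a placeholder filename:

```shell
modal deploy stable_diffusion_cli.py
```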
The Inference class automatically serves multiple users from its own auto-scaling pool of warm GPU containers.