Run Stable Diffusion 3.5 Large Turbo as a CLI, API, and web UI

This example shows how to run Stable Diffusion 3.5 Large Turbo on Modal to generate images from your local command line, via an API, and as a web UI.

Inference cold starts in about one minute; once a container is warm, images are generated at a rate of roughly one image every 1–2 seconds for batch sizes between 1 and 16.

Below are four images produced by the prompt “A princess riding on a pony”.

[Image: montage of four Stable Diffusion outputs for this prompt]

Basic setup 

All Modal programs need an App — an object that acts as a recipe for the application. Let’s give it a friendly name.
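A minimal sketch of that setup (the app name here is illustrative):

```python
import modal

# The App is the recipe that collects this example's image, class, and
# entrypoint into one deployable unit.
app = modal.App("stable-diffusion")
```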

Configuring dependencies 

The model runs remotely inside a container. That means we need to install the necessary dependencies in that container’s image.

Below, we start from a lightweight base Linux image and then install our Python dependencies, like Hugging Face’s diffusers library and torch.
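A sketch of that image definition, assuming current versions of the libraries (the exact package list and pins are illustrative):

```python
import modal

# Start from a slim Debian-based Python image and install the inference
# dependencies into the container image, not on the local machine.
image = modal.Image.debian_slim(python_version="3.12").pip_install(
    "diffusers",
    "transformers",
    "accelerate",
    "torch",
)
```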

Implementing SD3.5 Large Turbo inference on Modal 

We wrap inference in a Modal Cls that ensures models are loaded and then moved to the GPU once when a new container starts, before the container picks up any work.

The run function just wraps a diffusers pipeline. It sends the output image back to the client as bytes.
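The two pieces together might look like the following sketch. The GPU type, inference parameters, and helper names are illustrative; the model ID is the Hugging Face repository for SD3.5 Large Turbo.

```python
import io

import modal

image = modal.Image.debian_slim(python_version="3.12").pip_install(
    "diffusers", "transformers", "accelerate", "sentencepiece", "torch"
)
app = modal.App("stable-diffusion")


@app.cls(image=image, gpu="H100")
class Inference:
    @modal.enter()
    def load_pipeline(self):
        # Runs once per container, before any requests are handled:
        # load the weights, then move the pipeline onto the GPU.
        import torch
        from diffusers import StableDiffusion3Pipeline

        self.pipe = StableDiffusion3Pipeline.from_pretrained(
            "stabilityai/stable-diffusion-3.5-large-turbo",
            torch_dtype=torch.bfloat16,
        ).to("cuda")

    @modal.method()
    def run(
        self, prompt: str, batch_size: int = 4, seed: int | None = None
    ) -> list[bytes]:
        import torch

        seed = seed if seed is not None else torch.seed()
        images = self.pipe(
            prompt,
            num_images_per_prompt=batch_size,
            num_inference_steps=4,  # Turbo models need only a few steps
            guidance_scale=0.0,
            generator=torch.Generator("cuda").manual_seed(seed),
        ).images

        # Serialize each PIL image to PNG bytes to send back to the client.
        image_output = []
        for image in images:
            with io.BytesIO() as buf:
                image.save(buf, format="PNG")
                image_output.append(buf.getvalue())
        return image_output
```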

We also include a web wrapper that makes it possible to trigger inference via an API call. See the /docs route of the URL ending in inference-web.modal.run that appears when you deploy the app for details.
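One way to write that wrapper is a `fastapi_endpoint`-decorated method on the same `Inference` class. It is sketched in isolation below; the parameter names are illustrative.

```python
import modal


# In the full program, this method sits on the Inference class alongside run().
@modal.fastapi_endpoint(docs=True)
def web(self, prompt: str, seed: int | None = None):
    from fastapi import Response

    # Reuse the pipeline already loaded in this container via a local call,
    # then return the first generated image as a PNG over HTTP.
    return Response(
        content=self.run.local(prompt, batch_size=1, seed=seed)[0],
        media_type="image/png",
    )
```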

Generating Stable Diffusion images from the command line 

This is the command we’ll use to generate images. It takes a text prompt, a batch_size that determines the number of images to generate per prompt, and the number of times to run image generation (samples).

You can also provide a seed to make sampling more deterministic.
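A sketch of that command as a Modal local entrypoint, which turns each parameter into a CLI flag. The default values and output paths here are illustrative, and `Inference` refers to the class described above.

```python
import modal

app = modal.App("stable-diffusion")


@app.local_entrypoint()
def entrypoint(
    prompt: str = "A princess riding on a pony",
    samples: int = 4,
    batch_size: int = 1,
    seed: int | None = None,
):
    from pathlib import Path

    output_dir = Path("/tmp/stable-diffusion")
    output_dir.mkdir(exist_ok=True, parents=True)

    inference = Inference()  # the Modal Cls defined above
    for sample_idx in range(samples):
        # Each remote call generates batch_size images on the GPU container.
        images = inference.run.remote(prompt, batch_size=batch_size, seed=seed)
        for batch_idx, image_bytes in enumerate(images):
            output_path = output_dir / f"output_{sample_idx}_{batch_idx}.png"
            output_path.write_bytes(image_bytes)
            print(f"Saved {output_path}")
```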

Run it with
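Assuming the script is saved as `text_to_image.py` (the filename and flags below are illustrative; Modal derives the flags from the entrypoint's parameters):

```shell
modal run text_to_image.py --prompt "A princess riding on a pony" --samples 4
```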

and pass --help to see more options.

Generating Stable Diffusion images via an API 

The Modal Cls above also includes a fastapi_endpoint, which adds a simple web API to the inference method.

To try it out, run
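That is (the filename here is illustrative):

```shell
modal deploy text_to_image.py
```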

copy the printed URL ending in inference-web.modal.run, and add /docs to the end. This will bring up the interactive Swagger/OpenAPI docs for the endpoint.

Generating Stable Diffusion images in a web UI 

Lastly, we add a simple front-end web UI (written in Alpine.js) for our image generation backend.
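One way to serve such a front-end from the same app is a Modal function that mounts the static Alpine.js assets into a FastAPI app. The directory layout and function name below are illustrative.

```python
from pathlib import Path

import modal

app = modal.App("stable-diffusion")

frontend_path = Path(__file__).parent / "frontend"  # illustrative location

web_image = (
    modal.Image.debian_slim(python_version="3.12")
    .pip_install("fastapi[standard]")
    .add_local_dir(frontend_path, remote_path="/assets")
)


@app.function(image=web_image)
@modal.asgi_app()
def ui():
    import fastapi
    import fastapi.staticfiles

    web_app = fastapi.FastAPI()
    # Serve the Alpine.js single-page frontend as static files.
    web_app.mount(
        "/", fastapi.staticfiles.StaticFiles(directory="/assets", html=True)
    )
    return web_app
```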

This is also deployed by running
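That is, with the same illustrative filename as before:

```shell
modal deploy text_to_image.py
```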

The Inference class will serve multiple users from its own auto-scaling pool of warm GPU containers automatically.