Play with the ControlNet demos

This example allows you to play with all 10 demonstration Gradio apps from the new and amazing ControlNet project. ControlNet provides a minimal interface allowing users to use images to constrain StableDiffusion’s generation process. With ControlNet, users can easily condition StableDiffusion image generation on different spatial contexts, including depth maps, segmentation maps, scribble drawings, and keypoints!

Imports and config preamble 

Below are the configuration objects for all 10 demos provided in the original lllyasviel/ControlNet repo. Each demo depends on its own custom pretrained StableDiffusion model, and these models are 5–6 GB each. We can only run one demo at a time, so this module avoids downloading the model and ‘detector’ dependencies for all 10 demos and instead uses the demo configuration object to download only what’s necessary for the chosen demo.
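As a rough sketch of what such a configuration object might look like (the class, field, and file names here are illustrative, not the example’s exact code):

```python
from dataclasses import dataclass

# Illustrative sketch only: the real example's configuration objects
# and checkpoint file names may differ.
@dataclass(frozen=True)
class DemoConfig:
    name: str                   # demo identifier
    model_file: str             # the demo's pretrained StableDiffusion checkpoint
    detector_files: tuple = ()  # extra detector weights, if the demo needs any

DEMO_CONFIGS = {
    "canny": DemoConfig("canny", "control_sd15_canny.pth"),
    "scribble": DemoConfig("scribble", "control_sd15_scribble.pth"),
}
```

Keeping the per-demo file lists in data like this is what lets the script download only one demo’s 5–6 GB of weights instead of all ten.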

Even just limiting our dependency setup to what’s required for a single demo, the resulting container image is huge.

Pick a demo, any demo 

Simply by changing the DEMO_NAME below, you can change which ControlNet demo app is set up and run by this Modal script.
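In the script, that selection is just a single assignment, something like the following (the valid demo names are defined by the script’s own configuration objects; "scribble" is just an illustrative choice):

```python
# Change this string to set up and serve a different ControlNet demo.
# The valid names come from the demo configuration objects in the script.
DEMO_NAME = "scribble"
```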

Setting up the dependencies 

ControlNet requires a lot of dependencies that would be fiddly to set up manually, but Modal’s programmatic Python APIs for container image building handle this complexity straightforwardly and automatically.

To run any of the 10 demo apps, we need the following:

  1. a base Python 3 Linux image (we use Debian Slim)
  2. a bunch of third-party PyPI packages
  3. git, so that we can download the ControlNet source code (there’s no controlnet PyPI package)
  4. some image processing Linux system packages, including ffmpeg
  5. and demo-specific pretrained model and detector .pth files

That’s a lot! Fortunately, the code below is already written for you: it stitches together a working container image ready to produce remarkable ControlNet images.
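The five requirements above can be sketched with Modal’s image-building API roughly as follows (the package lists are abridged and illustrative; the real example pins a much longer set of dependencies):

```python
import modal

# Illustrative sketch of the container image build, not the example's exact code.
image = (
    modal.Image.debian_slim(python_version="3.10")   # 1. base Python 3 Linux image
    .pip_install("gradio", "torch", "torchvision")   # 2. third-party PyPI packages (abridged)
    .apt_install("git")                              # 3. git, to fetch the ControlNet source
    .apt_install("ffmpeg", "libsm6", "libxext6")     # 4. image processing system packages
    .run_commands(
        # 5. fetch the ControlNet source; the demo-specific model and detector
        # .pth files are downloaded separately, per the chosen demo's config.
        "git clone https://github.com/lllyasviel/ControlNet.git /root/ControlNet"
    )
)
```

Each chained method layers onto the image, so the whole build is cached and only re-runs from the first changed step.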

Note: a ControlNet model pipeline is now available in Hugging Face’s diffusers package, but it does not contain the demo apps.

Serving the Gradio web UI 

Each ControlNet Gradio demo module exposes a Blocks interface running in queue mode, which is initialized in module scope on import and served on 0.0.0.0. We want the Blocks interface object, but the queueing and launched web server aren’t compatible with Modal’s serverless web endpoint interface, so in the import_gradio_app_blocks function we patch out these behaviors.
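A minimal, generic sketch of that patching trick (the function name matches the example, but the body here is illustrative, and it takes the Blocks class as a parameter so the sketch doesn’t depend on Gradio itself):

```python
import importlib

def import_gradio_app_blocks(module_name, blocks_cls):
    """Import a demo module with queue()/launch() patched to no-ops, so the
    import builds the interface without starting Gradio's own webserver.

    Sketch only: in the real example the patched class would be
    gradio's Blocks, and module_name the chosen demo's app module.
    """
    # Both patched methods return self, matching Gradio's chaining style.
    blocks_cls.queue = lambda self, *args, **kwargs: self
    blocks_cls.launch = lambda self, *args, **kwargs: self
    return importlib.import_module(module_name)
```

Because the demo modules call queue() and launch() at import time, the patch must be applied before the import, which is why both happen inside one function.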

Because the ControlNet Gradio apps are so time- and compute-intensive to cold-start, the web app function is limited to running just 1 warm container (max_containers=1). This way, while playing with the demos, we pay the cold-start cost once and all web requests hit the same warm container; spinning up extra containers to handle additional requests would not be efficient given the cold-start time. We also set scaledown_window to 600 seconds, so the container is kept running for 10 minutes after the last request, keeping the app responsive in case of continued experimentation.
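Sketched as Modal code, the serving function’s configuration might look like this (the app name, GPU type, and `demo_blocks` placeholder are assumptions, and `mount_gradio_app` comes from the Gradio version the demos target):

```python
import modal

app = modal.App("controlnet-demo")  # hypothetical app name

@app.function(
    gpu="A10G",            # assumption: any GPU the chosen model fits on
    max_containers=1,      # pay the cold-start cost once; all requests share it
    scaledown_window=600,  # keep the warm container for 10 idle minutes
)
@modal.asgi_app()
def run_demo():
    from fastapi import FastAPI
    from gradio.routes import mount_gradio_app

    # demo_blocks stands in for the Blocks object returned by
    # import_gradio_app_blocks for the chosen demo.
    web_app = FastAPI()
    return mount_gradio_app(app=web_app, blocks=demo_blocks, path="/")
```

Mounting the Blocks object onto a FastAPI app, instead of calling Gradio’s own launch(), is what lets Modal serve it as a serverless ASGI endpoint.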

Have fun! 

Serve your chosen demo app with modal serve controlnet_gradio_demos.py. If you don’t have any images ready at hand, try one from the 06_gpu_and_ml/controlnet/demo_images/ folder.

StableDiffusion was already impressive enough, but ControlNet’s ability to so accurately and intuitively constrain the image generation process is sure to put a big, dumb grin on your face.