import modal

MODEL_NAME = "black-forest-labs/FLUX.1-schnell"

app = modal.App("flux-with-lora")

image = (
    modal.Image.from_registry("nvidia/cuda:12.4.0-devel-ubuntu22.04", add_python="3.11")
    .pip_install("torch", "transformers", "diffusers")  # ...plus your remaining dependencies
)

volume = modal.Volume.from_name("flux-lora-models", create_if_missing=True)

@app.cls(gpu="H100", image=image, volumes={"/loras": volume})
class FluxWithLoRA:
    @modal.enter()
    def setup(self):
        # Runs once per container: load the base model, then fuse LoRA
        # weights from the mounted Volume into the base weights.
        from diffusers import FluxPipeline

        self.pipeline = FluxPipeline.from_pretrained(MODEL_NAME).to("cuda")
        self.pipeline.load_lora_weights("/loras")  # directory on the Volume holding your LoRA
        self.pipeline.fuse_lora()

    @modal.method()
    def generate_image(self, prompt: str):
        return self.pipeline(prompt).images[0]

@app.local_entrypoint()
def main():
    flux = FluxWithLoRA()
    flux.generate_image.remote("")  # pass your prompt
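To try it, run the entrypoint with `modal run flux_lora.py` (the filename is illustrative), or ship it with `modal deploy`; Modal builds the image, mounts the Volume, and attaches the H100 on demand.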
“As a startup, you need to iterate on things quickly. So it’s really helpful when the development speed is suddenly 10x. It’s a lot easier to deploy a ComfyUI workflow because Modal is serverless, so it auto-scales really well.”
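For context on the workflow in that quote, here is a minimal sketch of what a ComfyUI deployment on Modal can look like. It assumes the comfy-cli installer, and the GPU type and port are illustrative; none of these specifics come from the quote itself.

import subprocess
import modal

comfy_image = (
    modal.Image.debian_slim(python_version="3.11")
    .apt_install("git")
    .pip_install("comfy-cli")
    .run_commands("comfy --skip-prompt install --nvidia")  # installs ComfyUI into the image
)

comfy_app = modal.App("comfyui", image=comfy_image)

@comfy_app.function(gpu="L40S")
@modal.web_server(8000, startup_timeout=60)
def ui():
    # Start ComfyUI; Modal proxies port 8000 to a public, autoscaled URL.
    subprocess.Popen("comfy launch -- --listen 0.0.0.0 --port 8000", shell=True)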
Modal’s Rust-based container stack spins up GPU containers in under a second.
Modal autoscales containers up and down for maximum cost efficiency.
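Autoscaling behavior is tunable per function or class. A hedged sketch, reusing the FluxWithLoRA example above; the parameter names (min_containers, max_containers, scaledown_window) follow recent Modal SDK releases and may differ in older versions:

@app.cls(
    gpu="H100",
    image=image,
    volumes={"/loras": volume},
    min_containers=0,     # scale to zero when idle
    max_containers=20,    # cap spend under bursty traffic
    scaledown_window=60,  # idle seconds before a container is reclaimed
)
class FluxWithLoRA:
    ...  # same body as above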
Modal’s proprietary cloud capacity orchestrator guarantees high GPU availability.
Serve interactive experiences anywhere with our global GPU fleet.
Reduce cold starts by 10x for models and custom ComfyUI nodes with GPU memory snapshotting.
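A minimal sketch of what snapshotting looks like in code, reusing names from the snippet above. enable_memory_snapshot is a standard Modal flag; the enable_gpu_snapshot experimental option may vary by SDK version:

@app.cls(
    gpu="H100",
    image=image,
    enable_memory_snapshot=True,                         # snapshot container state after setup
    experimental_options={"enable_gpu_snapshot": True},  # include GPU memory in the snapshot
)
class SnapshottedFlux:
    @modal.enter(snap=True)
    def setup(self):
        # Runs once at snapshot time; later cold starts restore this state
        # (weights already in GPU memory) instead of re-running the load.
        from diffusers import FluxPipeline
        self.pipeline = FluxPipeline.from_pretrained(MODEL_NAME).to("cuda")

    @modal.method()
    def generate_image(self, prompt: str):
        return self.pipeline(prompt).images[0]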
Achieve 20ms networking latency for video streams using WebRTC on Modal.
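A sketch of the signaling half of such a setup, assuming aiortc on the Python side (Modal does not mandate a particular WebRTC stack): the browser POSTs an SDP offer to an ASGI endpoint, the container answers, and media then flows over the resulting peer connection.

import modal

rtc_image = modal.Image.debian_slim(python_version="3.11").pip_install(
    "fastapi[standard]", "aiortc"
)

rtc_app = modal.App("webrtc-signaling")

@rtc_app.function(image=rtc_image, gpu="A10G")  # the GPU would run your media pipeline
@modal.asgi_app()
def signaling():
    from aiortc import RTCPeerConnection, RTCSessionDescription
    from fastapi import FastAPI

    api = FastAPI()

    @api.post("/offer")
    async def offer(payload: dict):
        pc = RTCPeerConnection()
        # Attach outgoing video tracks here (e.g. frames from a generation
        # pipeline) before answering the browser's offer.
        await pc.setRemoteDescription(
            RTCSessionDescription(sdp=payload["sdp"], type=payload["type"])
        )
        answer = await pc.createAnswer()
        await pc.setLocalDescription(answer)
        return {"sdp": pc.localDescription.sdp, "type": pc.localDescription.type}

    return api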