Stable Diffusion 3.5 and Flux are two of the top text-to-image models currently. This post compares and contrasts their outputs on a variety of prompts and shares some basic statistics about running the model.
Image Comparison
Stable Diffusion 3.5 (left) vs. Flux1.dev (right)
From top:
A cyberpunk character standing in a rain-soaked alley, with neon signs and holograms reflecting off their metallic outfit, holding a high-tech weapon, in an edgy, modern style.
A dynamic scene of a superhero flying above a bustling city, with their cape flowing and dramatic lighting highlighting their powerful pose, in a comic book-inspired style.
An intense battle scene between a knight and a monstrous creature in a stormy, dramatic environment, with glowing weapons and a sense of movement, in a cinematic style.
An abstract, futuristic cityscape composed of geometric shapes and vibrant colors, with no clear sense of gravity or orientation, in a surreal and modern style.
From top:
A serene underwater scene featuring a coral reef teeming with vibrant marine life, and a mermaid gracefully swimming amidst the fish, in a soft and enchanting style.
A serene Japanese garden in spring, with cherry blossoms in full bloom, a small pond with koi fish, and a traditional wooden bridge, in a realistic and tranquil style.
A quaint, colorful village in a lush valley, with cobblestone streets, flower-filled gardens, and a crystal-clear river running through it, in a cheerful and idyllic style.
A mysterious forest filled with oversized mushrooms that glow softly in the dark, with a fairy-like creature perched on one, in a mystical and enchanting style.
From top:
A giant, ancient tree with a glowing, hollow trunk, surrounded by tiny, glowing creatures that look like fireflies, in a peaceful and magical atmosphere.
A group of astronauts exploring a distant alien planet, with strange, glowing plants and a vibrant, otherworldly atmosphere, in a cinematic style.
A majestic dragon soaring over snow-capped mountains, with intricate scales that shimmer in the sunlight, rendered in a highly detailed, photorealistic style.
A futuristic laboratory filled with holographic displays, advanced machinery, and a scientist analyzing a glowing, otherworldly artifact, in a sleek, sci-fi style.
Flux by Black Forest Labs
Key Features:
- Offers a versatile lineup with four main model variants tailored for different users:
- FLUX1.1 [pro] and FLUX.1 [pro] are their managed product, available only through their API and through partners like Replicate.
- FLUX.1 [dev] is an open-weight, guidance-distilled model intended for non-commercial use; it balances quality and efficiency. It offers similar quality to FLUX.1 [pro] but is more efficient.
- FLUX.1 [schnell] is the fastest model optimized for local and personal use. It is openly licensed under Apache 2.0.
- Offers a versatile lineup with four main model variants tailored for different users:
Open Source: Yes for FLUX.1 [dev] and FLUX.1 [schnell]
Size: 12B parameters for FLUX.1 [dev] and FLUX.1 [schnell]
GPU Needed: A100 or H100
How to run: Flux tutorial on Modal
Stable Diffusion 3.5 by Stability
Key Features:
- Supports a wide range of output styles, including photorealism and stylized art.
- Fast inference with the Large Turbo variant (~2 seconds on an A100).
- Versatile for both commercial and personal projects due to its community license.
Open Source: Yes
Size: Multiple options, including 8.1B parameters for Large and 2.5B for Medium
GPU Needed: A100 for Large, A10 for Medium
How to run: Stable Diffusion CLI on Modal
Conclusion
You can run any open-source text-to-image on Modal’s serverless GPUs. Here are a few guides to get you started.