Stable Diffusion 3.5 and Flux are two of the leading text-to-image models available today. In this post, we compare and contrast them, along with other top models on the Artificial Analysis Text to Image Leaderboard. This leaderboard, published by Artificial Analysis and hosted on Hugging Face, ranks text-to-image models by quality, performance, and user feedback.
Flux by Black Forest Labs
Key Features:
- Offers a versatile lineup with four main model variants tailored for different users:
- FLUX1.1 [pro] and FLUX.1 [pro] are managed products, available only through the Black Forest Labs API and through partners like Replicate.
- FLUX.1 [dev] is an open-weight, guidance-distilled model for non-commercial use. It offers quality similar to FLUX.1 [pro] while being more efficient.
- FLUX.1 [schnell] is the fastest variant, optimized for local and personal use, and is openly licensed under Apache 2.0.
Open Source: Yes for FLUX.1 [dev] and FLUX.1 [schnell]
Size: 12B parameters for FLUX.1 [dev] and FLUX.1 [schnell]
GPU Needed: A100 or H100
How to run: Flux tutorial on Modal
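The A100/H100 recommendation follows from a back-of-the-envelope estimate: weight memory is roughly the parameter count times the bytes per parameter. A minimal sketch, assuming dense weights and counting only the 12B parameters listed above (the text encoders, VAE, and activations need additional memory on top of this):

```python
# Rough estimate of the memory needed just to hold model weights.
# Ignores text encoders, the VAE, and activations, which all add more.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}

def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return the size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

FLUX_PARAMS = 12e9  # FLUX.1 [dev] and FLUX.1 [schnell]

for precision in ("fp32", "bf16", "fp8"):
    print(f"{precision}: {weight_memory_gb(FLUX_PARAMS, precision):.0f} GB")
```

At bf16, the 12B weights alone come to about 24 GB, the entire memory of a 24 GB card, which is why 40 GB or 80 GB GPUs such as the A100 or H100 are the comfortable choice.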
Prompt:
Emma Watson in a Bridgerton costume
Stable Diffusion 3.5 by Stability AI
Key Features:
- Supports a wide range of output styles, including photorealism and stylized art.
- Fast inference with the Large Turbo variant (~2 seconds on an A100).
- Usable for both commercial and personal projects under the Stability AI Community License, which permits free commercial use for organizations with under $1M in annual revenue.
Open Source: Yes
Size: Multiple options, including 8.1B parameters for Large and 2.5B for Medium
GPU Needed: A100 for Large, A10 for Medium
How to run: Stable Diffusion CLI on Modal
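A quick way to see why Large and Medium are paired with different GPUs is to estimate how much memory their weights take. This is a minimal sketch, assuming bf16 weights (2 bytes per parameter) and using only the parameter counts listed above; it ignores the text encoders, VAE, and activations, so treat the results as lower bounds.

```python
# Compare the bf16 weight footprint of the SD3.5 variants with GPU memory.
# Weights only: text encoders, the VAE, and activations are not counted,
# so the real footprint is larger than these numbers.
GPU_MEMORY_GB = {"A10": 24, "A100 (40 GB)": 40, "A100 (80 GB)": 80}
SD35_PARAMS = {"Large": 8.1e9, "Medium": 2.5e9}
BYTES_PER_PARAM_BF16 = 2

def weight_gb(num_params: float) -> float:
    """Size of the bf16 weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM_BF16 / 1e9

def headroom_gb(variant: str, gpu: str) -> float:
    """GPU memory left over after loading only the variant's weights."""
    return GPU_MEMORY_GB[gpu] - weight_gb(SD35_PARAMS[variant])

for variant, params in SD35_PARAMS.items():
    print(f"{variant}: {weight_gb(params):.1f} GB of weights")
    for gpu in GPU_MEMORY_GB:
        print(f"  headroom on {gpu}: {headroom_gb(variant, gpu):.1f} GB")
```

Medium's roughly 5 GB of weights leave most of an A10's 24 GB free, while Large's roughly 16 GB leave much less room once its text encoders and activations are added, which is consistent with the A100 recommendation.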
Prompt:
Emma Watson in a Bridgerton costume
Stable Diffusion XL
Key Features:
- Particularly well-tuned for vibrant, accurate colors, with better contrast, lighting, and shadows than earlier Stable Diffusion models.
- Generates images at a native 1024x1024 resolution.
Open Source: Yes
Size: 3.5B parameters
GPU Needed: A10G
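Like earlier Stable Diffusion models, SDXL runs its denoising network on a compressed latent representation rather than on raw pixels. A minimal sketch of the shape arithmetic, assuming SDXL's VAE settings of 8x spatial downsampling and 4 latent channels:

```python
# Compute the latent tensor shape a latent diffusion model denoises, given
# the VAE's spatial downsampling factor and number of latent channels.
def latent_shape(height: int, width: int, downsample: int = 8, channels: int = 4):
    if height % downsample or width % downsample:
        raise ValueError("height and width must be divisible by the downsample factor")
    return (channels, height // downsample, width // downsample)

shape = latent_shape(1024, 1024)
print(shape)  # → (4, 128, 128)

# Compare the number of values in the latent with the RGB image it decodes to.
pixels = 1024 * 1024 * 3
latents = shape[0] * shape[1] * shape[2]
print(f"{pixels // latents}x fewer values than the RGB image")  # → 48x fewer values than the RGB image
```

So at the native 1024x1024 resolution, the model denoises a 4x128x128 latent tensor, 48 times fewer values than the RGB image it decodes to.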
Prompt:
Emma Watson in a Bridgerton costume
Midjourney
Key Features:
- Known for delivering some of the most artistic and visually appealing images.
- Excels at photorealistic portraits and highly detailed imagery.
- Offers extensive options and adjustments via Discord.
- Its license ties usage rights to the subscription plan (for example, companies above a revenue threshold need a Pro or Mega plan), so check the terms before commercial use.
Open Source: No
Price: $10-$120/month depending on plan
Prompt:
A hyper-realistic scene of a mannequin entirely composed of delicate pink flowers, seated on a wooden bench, wearing a classic black coat.
Snowdrifts have formed around the bench, with gentle snowfall blanketing the scene, creating a soft contrast against the vibrant floral figure.
In the background, a bustling metropolis with towering skyscrapers fades into the snowy mist, adding depth and atmosphere.
The scene feels cinematic, as if a frame from a film, capturing a moment of surreal beauty in an urban winter landscape.
--ar 9:16 --quality 2 --style raw --v 6.1
Playground v3 by Playground
Key Features:
- Accepts reference images for enhanced prompt-following.
- Well suited to graphic design: it adheres closely to prompts and is highly customizable.
- Renders text in images: place the text you want generated inside quotation marks.
- Descriptive text inside quotation marks can also steer the style or content of the generated image.
- Previously released open-weight models, but the company has since shifted away from open-source development.
Open Source: No
Price: $15/month
Imagen 3 by Google
Key Features:
- Produces highly detailed images with refined lighting and minimal artifacts.
- Delivers high-quality outputs even from simple prompts.
- Includes content filtering and SynthID watermarking for ethical use.
Open Source: No
Price: $12/user/month with Gemini
DALL-E by OpenAI
Key Features:
- Integrated into ChatGPT, making it a highly accessible tool for general users.
- Simple to use with natural language prompts.
- Produces high-quality outputs with minimal prompt engineering.
Open Source: No
Price: $20/month with ChatGPT Plus
Prompt:
Punk-rock Moses parting the red sea
Conclusion
You can run any open-source text-to-image model on Modal’s serverless GPUs. Here are a few guides to get you started.