November 2, 2024
5 minute read
Stable Diffusion 3.5 vs. Flux: top text-to-image models
Author: Yiren Lu (@YirenLu), Solutions Engineer

Stable Diffusion 3.5 and Flux are two of the top text-to-image models available today. In this post, we compare them with each other and with other top models on the Artificial Analysis leaderboard. This leaderboard, published by Artificial Analysis and hosted on Hugging Face, ranks several categories of models, including text-to-image models, by performance, quality, and user feedback.

Flux by Black Forest Labs

  • Key Features:

    • Offers a versatile lineup with four main model variants tailored for different users:
      • FLUX1.1 [pro] and FLUX.1 [pro] are the managed offerings, available only through the Black Forest Labs API and through partners like Replicate.
      • FLUX.1 [dev] is an open-weight, guidance-distilled model intended for non-commercial use. It offers quality similar to FLUX.1 [pro] while being more efficient.
      • FLUX.1 [schnell] is the fastest variant, optimized for local and personal use. It is openly licensed under Apache 2.0.
  • Open Source: Yes for FLUX.1 [schnell] (Apache 2.0); FLUX.1 [dev] has open weights for non-commercial use; the [pro] models are API-only

  • Size: 12B parameters for FLUX.1 [dev] and FLUX.1 [schnell]

  • GPU Needed: A100 or H100

  • How to run: Flux tutorial on Modal
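
The Modal tutorial covers a full deployment. For a quick experiment outside of it, a minimal sketch using Hugging Face's `diffusers` library might look like the following. This is not the tutorial's code: it assumes `torch` and `diffusers` are installed and a CUDA GPU with enough memory is available. The helper names `schnell_settings` and `generate_flux` are made up for illustration, and the sampling values are commonly documented starting points for the [schnell] variant rather than tuned settings.

```python
def schnell_settings(seed: int = 0) -> dict:
    """Sampling settings for FLUX.1 [schnell] (assumed starting points)."""
    return {
        "num_inference_steps": 4,    # schnell is distilled to work in very few steps
        "guidance_scale": 0.0,       # schnell is run without classifier-free guidance
        "max_sequence_length": 256,  # prompt token limit used for schnell
        "seed": seed,
    }


def generate_flux(prompt: str, out_path: str = "flux-schnell.png", seed: int = 0) -> str:
    """Generate one image with FLUX.1 [schnell] and save it to out_path."""
    # Heavy imports live here so this module can be imported without a GPU stack.
    import torch
    from diffusers import FluxPipeline

    settings = schnell_settings(seed)
    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
    )
    pipe.to("cuda")

    # A seeded generator makes the output reproducible for a given prompt.
    generator = torch.Generator("cpu").manual_seed(settings.pop("seed"))
    image = pipe(prompt, generator=generator, **settings).images[0]
    image.save(out_path)
    return out_path
```

Calling `generate_flux("Emma Watson in a Bridgerton costume")` would download the weights on first use and write a PNG to the working directory. Swapping in FLUX.1 [dev] would need a different repository name and sampling settings, plus acceptance of its non-commercial license.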

Prompt:

Emma Watson in a Bridgerton costume

Stable Diffusion 3.5 by Stability

  • Key Features:

    • Supports a wide range of output styles, including photorealism and stylized art.
    • Fast inference with the Large Turbo variant (~2 seconds on an A100).
    • Usable for both commercial and personal projects under the Stability AI Community License, subject to that license's terms.
  • Open Source: Yes

  • Size: Multiple options, including 8.1B parameters for Large and 2.5B for Medium

  • GPU Needed: A100 for Large, A10 for Medium

  • How to run: Stable Diffusion CLI on Modal
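
To make the difference between the variants concrete, here is a rough sketch of selecting one with Hugging Face's `diffusers` library. It is not the Modal CLI example: it assumes `torch` and `diffusers` are installed, a CUDA GPU is available, and access to the gated model repositories has been granted. The `Variant` class and `generate_sd35` helper are hypothetical names, and the step counts and guidance scales are starting points that should be confirmed against each model card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    repo_id: str
    steps: int
    guidance_scale: float


# Starting points only; confirm against each model card before relying on them.
SD35_VARIANTS = {
    "large": Variant("stabilityai/stable-diffusion-3.5-large", 28, 3.5),
    "large-turbo": Variant("stabilityai/stable-diffusion-3.5-large-turbo", 4, 0.0),
    "medium": Variant("stabilityai/stable-diffusion-3.5-medium", 40, 4.5),
}


def generate_sd35(prompt: str, variant: str = "large-turbo", out_path: str = "sd35.png") -> str:
    """Generate one image with the chosen Stable Diffusion 3.5 variant."""
    # Heavy imports live here so this module can be imported without a GPU stack.
    import torch
    from diffusers import StableDiffusion3Pipeline

    cfg = SD35_VARIANTS[variant]
    pipe = StableDiffusion3Pipeline.from_pretrained(cfg.repo_id, torch_dtype=torch.bfloat16)
    pipe.to("cuda")
    image = pipe(
        prompt,
        num_inference_steps=cfg.steps,
        guidance_scale=cfg.guidance_scale,
    ).images[0]
    image.save(out_path)
    return out_path
```

The Turbo variant's speed comes largely from needing only a handful of sampling steps, which is why it is kept as a separate entry rather than a flag on Large.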

Prompt:

Emma Watson in a Bridgerton costume

Stable Diffusion XL

  • Key Features:

    • Particularly well-tuned for vibrant and accurate colors, with better contrast, lighting, and shadows
    • Native 1024x1024 resolution.
  • Open Source: Yes

  • Size: 3.5B parameters

  • GPU Needed: A10G

Prompt:

Emma Watson in a Bridgerton costume
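
The GPU recommendations for the open-weight models above follow roughly from their parameter counts. As a back-of-the-envelope sketch, the snippet below estimates the memory taken by the weights alone, using the sizes listed in this post. It deliberately ignores text encoders, the VAE, and activations, so real usage will be higher; the function name `weight_gb` is ours.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2}

# Parameter counts as listed in this post, in billions.
MODELS = {
    "FLUX.1 [dev] / [schnell]": 12.0,
    "Stable Diffusion 3.5 Large": 8.1,
    "Stable Diffusion 3.5 Medium": 2.5,
    "Stable Diffusion XL": 3.5,
}


def weight_gb(params_billion: float, dtype: str = "bf16") -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    # Billions of parameters times bytes per parameter gives gigabytes directly.
    return params_billion * BYTES_PER_PARAM[dtype]


for name, size in MODELS.items():
    print(f"{name}: ~{weight_gb(size):.1f} GB of weights in bf16")
```

At 2 bytes per parameter, Flux's 12B parameters come to about 24 GB, which already matches the full 24 GB of an A10G before anything else is loaded. That is consistent with the A100 or H100 recommendation, while the smaller Medium and SDXL models leave headroom on a 24 GB card.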

Midjourney

  • Key Features:

    • Known for delivering some of the most artistic and visually appealing images.
    • Excels at photorealistic portraits and highly detailed imagery.
    • Offers extensive options and adjustments via Discord.
    • License limits usage, focusing more on personal and non-commercial projects.
  • Open Source: No

  • Price: $10-$120/month depending on plan

Prompt:

A hyper-realistic scene of a mannequin entirely composed of delicate pink flowers, seated on a wooden bench, wearing a classic black coat.
Snowdrifts have formed around the bench, with gentle snowfall blanketing the scene, creating a soft contrast against the vibrant floral figure.
In the background, a bustling metropolis with towering skyscrapers fades into the snowy mist, adding depth and atmosphere.
The scene feels cinematic, as if a frame from a film, capturing a moment of surreal beauty in an urban winter landscape.
--ar 9:16 --quality 2 --style raw --v 6.1

Playground v3 by Playground

  • Key Features:

    • Accepts reference images for enhanced prompt-following.
    • Best suited to graphic design: it adheres closely to the prompt and is highly customizable.
    • Generates text content specified in quotation marks.
      • The content inside the quotation marks describes and controls what is generated.
      • Entering descriptive text in quotation marks instructs the system to generate images of a specific style or content.
    • Previously offered an open-source model, but the focus is shifting away from open-source development.
  • Open Source: No

  • Pricing: $15/month

Imagen 3 by Google

  • Key Features:

    • Produces highly detailed images with refined lighting and minimal artifacts.
    • Delivers high-quality outputs with simpler prompts.
    • Includes content filtering and SynthID watermarking for ethical use.
  • Open Source: No

  • Price: $12/user/month with Gemini

DALL-E by OpenAI

  • Key Features:

    • Integrated into ChatGPT, making it a highly accessible tool for general users.
    • Simple to use with natural language prompts.
    • Produces high-quality outputs with minimal prompt engineering.
  • Open Source: No

  • Price: $20/month with ChatGPT Plus

Prompt:

Punk-rock Moses parting the red sea

Conclusion

You can run any open-source text-to-image model on Modal’s serverless GPUs. The Flux tutorial and the Stable Diffusion CLI example referenced above are good places to start.
