February 21, 2024 · 3 minute read
How Suno shaved 4 months off their launch timeline with Modal


Suno uses Modal to scale inference and batch pre-processing to thousands of GPUs. With Modal, Suno was able to bring a state-of-the-art music generation model to market four months early instead of hiring a team of engineers to build and maintain infrastructure.

About Suno

Suno is a music generation app that can make any song you describe. Enter a simple text description—like “a deep house song about serverless infra”—and Suno makes you a song complete with vocals in seconds. Suno’s users include Grammy-winning artists, but the core user base is people experiencing making music for the first time. Microsoft recently announced they’ve partnered with Suno to bring song generation capabilities to Copilot, their AI chatbot!

Avoiding past infrastructure pain

Prior to starting Suno, all four founders worked at Kensho, an AI tech startup for financial data. They had personally spent significant amounts of time setting up and managing Kubernetes clusters to support their data-heavy workloads—so when they started working on Suno, they knew exactly what they did not want:

  • They did not want to manage their own clusters, knowing the work would only grow more complex over time to handle scaling, redundancy, and load balancing.
  • They did not want to divert engineering resources and delay time-to-market in a rapidly evolving industry.
  • They did not want to commit to three-year GPU reservations just to secure reasonable prices.

Georg, co-founder and CTO of Suno, gave Modal a try after a friend’s recommendation. He was intrigued by how easy it was to deploy code in the cloud.

An easy setup

Suno began by running their batch pre-processing on Modal, letting Modal dynamically manage the compute those workflows needed. Not a single config file was involved; all they needed were a few short Python scripts running in Modal:

Modal reminded me of the difference between PyTorch and TensorFlow, where Torch catered more to the ML crowd and was okay deviating from some CS principles. That’s the beauty of Modal. You don’t have to understand much about containers; all you need to know is that you can scale your function calls in the cloud with a few lines of Python.
— Georg Kucsko, Co-founder and CTO, Suno
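
To give a flavor of that simplicity, here is a minimal sketch of a Modal batch job in the spirit of Suno’s pre-processing setup. The app name, function name, and workload are hypothetical, not Suno’s actual code; the Modal calls themselves (`modal.App`, `@app.function`, `.map()`) are standard SDK primitives.

```python
import modal

# Hypothetical app and function names; the real pre-processing logic is elided.
app = modal.App("audio-preprocessing")

@app.function(gpu="any", timeout=600)
def preprocess(clip_url: str) -> int:
    # Stand-in for real feature extraction on one audio clip.
    print(f"processing {clip_url}")
    return len(clip_url)

@app.local_entrypoint()
def main():
    urls = [f"s3://example-bucket/clip-{i}.wav" for i in range(1_000)]
    # .map() fans the calls out across many containers in parallel;
    # Modal provisions and tears down the machines automatically.
    results = list(preprocess.map(urls))
    print(f"processed {len(results)} clips")
```

Running `modal run preprocess.py` is all it takes: no cluster definition, no YAML.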

Suno then expanded their use of Modal to model deployment. As a general-purpose platform, Modal offered many features that Suno could leverage, like the ability to:

  • Expose functions directly as web endpoints
  • Chain together inputs and outputs of inference functions to create end-to-end sequences across multiple models and containers

…all defined programmatically in Python.
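
Here is a sketch of how those two capabilities combine, with hypothetical function names and placeholder model logic. The `@modal.web_endpoint` decorator is from the Modal SDK of early 2024; newer releases may use a different name.

```python
import modal

# Hypothetical pipeline; the real models are elided.
app = modal.App("music-inference")
image = modal.Image.debian_slim().pip_install("fastapi")

@app.function(image=image, gpu="A100")
def generate_audio(prompt: str) -> bytes:
    # Stand-in for the generation model.
    return prompt.encode()

@app.function(image=image)
def postprocess(raw: bytes) -> bytes:
    # Stand-in for mastering/encoding, running in a separate container.
    return raw.upper()

@app.function(image=image)
@modal.web_endpoint(method="POST")
def create_song(prompt: str) -> dict:
    # Chain the functions: each .remote() call may land on a different
    # container, and the outputs flow through as ordinary Python values.
    raw = generate_audio.remote(prompt)
    final = postprocess.remote(raw)
    return {"size_bytes": len(final)}
```

A single `modal deploy` turns `create_song` into a live HTTPS endpoint, and the cross-container chaining is just Python function calls.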

The Modal team worked closely with Suno as they transitioned from prototypes to production. Georg remarked, “It’s almost like we’re on the same team; us flagging something and you guys immediately working on it is awesome.”

(Auto)scaling to 1000 GPUs

[Chart: Suno’s GPU usage on Modal is variable and peaks on holidays]

As Suno’s popularity grew, the feature they found most valuable was Modal’s ability to auto-scale thousands of GPUs up and down to efficiently match demand. During holidays like Christmas and Valentine’s Day, request volume would shoot up as users created more songs to share with friends and family.
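
On Modal, that elasticity is configured declaratively rather than built by hand. The sketch below shows the relevant knobs; the function and values are hypothetical, and the parameter names (`concurrency_limit`, `keep_warm`, `container_idle_timeout`) match the Modal SDK of early 2024 and may differ in newer releases.

```python
import modal

app = modal.App("song-generation")  # hypothetical

@app.function(
    gpu="A100",
    concurrency_limit=1000,      # cap the fleet at 1,000 containers
    keep_warm=4,                 # keep a few containers hot for latency
    container_idle_timeout=120,  # scale idle containers down after 2 minutes
)
def generate(prompt: str) -> bytes:
    # Stand-in for the real model call; Modal adds and removes GPU
    # containers automatically as request volume rises and falls.
    return prompt.encode()
```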

What kills you is this peak demand, right? Like you just can’t afford to be buying machines for steady demand and then also have two people for six months do nothing other than building inference that can handle scaling down and up from that.
— Georg Kucsko

Aside from saving developer time, Suno also avoided committing financially to a large number of GPUs and the dilemma such commitments typically entail: either low utilization or a degraded user experience.

Modal looks forward to supporting Suno as their compute needs grow!

P.S. Check out this theme song we made with Suno!
