Suno uses Modal to scale inference and batch pre-processing to thousands of GPUs. With Modal, Suno brought a state-of-the-art music generation model to market four months sooner, without hiring a team of engineers to build and maintain infrastructure.
About Suno
Suno is a music generation app that can make any song you describe. Enter a simple text description—like “a deep house song about serverless infra”—and Suno makes you a song complete with vocals in seconds. Suno’s users include Grammy-winning artists, but the core user base is people experiencing making music for the first time. Microsoft recently announced they’ve partnered with Suno to bring song generation capabilities to Copilot, their AI chatbot!
Avoiding past infrastructure pain
Prior to starting Suno, all four founders worked at Kensho, an AI tech startup for financial data. They had personally spent significant amounts of time setting up and managing Kubernetes clusters to support their data-heavy workloads—so when they started working on Suno, they knew exactly what they did not want:
- They did not want to manage their own clusters, which they knew would only grow more complex over time to handle scaling, redundancy, and load balancing.
- They did not want to divert engineering resources and delay time-to-market in a rapidly evolving industry.
- They did not want to commit to three-year GPU reservations just to secure reasonable prices.
Georg, co-founder and CTO of Suno, gave Modal a try after a friend’s recommendation. He was intrigued by how easy it was to deploy code in the cloud.
An easy setup
Suno began by running their batch pre-processing on Modal, letting Modal dynamically manage the compute these workflows needed. Not a single config file was involved; a few short Python scripts running in Modal were all it took:
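For a flavor of what such a script can look like, here is a minimal sketch of a Modal batch job. The app name, preprocessing step, and inputs are illustrative, not Suno's actual code; `modal.App`, `@app.function`, and `.map()` are standard Modal APIs, though details may vary by version.

```python
import modal

app = modal.App("audio-preprocessing")  # illustrative app name

@app.function(cpu=4, timeout=600)
def preprocess(clip_id: str) -> str:
    # In a real pipeline this would fetch the raw audio, resample and
    # featurize it, and write the result to shared storage.
    return f"processed {clip_id}"

@app.local_entrypoint()
def main():
    clip_ids = [f"clip-{i:05d}" for i in range(10_000)]  # placeholder inputs
    # .map() fans the work out across containers; Modal provisions compute
    # for the batch and releases it when the run finishes.
    for result in preprocess.map(clip_ids):
        print(result)
```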
Suno then expanded their use of Modal to model deployment. As a general-purpose platform, Modal offered many features that Suno could leverage, like the ability to:
- Expose functions directly as web endpoints
- Chain together inputs and outputs of inference functions to create end-to-end sequences across multiple models and containers
…all defined programmatically in Python.
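Here is a hedged sketch of both ideas together: two model stages chained into a pipeline and exposed as a web endpoint. The model stages and names are illustrative, not Suno's pipeline; `@modal.web_endpoint` and `.remote()` are standard Modal APIs, though exact names may differ across versions.

```python
import modal

app = modal.App("song-pipeline")  # illustrative app name

@app.function(gpu="A100")
def generate_vocals(prompt: str) -> bytes:
    # Stage one: a model that turns the text prompt into a vocal track.
    return prompt.encode()  # placeholder for real model inference

@app.function(gpu="A100")
def add_instrumentals(vocals: bytes) -> bytes:
    # Stage two: a second model that builds the full mix around the vocals.
    return vocals  # placeholder

@app.function()
@modal.web_endpoint(method="POST")
def make_song(prompt: str):
    # Each .remote() call runs in its own container; the output of one
    # stage flows straight into the input of the next.
    vocals = generate_vocals.remote(prompt)
    song = add_instrumentals.remote(vocals)
    return {"song_bytes": len(song)}  # placeholder response
```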
The Modal team worked closely with Suno as they transitioned from prototypes to production. Georg remarked, “It’s almost like we’re on the same team; us flagging something and you guys immediately working on it is awesome.”
(Auto)scaling to 1000 GPUs
As Suno’s popularity grew, the feature they found most valuable was Modal’s autoscaling: scaling up to thousands of GPUs to match demand and back down when it subsides. During holidays like Christmas and Valentine’s Day, request volume would shoot up as users created more songs to share with friends and family.
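From the developer's side, this can be as simple as a decorated function; a sketch under assumed names (the function is illustrative, and scaling parameter names may differ across Modal versions):

```python
import modal

app = modal.App("song-inference")  # illustrative app name

@app.function(
    gpu="A100",
    # Illustrative knobs; exact parameter names vary by Modal version.
    keep_warm=2,             # keep a couple of containers hot for low latency
    concurrency_limit=1000,  # cap the fleet at 1000 GPU containers
)
def generate(prompt: str) -> str:
    # Modal adds containers as requests queue up (e.g. holiday spikes)
    # and scales back toward zero as traffic subsides.
    return f"song for: {prompt}"
```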
Aside from saving developer time, Suno also avoided committing financially to a large number of GPUs and the tradeoff that typically entails: either paying for low utilization or degrading the user experience under peak load.
Modal looks forward to supporting Suno as their compute needs grow!
p.s. check out this theme song we made with Suno!