BoomBot: Create your own music samples on Discord
(quick links: try it out on Discord; watch demo with audio; view source code)
MusicGen is the latest milestone language model in conditional music generation, with great results. We wanted a space to easily play around with the model and share our creations, so we created a Discord community featuring BoomBot, an on-demand music sample generator.
You can call BoomBot in the Discord server by simply typing /generate
, then
prompting it with a text description of the music you’d like to create, and even
a file of the melody you’d like to condition on, along with other specifiable
parameters.
Everything, from the backend API to our React frontend, is deployed serverlessly using Modal, and the code is available here.
Code overview
BoomBot runs the MusicGen model remotely in a GPU-accelerated container — when a
user interacts with the Discord bot using the slash command /generate
, Discord
sends a webhook call to Modal, which then generates a music sample using the
text description and other user inputs from the command. We also deploy a React
single page application as static files into a Modal web endpoint for our web
app.
We go into detail into each of these steps below, and provide commands for running each of them individually. To follow along, clone the repo and set up a Discord token for yourself.
Language model
We use Audiocraft, a PyTorch
library that provides the code and models for MusicGen, and load both the 3.3B
large
and 1.5B melody
models to use depending on the user input (large for
just text, melody for text and melody). We can install all our dependencies and
“bake” both models into our image to avoid downloading our models during
inference and take advantage of Modal’s incredibly fast cold-start times:
def download_models():
from audiocraft.models import MusicGen
MusicGen.get_pretrained("large")
MusicGen.get_pretrained("melody")
image = (
modal.Image.debian_slim()
.apt_install("git", "ffmpeg")
.pip_install(
"pynacl", # for Discord authentication
"torch",
"soundfile",
"pydub",
"git+https://github.com/facebookresearch/audiocraft.git",
)
.run_function(download_models, gpu="any")
)
app.image = image
We then write our model code within Modal’s
@app.cls
decorator, with the
generate
function processing the user input and generating audio as bytes that we can
save to a file later.
It’s useful to test out our model directly before we build out our backend API,
so we define a
local_entrypoint
. We can then
run our model from the CLI using inputs of our choice:
modal run main.py --prompt “soothing jazz” --duration 20 --format "mp3" --melody "https://cdn.discordapp.com/ephemeral-attachments/1124040605304094870/1124371596484804880/bach.mp3"
Discord bot
Now that we’ve loaded our model and wrote our function, we’d like to trigger it
from Discord. We can do this using
slash commands
— a feature that lets you register a text command on Discord that triggers a
custom webhook when a user interacts with it. We handle all our Discord events
in a FastAPI app in
bot.py
. Luckily,
we can deploy this app easily and serverlessly using Modal’s
@asgi_app decorator.
Create a Discord app
To connect our model to a Discord bot, we’re first going to create an application on the Discord Developer Portal.
- Go to the Discord Developer Portal and login with your Discord account.
- On the portal, go to Applications and create a new application by clicking New Application in the top right next to your profile picture.
- Copy your Discord application’s Public Key (in General Information)
and create a custom Modal secret for it. On Modal’s
secret creation page, paste the public key as the secret value with the key
DISCORD_PUBLIC_KEY
, then name this secretboombot-discord-secret
.
Then go back to your application on the Discord Developer Portal, as we need to do a few more things to finish setting up our bot.
Register a Slash Command
Next, we’re going to register a command for our Discord app via an HTTP endpoint.
Run the following command in your terminal, replacing the appropriate variable
inputs. BOT_TOKEN
can be found by resetting your token in the application’s
Bot section, and CLIENT_ID
is the Application ID available in
General Information.
BOT_TOKEN='replace_with_bot_token'
CLIENT_ID='replace_with_application_id'
curl -X POST \
-H 'Content-Type: application/json' \
-H "Authorization: Bot $BOT_TOKEN" \
-d '{
"name":"generate",
"description":"generate music",
"options":[
{
"name":"prompt",
"description":"Describe the music you want to generate",
"type":3,
"required":true
},
{
"name":"duration",
"description":"Duration of clip, in seconds (max 60)",
"type":4,
"required":false,
"min_value":2,
"max_value":60
},
{
"name":"melody",
"description":"File of melody you'\''d like to condition (takes first 30 secs if longer)",
"type":11,
"required":false
},
{
"name":"format",
"description":"Desired format of output (default .wav)",
"type":3,
"required":false,
"choices":[
{
"name": ".wav",
"value": "wav"
},
{
"name": ".mp3",
"value": "mp3"
}
]
}
]
}' "https://discord.com/api/v10/applications/$CLIENT_ID/commands"
This will register a Slash Command for your bot named generate
, and has a few
parameters like prompt
, duration
, and format
. More information about the
command structure can be found in the Discord docs
here.
Deploy a Modal web endpoint
We then create a
POST /generate
endpoint in
bot.py
using FastAPI and Modal’s
@asgi_app decorator to handle
interactions with our Discord app (so that every time a user does a slash
command, we can respond to it).
Note that since Discord requires an interaction response within 3 seconds, we
use spawn
to kick off
Audiocraft.generate
as a background task while returning a defer
message to
Discord within the time limit. We then update our response with the results once
the model has finished running.
You can deploy this app by running the following command from your root directory:
modal deploy src.bot
Copy the Modal URL that is printed in the output and go back to your
application’s General Information section on the
Discord Developer Portal. Paste
the URL, making sure to append the path of your POST
endpoint, in the
Interactions Endpoint URL field, then click Save Changes. If your
endpoint is valid, it will properly save and you can start receiving
interactions via this web endpoint.
Finish setting up Discord bot
To start using the Slash Command you just set up, you need to invite the bot to
a Discord server. To do so, go to your application’s OAuth2 section on the
Discord Developer Portal. Select
applications.commands
as the scope of your bot and copy the invite URL that is
generated at the bottom of the page.
Paste this URL in your browser, then select your desired server (create a new
server if needed) and click Authorize. Now you can open your Discord server
and type /{name of your slash command}
- your bot should be connected and
ready for you to use!
React frontend
We also added a simple React frontend to our FastAPI app to serve as a landing page for our Discord server.
We added the frontend to our existing app in
bot.py
by simply using Mount
to mount our local directory at /assets
in our
container, then instructing FastAPI to serve this
static file directory at
our root path.
For a more in-depth tutorial on deploying your web app on Modal, refer to our Document OCR web app example.
Run this app on Modal
All the source code for this example can be found on Github.
If you’re interested in learning more about Modal, check out our docs and other examples.