Modal functions and applications run within isolated, lightweight virtual machines called containers.
In this tutorial, we’ll build on our GPU acceleration tutorial and show you how to define a custom container image to run more interesting functions with your favorite ML frameworks, culminating in running a small open-weights transcription model.
Set up a custom container image
To run an audio transcription model, we’ll need to install transformers and ffmpeg in our GPU-enabled Modal container.
Here we define a modal.Image using the recommended debian_slim image as our base. We add the necessary libraries to our container image with the .pip_install and .apt_install methods:
image = modal.Image.debian_slim() # start from basic Linux image
image = image.pip_install("transformers[torch]") # add neural network libraries
image = image.apt_install("ffmpeg") # add system library for audio processing
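Each of these methods returns a new Image, so the same definition can also be written as a single chained expression; this is purely a stylistic alternative:

image = (
    modal.Image.debian_slim()  # start from basic Linux image
    .pip_install("transformers[torch]")  # add neural network libraries
    .apt_install("ffmpeg")  # add system library for audio processing
)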
We then pass in this image to our GPU-accelerated function check_cuda:
@app.function(gpu="A10G", image=image)
def check_cuda():
    ...
Note that this means we don’t have to install these packages locally on our development machines. We also don’t have to install any GPU drivers, since all Modal containers already include the lower parts of the CUDA stack (learn more in our CUDA guide).
Hit Run. The first time you run this function, you’ll notice that the output first shows the progress of building the custom image; after that, your function runs and prints its CUDA status. The image is cached, so the function should run immediately on subsequent calls.
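If you don’t have a check_cuda function handy from the GPU acceleration tutorial, a minimal sketch might look like the following; it simply reports what PyTorch sees inside the container (the exact body in that tutorial may differ):

@app.function(gpu="A10G", image=image)
def check_cuda():
    import torch  # installed via transformers[torch]

    # report whether a CUDA device is visible inside the container
    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("GPU:", torch.cuda.get_device_name(0))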
Your turn: Run an open-weights model
Try defining another function transcribe_audio that uses the same image to run a small speech-to-text model with transformers:
@app.function(gpu="A10G", image=image)
def transcribe_audio(file_url: str):
    from transformers import pipeline

    # load a small Whisper model onto the GPU and transcribe the audio at file_url
    transcriber = pipeline(model="openai/whisper-tiny.en", device="cuda")
    result = transcriber(file_url)
    print(result["text"])
Now call transcribe_audio from your local entrypoint and run this script again. Here we pass in an audio clip of MLK’s “I Have a Dream” speech:
@app.local_entrypoint()
def main():
    check_cuda.remote()
    transcribe_audio.remote(
        "https://modal-cdn.com/mlk.flac"
    )  # I have a dream ...
This time, the function should start running immediately.
And now you know the basics of running any custom code on Modal! To see applications of running larger open-weights models on Modal, check out our Stable Diffusion and vLLM inference examples.