Modal Code Playground

Modal functions and applications run inside containers, which are isolated, lightweight environments similar to virtual machines.

In this tutorial, we’ll build on our GPU acceleration tutorial and show you how to define a custom container image to run more interesting functions with your favorite ML frameworks, culminating in running a small open-weights transcription model.

Set up a custom container image

To run an audio transcription model, we’ll need to install transformers and ffmpeg in our GPU-enabled Modal container.

Here we define a modal.Image using the recommended debian_slim image as our base. We add the necessary libraries to our container image with the .pip_install and .apt_install methods:

image = modal.Image.debian_slim()  # start from basic Linux image
image = image.pip_install("transformers[torch]")  # add neural network libraries
image = image.apt_install("ffmpeg")  # add system library for audio processing
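
Because each of these methods returns a new modal.Image rather than mutating the existing one, you can also write the same definition as a single chained expression. The sketch below is equivalent to the three lines above:

image = (
    modal.Image.debian_slim()  # start from basic Linux image
    .pip_install("transformers[torch]")  # add neural network libraries
    .apt_install("ffmpeg")  # add system library for audio processing
)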

We then pass in this image to our GPU-accelerated function check_cuda:

@app.function(gpu="A10G", image=image)  
def check_cuda():
    ...
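
If you don't have the GPU tutorial handy, here is a minimal sketch of what check_cuda's body might look like. It assumes all you want is a CUDA status report via torch, which is pulled in by transformers[torch]:

@app.function(gpu="A10G", image=image)
def check_cuda():
    import torch  # imported inside the function, since torch is only installed in the container

    has_cuda = torch.cuda.is_available()
    print(f"CUDA available: {has_cuda}")
    if has_cuda:
        print(f"GPU: {torch.cuda.get_device_name(0)}")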

Note that this means we don’t have to install these packages locally on our development machines. We also don’t have to install any GPU drivers, since all Modal containers already include the lower parts of the CUDA stack (learn more in our CUDA guide).

Hit Run. The first time you run this function, the output shows the progress of building the custom image; once the build finishes, your function runs and prints its CUDA status. The image is then cached, so the function should run immediately on subsequent calls.

Your turn: Run an open-weights model

Try defining another function transcribe_audio that uses the same image to run a small speech-to-text model with transformers:

@app.function(gpu="A10G", image=image)
def transcribe_audio(file_url: str):
    from transformers import pipeline
    transcriber = pipeline(model="openai/whisper-tiny.en", device="cuda")
    result = transcriber(file_url)
    print(result["text"])
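
As written, the transcript is only printed in the container's logs. If you would rather get the text back on your local machine, a small variation (a sketch, not required for the exercise) is to return the result instead of printing it; the return value of a .remote() call is sent back to the caller:

@app.function(gpu="A10G", image=image)
def transcribe_audio(file_url: str) -> str:
    from transformers import pipeline

    transcriber = pipeline(model="openai/whisper-tiny.en", device="cuda")
    return transcriber(file_url)["text"]  # returned to whoever called .remote()

With that change, text = transcribe_audio.remote(...) in your entrypoint gives you the transcript as a local string.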

Now call transcribe_audio from your local entrypoint and run this script again. Here we pass in an audio clip of MLK’s “I Have a Dream” speech:

@app.local_entrypoint()
def main():
    check_cuda.remote()
    transcribe_audio.remote( 
        "https://modal-public-assets.s3.amazonaws.com/mlk.flac"
    )  # I have a dream ...

This time, the image is already built and cached, so the function should start running immediately.

And now you know the basics of running custom code on Modal! To see larger open-weights models running on Modal, check out our Stable Diffusion and vLLM inference examples. They also show plenty of other optimization tricks, like baking model weights into your modal.Image at build time to make cold starts super fast.
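
As a taste of that last trick: one way to bake weights in (a sketch of the general pattern, not necessarily what those examples do) is to run a download step with Image.run_function while the image is being built, so the model lands in the image's filesystem instead of being fetched on every cold start:

def download_whisper():
    from transformers import pipeline

    # instantiating the pipeline downloads the weights into the image's cache at build time
    pipeline(model="openai/whisper-tiny.en")

image = image.run_function(download_whisper)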


Outside of the playground, you can run the same script from your own terminal with the Modal CLI:

$ modal run custom_container.py