January 21, 2025 · 5 minute read
How to deploy Whisper Large V3 on Modal
Yiren Lu (@YirenLu)
Solutions Engineer

What is Whisper Large V3?

Whisper Large V3 is OpenAI’s most advanced open-source speech-to-text transcription model. It was trained on over 1 million hours of diverse audio data and has 1.54 billion parameters. Our sample code runs the large-v3-turbo version, an efficiency-optimized variant that maintains high accuracy.

Why should you run Whisper Large V3 on Modal?

Modal is one of the easiest ways to get on-demand GPU access for running transformer models like Whisper Large V3: you write plain Python, and Modal handles provisioning the GPU containers.

Modal also makes it easy to run Whisper in parallel across many containers. This is especially useful for batch processing large audio datasets, where you might want to transcribe hundreds of audio files at once, as sketched below.
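For example, once the Model class from the example below is defined, a single .map() call fans a list of inputs out across autoscaling containers. Here is a minimal sketch you could append to the same app.py; the batch entrypoint name and the audio URLs are hypothetical placeholders:

# Sketch: parallel transcription with .map().
# Assumes the app and Model class defined in the example below.
@app.local_entrypoint()
def batch():
    audio_urls = [
        "https://example.com/episode1.wav",  # placeholder URLs
        "https://example.com/episode2.wav",
    ]
    # .map() distributes inputs across containers and yields results
    # as they complete.
    for text in Model().transcribe.map(audio_urls):
        print(text)

Run it with modal run app.py::batch.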

Example code for running Whisper Large V3 on Modal

To run the following code, you will need to:

  1. Create an account at modal.com
  2. Run pip install modal to install the modal Python package
  3. Run modal setup to authenticate (if this doesn’t work, try python -m modal setup)
  4. Save the implementation code in app.py
  5. Execute with modal run app.py

import modal

image = (
    modal.Image.debian_slim()
    .apt_install("ffmpeg")  # whisper shells out to the ffmpeg binary to decode audio
    .pip_install("openai-whisper", "requests")  # requests is used below to fetch audio
)
app = modal.App("example-base-whisper-large-v3-turbo", image=image)

GPU_CONFIG = "H100"  # request a single H100 per container

CACHE_DIR = "/cache"
cache_vol = modal.Volume.from_name("whisper-cache", create_if_missing=True)

@app.cls(
    gpu=GPU_CONFIG,
    volumes={CACHE_DIR: cache_vol},  # persist downloaded weights across runs
    allow_concurrent_inputs=15,  # one container can work on several requests at once
    container_idle_timeout=60 * 10,  # keep containers warm for 10 minutes
    timeout=60 * 60,  # allow up to an hour per call
)
class Model:
    @modal.enter()
    def setup(self):
        import whisper

        # Load the turbo variant once per container at startup; weights are
        # cached in the Volume, so later cold starts skip the download.
        self.model = whisper.load_model(
            "large-v3-turbo", device="cuda", download_root=CACHE_DIR
        )

    @modal.method()
    def transcribe(self, audio_url: str):
        import requests

        response = requests.get(audio_url)
        response.raise_for_status()  # fail fast on bad URLs

        # Save the audio file locally so ffmpeg can decode it
        with open("downloaded_audio.wav", "wb") as audio_file:
            audio_file.write(response.content)

        result = self.model.transcribe("downloaded_audio.wav")
        return result["text"]


# Run the model
@app.local_entrypoint()
def main():
    url = "https://pub-ebe9e51393584bf5b5bea84a67b343c2.r2.dev/examples_english_english.wav"
    print(Model().transcribe.remote(url))

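If you would rather keep the transcriber running as a persistent service than spin it up with modal run each time, you can deploy it and call it from any Python process. This is a minimal sketch, assuming the app above was deployed with modal deploy app.py (the app and class names match the example; the audio URL is a placeholder):

# Sketch: call the deployed transcriber from another Python process.
# Assumes you already ran `modal deploy app.py`.
import modal

Model = modal.Cls.lookup("example-base-whisper-large-v3-turbo", "Model")
print(Model().transcribe.remote("https://example.com/meeting.wav"))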