January 21, 2025 · 5 minute read
How to deploy Whisper Large V3 on Modal
Yiren Lu (@YirenLu)
Solutions Engineer

What is Whisper Large V3?

Whisper Large V3 is OpenAI’s most advanced open-source speech-to-text transcription model. It was trained on over 1 million hours of diverse audio data and has 1.54 billion parameters. Our sample code runs the large-v3-turbo version, an efficiency-optimized variant that maintains high accuracy.

Why should you run Whisper Large V3 on Modal?

Modal is one of the easiest ways to get on-demand GPU access for running transformer models like Whisper Large V3: you write plain Python, and Modal handles provisioning the GPU containers.

Modal also makes it easy to run Whisper in parallel across many containers. This is especially useful for batch processing large audio datasets, where you might want to transcribe hundreds of audio files at once, as sketched below.
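For example, once the Model class from the example below is defined, a single .map() call fans a list of inputs out across autoscaling containers. Here is a minimal sketch you could append to the same app.py; the batch entrypoint name and the audio URLs are hypothetical placeholders:

# Sketch: parallel transcription with .map().
# Assumes the app and Model class defined in the example below.
@app.local_entrypoint()
def batch():
    audio_urls = [
        "https://example.com/episode1.wav",  # placeholder URLs
        "https://example.com/episode2.wav",
    ]
    # .map() distributes inputs across containers and yields results
    # as they complete.
    for text in Model().transcribe.map(audio_urls):
        print(text)

Run it with modal run app.py::batch.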

Example code for running Whisper Large V3 on Modal

To run the following code, you will need to:

  1. Create an account at modal.com
  2. Run pip install modal to install the modal Python package
  3. Run modal setup to authenticate (if this doesn’t work, try python -m modal setup)
  4. Save the implementation code in app.py
  5. Execute with modal run app.py

import modal

image = (
    modal.Image.debian_slim()
    .apt_install("ffmpeg")  # whisper shells out to the ffmpeg binary to decode audio
    .pip_install("openai-whisper", "requests")  # requests is used below to fetch audio
)
app = modal.App("example-base-whisper-large-v3-turbo", image=image)

GPU_CONFIG = "H100"  # request a single H100 per container

CACHE_DIR = "/cache"
cache_vol = modal.Volume.from_name("whisper-cache", create_if_missing=True)

@app.cls(
    gpu=GPU_CONFIG,
    volumes={CACHE_DIR: cache_vol},  # persist downloaded weights across runs
    allow_concurrent_inputs=15,  # one container can work on several requests at once
    container_idle_timeout=60 * 10,  # keep containers warm for 10 minutes
    timeout=60 * 60,  # allow up to an hour per call
)
class Model:
    @modal.enter()
    def setup(self):
        import whisper

        # Load the turbo variant once per container at startup; weights are
        # cached in the Volume, so later cold starts skip the download.
        self.model = whisper.load_model(
            "large-v3-turbo", device="cuda", download_root=CACHE_DIR
        )

    @modal.method()
    def transcribe(self, audio_url: str):
        import requests

        response = requests.get(audio_url)
        response.raise_for_status()  # fail fast on bad URLs

        # Save the audio file locally so ffmpeg can decode it
        with open("downloaded_audio.wav", "wb") as audio_file:
            audio_file.write(response.content)

        result = self.model.transcribe("downloaded_audio.wav")
        return result["text"]


# Run the model
@app.local_entrypoint()
def main():
    url = "https://pub-ebe9e51393584bf5b5bea84a67b343c2.r2.dev/examples_english_english.wav"
    print(Model().transcribe.remote(url))

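If you would rather keep the transcriber running as a persistent service than spin it up with modal run each time, you can deploy it and call it from any Python process. This is a minimal sketch, assuming the app above was deployed with modal deploy app.py (the app and class names match the example; the audio URL is a placeholder):

# Sketch: call the deployed transcriber from another Python process.
# Assumes you already ran `modal deploy app.py`.
import modal

Model = modal.Cls.lookup("example-base-whisper-large-v3-turbo", "Model")
print(Model().transcribe.remote("https://example.com/meeting.wav"))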