What is Whisper Large V3?
Whisper Large V3 is OpenAI’s most advanced open-source speech-to-text transcription model. It was trained on over 1 million hours of diverse audio data and has 1.54 billion parameters. Our sample code runs the large-v3-turbo version, an efficiency-optimized variant with a pruned decoder that maintains accuracy close to the full model.
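For reference, here is roughly how the underlying openai-whisper API looks when run directly on a GPU machine, outside of Modal. This is a minimal sketch; `sample.wav` is a hypothetical placeholder for your own audio file:

# Stand-alone sketch of the openai-whisper API (no Modal involved).
# "sample.wav" is a placeholder for your own audio file.
import whisper

model = whisper.load_model("large-v3-turbo")  # downloads weights on first use
result = model.transcribe("sample.wav")
print(result["text"])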
Why should you run Whisper Large V3 on Modal?
Modal is the best and easiest way to access a GPU for running transformer models like Whisper Large V3.
Modal also makes it easy to run Whisper in parallel across many containers. This is especially useful for batch processing large audio datasets, where you might want to transcribe hundreds of audio files at once; a sketch of this pattern follows.
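For example, Modal's .map() will fan a list of inputs out across autoscaling containers. This is a minimal sketch, assuming the `app` and `Model` class from the example below; the URLs are hypothetical stand-ins for your own dataset:

# Hypothetical batch driver; assumes `app` and `Model` from the example below.
# The URLs are placeholders for your own audio files.
audio_urls = [
    "https://example.com/episode-1.wav",
    "https://example.com/episode-2.wav",
]

@app.local_entrypoint()
def batch():
    # .map() runs transcribe on each URL in parallel across containers
    # and yields the transcripts in input order.
    for transcript in Model().transcribe.map(audio_urls):
        print(transcript)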
Example code for running Whisper Large V3 on Modal
To run the following code, you will need to:
- Create an account at modal.com
- Run `pip install modal` to install the modal Python package
- Run `modal setup` to authenticate (if this doesn’t work, try `python -m modal setup`)
- Save the implementation code in `app.py`
- Execute with `modal run app.py`
import modal

image = (
    modal.Image.debian_slim()
    .apt_install("ffmpeg")  # ffmpeg is needed to decode audio
    .pip_install("openai-whisper", "ffmpeg-python", "requests")
)

app = modal.App("example-base-whisper-large-v3-turbo", image=image)

GPU_CONFIG = modal.gpu.H100(count=1)

CACHE_DIR = "/cache"
cache_vol = modal.Volume.from_name("whisper-cache", create_if_missing=True)


@app.cls(
    gpu=GPU_CONFIG,
    volumes={CACHE_DIR: cache_vol},
    allow_concurrent_inputs=15,
    container_idle_timeout=60 * 10,  # keep containers warm for 10 minutes
    timeout=60 * 60,  # allow up to an hour per call
)
class Model:
    @modal.enter()
    def setup(self):
        import whisper

        # Load the turbo variant (matching the app name above) once per
        # container; weights are cached in the Volume across cold starts.
        self.model = whisper.load_model(
            "large-v3-turbo", device="cuda", download_root=CACHE_DIR
        )

    @modal.method()
    def transcribe(self, audio_url: str):
        import tempfile

        import requests

        response = requests.get(audio_url)
        response.raise_for_status()

        # Write to a unique temporary file so concurrent inputs in the
        # same container don't overwrite each other's audio.
        with tempfile.NamedTemporaryFile(suffix=".wav") as audio_file:
            audio_file.write(response.content)
            audio_file.flush()
            result = self.model.transcribe(audio_file.name)

        return result["text"]


# ## Run the model
@app.local_entrypoint()
def main():
    url = "https://pub-ebe9e51393584bf5b5bea84a67b343c2.r2.dev/examples_english_english.wav"
    print(Model().transcribe.remote(url))
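After deploying with `modal deploy app.py`, the same class can also be called from other Python code. A sketch, assuming the app name above and a Modal client version that supports Cls.lookup:

# Hypothetical client code calling the deployed class from another process.
import modal

Model = modal.Cls.lookup("example-base-whisper-large-v3-turbo", "Model")
print(Model().transcribe.remote("https://example.com/audio.wav"))  # placeholder URL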
Related resources
- How to speed up Whisper
- All the open-source Whisper variants
- Run WhisperX on Modal - WhisperX is a faster Whisper fork with some extra features.