Use Modal to build flexible pipelines to transcribe audio, generate voice or synthesize high-fidelity music.
Run Whisper on hardware of your choice, customized with pre-processing (such as ffmpeg) to your liking. Use Modal’s map capability to spin up hundreds of containers to transcribe a single audio track in parallel.
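The fan-out idea can be sketched in a few lines. This is a minimal, local illustration only: it uses Python's standard-library thread pool as a stand-in for Modal's map, and `split_into_segments`, `transcribe_segment`, and `transcribe_parallel` are hypothetical names, not part of Modal's API. The transcription step is a placeholder; a real pipeline would cut each segment with ffmpeg and run Whisper on it.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_segments(duration_s, segment_s=30.0):
    """Return (start, end) pairs, in seconds, covering the whole track."""
    if segment_s <= 0:
        raise ValueError("segment_s must be positive")
    duration_s = float(duration_s)
    segments = []
    start = 0.0
    while start < duration_s:
        end = min(start + segment_s, duration_s)
        segments.append((start, end))
        start = end
    return segments


def transcribe_segment(segment):
    """Hypothetical placeholder for per-segment work.

    Assumption: in a real pipeline this would extract the segment
    (for example with ffmpeg) and run Whisper on it.
    """
    start, end = segment
    return f"[{start:.0f}s-{end:.0f}s]"


def transcribe_parallel(duration_s, segment_s=30.0, max_workers=8):
    """Map the per-segment function over all segments and join the results."""
    segments = split_into_segments(duration_s, segment_s)
    # Local stand-in for a distributed map: each segment is handled
    # independently, and results come back in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        parts = list(pool.map(transcribe_segment, segments))
    return " ".join(parts)


if __name__ == "__main__":
    print(transcribe_parallel(95, segment_s=30))
```

Because each segment is processed independently and results are returned in input order, the same structure carries over when the local pool is replaced by a map across many containers.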
Host diffusion models to create high-quality music samples from text or audio input. Serve your model in any form, including a serverless Discord bot.
Generate realistic human voice in real time using open-source models. Easily combine voice generation with LLM synthesis in a single app.
Transcribe podcast episodes of any length in under two minutes.
Build a serverless Discord bot that generates music on demand using Meta's MusicGen model.
Build a complete chat app with Whisper transcription, Vicuna LLM and Tortoise text-to-speech.
“Substack recently launched a feature for AI-powered audio transcriptions. The data team picked Modal because it makes it easy to write code that runs on 100s of GPUs in parallel, transcribing podcasts in a fraction of the time.”
“Suno has developed proprietary state-of-the-art models that generate music and speech using AI. We chose Modal as our infrastructure provider for inference and parallel data processing. Modal's superb developer experience enables our team to ship new models to production quickly, and with confidence we'll scale to thousands of simultaneous users.”