Learn how Reducto used GPU memory snapshotting and flexible autoscaling to build fast multi-model pipelines.
Never block the GPU.
How Decagon and Modal made real-time voice AI possible, combining fine-tuned small models with a re-engineered inference runtime for sub-second latency.
How we built a real-time voice bot on Modal's distributed serverless platform.
Asynchrony, fast approximate exponents, and 10x more efficient softmax.
Welcome to another round of Modal Product Updates! Here's what's new this month.
We've collaborated with Datalab, the creators of Marker and Surya, to make it faster than ever to deploy document intelligence workflows.
We’re excited to announce that we have raised more than $80M in a Series B round, led by Lux Capital. Our post-money valuation is $1.1B.
Zencastr scaled up to 1,500 concurrent GPUs on Modal to process hundreds of years of podcast audio in just a few days. Today they run transcription, speaker detection, and audio enrichment for millions of podcast episodes on Modal, giving them cost efficiency, fast iteration, and zero DevOps overhead.