Learn how Reducto used GPU memory snapshotting and flexible autoscaling to build fast multi-model pipelines.
Never block the GPU.
How Decagon and Modal made real-time voice AI possible, combining fine-tuned small models with a re-engineered inference runtime for sub-second latency.
How we built a real-time voice bot on Modal's distributed serverless platform.
Asynchrony, fast approximate exponents, and 10x more efficient softmax.
Welcome to another round of Modal Product Updates! Here's what's new this month.
We've collaborated with Datalab, the creators of Marker and Surya, to make it faster than ever to deploy document intelligence workflows.
We’re excited to announce that we have raised more than $80M in a Series B round, led by Lux Capital. Our post-money valuation is $1.1B.
Zencastr scaled up to 1,500 concurrent GPUs on Modal to process hundreds of years of podcast audio in just a few days. Today they run transcription, speaker detection, and audio enrichment for millions of podcast episodes on Modal, giving them cost efficiency, fast iteration, and zero DevOps overhead.