🍩 Run async jobs with 1M inputs
Running large-scale async jobs on Modal just got a whole lot easier:
- You can now queue up to 1 million inputs per Modal Function (previously 2k).
- We’ve also raised the `.spawn()` rate limit so you can submit inputs more quickly.
- `FunctionCall` results now stick around for 7 days, giving you more flexibility to retrieve them when you’re ready (see the sketch below).
Want to try job processing on Modal? Check out the guide →
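For a concrete picture of the spawn-then-fetch pattern, here’s a minimal sketch (the app name, function body, and input count are placeholders; see the guide for the exact limits and retrieval APIs):

```python
import modal

app = modal.App("bulk-jobs-sketch")  # placeholder app name

@app.function()
def process(item: int) -> int:
    # Placeholder work; swap in your real job logic.
    return item * 2

@app.local_entrypoint()
def submit():
    # Queue inputs without waiting for results (up to 1M per Function).
    calls = [process.spawn(i) for i in range(10_000)]
    # Save the call IDs so results can be fetched later (within 7 days).
    call_ids = [call.object_id for call in calls]
    print(call_ids[0])

# Later, even from a separate process:
#   result = modal.FunctionCall.from_id(call_id).get()
```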
👩‍💻 Client updates
Run `pip install --upgrade modal` to get the latest client updates.
- Modal Client v1.0 is on the way! Expect cleaner APIs and some deprecation warnings; check out our Migration Guide to prep your code.
- You can now launch ephemeral apps from within containers using `with app.run():`. Avoid putting this in global scope to prevent recursion (see the sketch after this list).
- Use `context_dir` to make relative `COPY` commands in Dockerfiles work more reliably.
- Use `Image.cmd(...)` to define default entrypoint args for your Docker images.
- You can now see Git commit info for apps, both in the CLI via `modal app history` and in the dashboard.
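A rough sketch pulling a few of these together (the Dockerfile path, directory, and commands are made-up examples; double-check parameter names against the current docs):

```python
import modal

# Build from a Dockerfile, resolving relative COPY paths against ./backend,
# and set default CMD args for the image. Paths here are illustrative.
image = (
    modal.Image.from_dockerfile("Dockerfile", context_dir="./backend")
    .cmd(["python", "serve.py"])
)

app = modal.App("client-updates-sketch", image=image)

@app.function()
def ping() -> str:
    return "pong"

def kick_off_ephemeral_run():
    # Launch an ephemeral app programmatically. Keep this call out of
    # global scope so importing the module can't trigger it recursively.
    with app.run():
        print(ping.remote())
```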
🖊️ New super-fast LLM inference example with TensorRT-LLM
Check out our new example showing how to serve large language models with ultra-low latency (under 400 ms) using TensorRT-LLM on Modal. Perfect for real-time applications.
📽️ Video walkthroughs
Want to see Modal in action? We dropped two new walkthroughs:
- Deploy DeepSeek models on Modal — A step-by-step guide to spinning up DeepSeek in production. Watch the video →
- Serve OpenAI-compatible APIs with vLLM — Learn how to deploy and scale a blazing-fast vLLM service on Modal. Watch the video →
🚀 Customer launches
- Imbue launched Sculptor, the first coding agent environment that helps you catch issues, write tests, and improve your code, built on Modal Sandboxes.
- Phonic launched their new voice AI platform, with Modal enabling low-latency inference and massively parallel job processing.
- Firebender launched Kotlin-bench, the first benchmark evaluating AI models on real-world Kotlin & Android tasks, using Modal’s `.map()` for large-scale parallelization.
🍭 Fun tidbits
- We were named the #2 most promising early-stage company on the 2025 Enterprise Tech 30 list by Wing VC and Eric Newcomer.
- We had some amazing demos at our open-source LLM demo night (hosted jointly with Mistral), from blazing fast speech-to-speech to domain-specific agent evals.
- We launched our first billboard campaign in SF! Anyone who finds and tweets a photo of our billboards gets a little prize.