April 17, 2025 · 5 minute read
Product updates: Running batch jobs with 1M inputs, ephemeral apps, and a new TensorRT-LLM example

🍩 Run async jobs with 1M inputs

Running large-scale async jobs on Modal just got a whole lot easier:

  • You can now queue up to 1 million inputs per Modal Function (previously 2k).
  • We’ve also raised the .spawn() rate limit so you can submit inputs more quickly.
  • FunctionCall results now stick around for 7 days, giving you more flexibility to retrieve them when you’re ready.

Want to try job processing on Modal? Check out the guide →

👩‍💻 Client updates

Run pip install --upgrade modal to get the latest client updates.

  • Modal Client v1.0 is on the way! Expect cleaner APIs and some deprecation warnings — check out our Migration Guide to prep your code.
  • You can now launch ephemeral apps from within containers using with app.run():. Avoid calling it at global scope, or each new container will recursively launch the app.
  • Use context_dir to make relative COPY commands in Dockerfiles work more reliably.
  • Use Image.cmd(...) to define default entrypoint args for your Docker images.
  • You can now see Git commit info for apps, both in the CLI via modal app history, and in the dashboard.

🖊️ New super-fast LLM inference example with TensorRT-LLM

Check out our new example showing how to serve large language models with ultra-low (less than 400 ms) latency using TensorRT-LLM on Modal. Perfect for real-time applications.

📽️ Video walkthroughs

Want to see Modal in action? We dropped two new walkthroughs:

  • Deploy DeepSeek models on Modal — A step-by-step guide to spinning up DeepSeek in production. Watch the video →
  • Serve OpenAI-compatible APIs with vLLM — Learn how to deploy and scale a blazing-fast vLLM service on Modal. Watch the video →

🚀 Customer launches

  • Imbue launched Sculptor, the first coding agent environment that helps you catch issues, write tests, and improve your code, built on Modal Sandboxes.
  • Phonic launched their new voice AI platform, with Modal enabling low-latency inference and massively parallel job processing.
  • Firebender launched Kotlin-bench, the first benchmark evaluating AI models on real-world Kotlin & Android tasks, using Modal’s .map() for large-scale parallelization.

🍭 Fun tidbits

  • We had some amazing demos at our open-source LLM demo night (hosted jointly with Mistral), from blazing fast speech-to-speech to domain-specific agent evals.

  • We launched our first billboard campaign in SF! Anyone who finds and tweets a photo of our billboards gets a little prize.
