April 17, 2025 · 5 minute read
Product updates: Running batch jobs with 1M inputs, ephemeral apps, and a new TensorRT-LLM example

🍩 Run async jobs with 1M inputs

Running large-scale async jobs on Modal just got a whole lot easier:

  • You can now queue up to 1 million inputs per Modal Function (previously 2k).
  • We’ve also raised the .spawn() rate limit so you can submit inputs more quickly.
  • FunctionCall results now stick around for 7 days, giving you more flexibility to retrieve them when you’re ready.

Want to try job processing on Modal? Check out the guide →

👩‍💻 Client updates

Run pip install --upgrade modal to get the latest client updates.

  • Modal Client v1.0 is on the way! Expect cleaner APIs and some deprecation warnings — check out our Migration Guide to prep your code.
  • You can now launch ephemeral apps from within containers using with app.run():. Avoid calling it at global scope, or each new container will recursively launch the app.
  • Use context_dir to make relative COPY commands in Dockerfiles work more reliably.
  • Use Image.cmd(...) to define default entrypoint args for your Docker images.
  • You can now see Git commit info for apps, both in the CLI via modal app history, and in the dashboard.

🖊️ New super-fast LLM inference example with TensorRT-LLM

Check out our new example showing how to serve large language models with ultra-low (less than 400 ms) latency using TensorRT-LLM on Modal. Perfect for real-time applications.

📽️ Video walkthroughs

Want to see Modal in action? We dropped two new walkthroughs:

  • Deploy DeepSeek models on Modal — A step-by-step guide to spinning up DeepSeek in production. Watch the video →
  • Serve OpenAI-compatible APIs with vLLM — Learn how to deploy and scale a blazing-fast vLLM service on Modal. Watch the video →

🚀 Customer launches

  • Imbue launched Sculptor, the first coding agent environment that helps you catch issues, write tests, and improve your code, built on Modal Sandboxes.
  • Phonic launched their new voice AI platform, with Modal enabling low-latency inference and massively parallel job processing.
  • Firebender launched Kotlin-bench, the first benchmark evaluating AI models on real-world Kotlin & Android tasks, using Modal’s .map() for large-scale parallelization.

🍭 Fun tidbits

  • We had some amazing demos at our open-source LLM demo night (hosted jointly with Mistral), from blazing fast speech-to-speech to domain-specific agent evals.

  • We launched our first billboard campaign in SF! Anyone who finds and tweets a photo of our billboards gets a little prize.
