Product Updates: RTX Pro 6000 Blackwell, Command K, Sandbox FS API and more

⚡RTX Pro 6000 Blackwell is now available on Modal

NVIDIA's RTX Pro 6000 Blackwell is now available on Modal with a single line of code. With 96GB of VRAM and strong FP4/FP8 throughput, it's great for inference workloads, fine-tuning runs, and anything that benefits from large memory headroom.

Learn more about specifying your GPU type.

⌨️ Command K now in the dashboard

Hit Cmd+K (or Ctrl+K on Windows) anywhere in the Modal dashboard to open the new Command Palette. This first release includes basic navigation shortcuts and the ability to jump directly to any Modal object page by pasting its Modal ID. You can also jump straight to an object's page from the CLI with modal dashboard <object-id>.

📁 Sandbox Filesystem API now in Beta

We’ve overhauled the Sandbox Filesystem API to improve reliability and stability over the prior Alpha version. The FS API is the easiest way to move files in and out of sandboxes. It supports reading files up to 5GB, writing files of any size, streaming in both directions, and syncing data to v2 Volumes:

import modal

app = modal.App.lookup("sandbox-fs-demo", create_if_missing=True)

sb = modal.Sandbox.create(app=app)

script = """\
with open("/tmp/hello_world.txt", "w") as f:
    f.write("Hello, World!\\n")
"""

# Write the script to a file in the sandbox.
sb.filesystem.write_text(script, "/tmp/hello_world.py")

# Execute the script in the sandbox.
process = sb.exec("python", "/tmp/hello_world.py")
process.wait()
print(process.stdout.read())

# Read back the file created by the script.
print(sb.filesystem.read_text("/tmp/hello_world.txt"))

sb.terminate()
sb.detach()

Read the docs →

💻 SDK Updates

Run uv pip install --upgrade modal to get the latest. Highlights from the changelog:

CLI for Modal Logs

We’ve made significant CLI enhancements to make Modal logs more accessible to coding agents (and humans!). The modal app logs and modal container logs commands can now fetch historical logs by count (e.g., --tail 1000) or by time range (e.g., --since 4h, --until 2026-03-15). You can also now filter with --search, --source, --function, and --container, and prefix each line with its origin ID.
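For agents that shell out to the CLI, it can help to assemble the command in one place. The sketch below is illustrative only: build_logs_command is a hypothetical helper, "my-app" is a placeholder app name, and passing the app name as a positional argument is an assumption. It uses only flags named above and builds the argument list without running it.

```python
import shlex


def build_logs_command(app_name, tail=None, since=None, until=None, search=None):
    """Assemble a `modal app logs` invocation as an argv list (does not run it).

    Hypothetical helper: the flag names come from the release notes above,
    but the positional app name and the value formats are assumptions.
    """
    cmd = ["modal", "app", "logs", app_name]
    options = (
        ("--tail", tail),
        ("--since", since),
        ("--until", until),
        ("--search", search),
    )
    for flag, value in options:
        # Skip any option the caller did not set.
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


# Example: logs from the last four hours that match "error".
cmd = build_logs_command("my-app", since="4h", search="error")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run as-is. Check modal app logs --help for the options your installed version actually accepts.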

New deployment strategies

You can now use --strategy recreate when running modal deploy (or app.deploy(strategy="recreate")) to immediately terminate running containers when a deployment completes, guaranteeing all subsequent inputs hit the new version instead of waiting for a graceful rollover. This is useful for dev workflows and for Apps running at max_containers. modal serve now uses this strategy automatically during code updates. The default rolling strategy is unchanged.

📖 Content Roundup

Deploy Gemma 4

Learn how to deploy Google's Gemma 4 on Modal. We published a detailed walkthrough for the 26B-A4B variant, a multimodal, reasoning-capable MoE model that punches way above its weight. The example covers the full setup: caching weights with Modal Volumes, configuring vLLM, and wiring up tool use and reasoning parsing for Gemma 4.

Check out the example →

Powering real-time inference at Runway

Runway Characters is a real-time video agent API built on GWM-1 that lets developers create expressive conversational characters from a single image with zero fine-tuning. Runway's team went from proof of concept to production on Modal in under 30 days.

Read the post →

Building the agentic dev stack on Modal

Imbue is building an agentic dev stack on Modal. Mngr orchestrates hundreds of isolated AI coding agents across Modal sandboxes, letting you attach to any one live mid-task, with automatic shutdown when idle. Keystone generates working Dockerfiles for any repo by running Claude Code safely in a Modal sandbox. Offload fans out test suites across up to 200 Modal sandboxes, delivering 6x speedups on real test suites.

How Doppel eliminated ML infrastructure tax with Modal

Doppel migrated their ML workflows to Modal and cut build times by up to 10× with image layer caching and persistent volumes for model weights. Their inference workloads now auto-scale to absorb traffic spikes with no manual intervention.

Read the post →