Sandboxes are the Modal primitive for safely running untrusted code, whether that code comes from LLMs, users, or other third-party sources. We’ve been honing Sandboxes in beta for the past year, and today we’re excited to announce they’re generally available!
Why we built Sandboxes
We built Modal Functions to run code written by you, the user. Your Functions can interact with your Modal workspace: they can mount Secrets, create Volumes, call other Functions, and more. This model works because you can trust the code you deploy directly.
But agentic systems need to execute code without human supervision. Your agent may make a destructive mistake, or a malicious user may prompt your agent in a dangerous direction! In either case, you can’t trust an LLM with your resources the same way that you can trust yourself. LLM-generated code should run in an isolated environment where its blast radius is limited.
These concerns extend to your users as well. When executing user-written code, you need to ensure that an attacker can’t damage your environment or extract sensitive data.
We built Sandboxes to address these concerns. Sandboxes give you a dynamic environment for running code in an arbitrary language, safely isolated from the rest of your Modal resources.
Enough talk, let’s see the code
Sandboxes provide a simple exec API for executing code:
import modal

# Look up an App to attach the Sandbox to, creating it if needed.
app = modal.App.lookup("sandbox-manager", create_if_missing=True)

# Create a Sandbox, run a command in it, and read the output.
sb = modal.Sandbox.create(app=app)
p = sb.exec("python", "-c", "print('hello')")
print(p.stdout.read())
sb.terminate()
LLMs may specify dependencies or need to execute code in other languages. Sandboxes let you configure the execution environment at runtime, using the same Image API and infrastructure as Functions:
import json

# Get requested dependencies from the LLM and use them to
# dynamically build the Sandbox image.
llm_output = '{ "requested_packages": ["nodejs", "php"] }'
packages = json.loads(llm_output)["requested_packages"]
image = modal.Image.debian_slim().apt_install(*packages)

# Test that our languages work!
sb = modal.Sandbox.create(image=image, app=app)
p = sb.exec("node", "-e", 'console.log("hello from nodejs")')
print(p.stdout.read())
p = sb.exec("php", "-r", "echo 'hello from php';")
print(p.stdout.read())
You can even snapshot a Sandbox’s filesystem, both for persistence and to fan work out across many Sandboxes:
sb = modal.Sandbox.create(app=app)
sb.exec("bash", "-c", "echo 'data_file' > /data").wait()

# snapshot_filesystem() returns an Image that new Sandboxes can boot from.
snap = sb.snapshot_filesystem()

# These Sandboxes will all have /data present and can fan out to
# run tests over many different states.
sb2 = modal.Sandbox.create(image=snap, app=app)
sb3 = modal.Sandbox.create(image=snap, app=app)
p2 = sb2.exec("pytest", "tests/unit")
p3 = sb3.exec("pytest", "tests/integration")
print(p2.stdout.read())
print(p3.stdout.read())
This is just a taste of the Sandbox feature set. Check out the Sandbox docs for details on how to forward ports, restrict network access, access files, and more.
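As one example, here’s a minimal sketch of running code with networking disabled and then reading a file back out of the Sandbox. It reuses the app from above; the block_network parameter and the Sandbox open() file API reflect our reading of the current docs, so treat the exact names as assumptions and check the documentation for the authoritative interface:

# Create a Sandbox that can't reach the network at all
# (block_network is our understanding of the current parameter name).
locked_down = modal.Sandbox.create(app=app, block_network=True)
locked_down.exec("bash", "-c", "echo 'no network needed' > /tmp/out.txt").wait()

# Read the file back out through the Sandbox filesystem API.
f = locked_down.open("/tmp/out.txt", "r")
print(f.read())
f.close()
locked_down.terminate()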
Why use Modal Sandboxes
Sandboxes run on the same underlying infrastructure as Functions, so you get all the benefits you’re used to from Modal Functions: blazing fast cold starts, access to the latest GPUs, global region selection, and more. And as we make our core platform more powerful and reliable, those improvements carry over to both Functions and Sandboxes.
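For instance, requesting a GPU or pinning a region looks just like it does for a Function. The specific gpu and region values below are illustrative placeholders; see the docs for the currently supported options:

sb = modal.Sandbox.create(
    app=app,
    gpu="H100",        # same GPU spec strings as Functions
    region="us-east",  # pin execution to a region (value is illustrative)
)
p = sb.exec("nvidia-smi")
print(p.stdout.read())
sb.terminate()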
The tight integration across our platform also means it’s simple to build features that use Sandboxes and Functions together; for example, a deployed Function can spin up Sandboxes on demand, as sketched below.
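Here’s a minimal sketch of that pattern, built only from the Sandbox calls shown above (the app and function names are hypothetical):

import modal

app = modal.App("untrusted-code-runner")

@app.function()
def run_untrusted(code: str) -> str:
    # Each invocation gets a fresh, fully isolated Sandbox.
    sb = modal.Sandbox.create(app=app)
    p = sb.exec("python", "-c", code)
    output = p.stdout.read()
    sb.terminate()
    return output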
Customer stories
We’re proud of the applications our customers are building across a variety of use cases that require secure and scalable code execution.
Accelerating agent benchmarks with SWE-bench
SWE-bench is the highest-profile benchmark for testing coding agents. It runs LLM-generated code against actual GitHub pull requests to measure model performance at fixing bugs, implementing features, and more.
We’ve upstreamed Modal support into SWE-bench for blazing fast evaluations. By adding a simple --modal flag to their run command (see the example invocation after this list), researchers can now:
- Run evaluations entirely in the cloud without any infrastructure setup
- Execute tests in parallel across hundreds of containers
- Run the Verified benchmark (500 tasks) in only 7 minutes
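A run looks roughly like the following; the --modal flag is the Modal integration point, while the other flags come from the standard SWE-bench harness and may vary across versions:

python -m swebench.harness.run_evaluation \
    --dataset_name princeton-nlp/SWE-bench_Verified \
    --predictions_path predictions.json \
    --run_id my-eval \
    --modal true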
Modal’s built-in image caching follows the same layered approach as SWE-bench’s existing Docker images, allowing for a simple integration. Previously, these evaluation runs could take hours; the Modal integration enables a much tighter feedback loop.
Secure code execution at Quora
Quora uses Modal Sandboxes to power code execution in Poe, their AI chat platform. When you ask Poe’s AI-powered bots to write and run code, that code executes safely in Modal Sandboxes. Each Sandbox is completely isolated, keeping the code separate from both the main Quora infrastructure and every other user’s code.
This integration allows Poe to offer interactive coding features while still maintaining strict security. You can experiment with code that the AI suggests without worrying about damaging the platform or exposing any sensitive data.
We’ve done extensive performance testing to make sure we’re future-proofed at any scale Quora may burst up to. We’ve tested Sandbox creation throughput at up to 1,000 Sandboxes per second. If you need to rapidly scale out code execution, let us know!
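If you want to experiment with bursting yourself, a rough client-side sketch (a plain thread pool fanning out the same Sandbox calls from earlier; names here are illustrative) might look like:

import modal
from concurrent.futures import ThreadPoolExecutor

app = modal.App.lookup("burst-demo", create_if_missing=True)

def run_one(i: int) -> str:
    # Each task gets its own short-lived Sandbox.
    sb = modal.Sandbox.create(app=app)
    p = sb.exec("python", "-c", f"print({i} * {i})")
    out = p.stdout.read()
    sb.terminate()
    return out

# Fan out 100 Sandboxes from a single client.
with ThreadPoolExecutor(max_workers=50) as pool:
    results = list(pool.map(run_one, range(100)))
print(len(results), "sandboxes completed")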
Large-scale refactors with Codegen
Codegen is building an AI system for performing large-scale codebase refactors. Their approach involves building a massive in-memory index of the target codebase and giving AI models the ability to execute “codemods”: automated code transformations that implement the desired changes. For example, Codegen builds this kind of index over our own modal-client repository.
Modal Sandboxes provide Codegen with two critical capabilities:
- A reliable environment for building and maintaining their in-memory codebase representations
- A secure execution environment for running AI-generated codemods with strict isolation
This combination of performance and security enables Codegen to confidently apply AI-driven refactoring at scale.
AI workforce automation with Relevance AI
Relevance AI is building an AI workforce platform that uses agents to automate complex tasks. They leverage Modal Sandboxes in two key ways:
- Providing a secure environment for their AI agents to run dynamically generated code
- Powering their notebook/builder feature where users can write and execute code in a serverless environment
Modal Sandboxes were a perfect fit for Relevance AI because they offer:
- Flexibility to install any package on demand
- Full customization of runtime commands
- Fast cold-boot times for responsive execution
- Support for any programming language their agents need
This combination lets Relevance AI’s agents tackle a wide range of automation tasks while maintaining strict security boundaries between executions.
Get started today
Modal Sandboxes are available to all users. Whether you’re building an AI coding assistant, running untrusted user code, or just need a secure environment for code execution, Sandboxes provide the tools you need.
To get started:
- Install Modal: pip install modal
- Create an account: python -m modal setup
- Check out our Sandbox documentation
We can’t wait to see what you build!