Best Code Execution Sandbox for Hermes in 2026

Hermes Agent requires a secure, scalable environment to execute AI-generated code safely. As autonomous agents write and run code independently, selecting the right sandbox infrastructure determines whether your Hermes deployment can handle untrusted code execution, scale to meet demand, and maintain the isolation necessary for production workloads.

Modal Team, Engineering
June 2026 · 20 min read
This guide examines seven code execution sandboxes for Hermes in 2026, starting with Modal, the serverless compute platform with native Hermes backend support and proven scale at over 1 billion sandboxes executed to date.

Key Takeaways

  • Native Hermes backend support matters most: Among the managed cloud sandbox providers covered here, Modal and Daytona are the only ones documented by Hermes as cloud/serverless `terminal.backend` options; Hermes also supports local, Docker, SSH, and Singularity/Apptainer backends
  • Isolation technology varies significantly: Modal uses gVisor containers, E2B employs Firecracker microVMs, and Cloudflare Sandbox runs each sandbox in an isolated Linux container, each with different security and performance characteristics
  • Session limits affect long-running workflows: Modal Sandboxes support configurable timeouts up to 24 hours, Daytona auto-stops idle sandboxes after 15 minutes by default but can be configured for indefinite runtime, E2B runs up to 24 hours on Pro plans and 1 hour on Base plans with pause/resume for longer workflows, and Cloudflare Sandbox sleeps after 10 minutes of inactivity by default and can be kept alive with `keepAlive: true`
  • GPU support enables ML-enhanced agents: Modal provides on-demand access to H100, H200, and B200 GPUs, allowing Hermes agents to call upon acceleration when workloads require it
  • Production scale requires proven infrastructure: Modal powers over 10,000 teams including Ramp and Lovable, demonstrating enterprise-grade reliability for agent sandboxes

1. Modal

Modal delivers serverless compute with native Hermes backend support, gVisor-based isolation, and on-demand GPU access. The platform is configured as `terminal.backend: modal` in Hermes with `MODAL_TOKEN_ID` and `MODAL_TOKEN_SECRET` authentication, making it the most straightforward option for Hermes deployments requiring secure code execution at scale.
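
Before pointing Hermes at a cloud backend, it can help to confirm the credentials are actually set. The following is a minimal sketch, not part of Hermes itself: the `REQUIRED_ENV` table and the `missing_credentials` helper are hypothetical names, and the variable lists are taken only from the backends described in this guide.

```python
import os

# Credentials each backend expects, as named in this guide.
# This table is illustrative and is not read from Hermes itself.
REQUIRED_ENV = {
    "modal": ("MODAL_TOKEN_ID", "MODAL_TOKEN_SECRET"),
    "daytona": ("DAYTONA_API_KEY",),
}


def missing_credentials(backend, env=None):
    """Return the required variables that are unset or empty for `backend`."""
    env = os.environ if env is None else env
    if backend not in REQUIRED_ENV:
        raise ValueError(f"no credential list recorded for backend {backend!r}")
    return [name for name in REQUIRED_ENV[backend] if not env.get(name)]
```

Calling `missing_credentials("modal")` in a deploy script and failing early on a non-empty result tends to be easier to debug than an authentication error in the middle of an agent run.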

Core Capabilities

  • Native Hermes integration: Official backend support with simple environment variable authentication
  • gVisor container isolation: Secure sandboxed execution for running AI-generated code with compute jobs containerized and virtualized
  • Code-first SDK with all-language support: Sandboxes can run whatever runtime or language the workload requires, and Modal provides code-defined infrastructure through SDKs in Python, TypeScript, and Go
  • Massive concurrency: Support for 100k+ concurrent sandboxes with sub-second scheduling and strong cold-start performance on custom images
  • Fast cold starts: Engineered for fast cold starts and faster feedback loops, with an optimized filesystem that helps containers come online quickly without letting large images slow startup down
  • On-demand GPU access: Hermes agents can call upon H100, H200, B200 and A100 variants when workloads require acceleration
  • Filesystem snapshots: State preservation across sessions through sandbox snapshots
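
To make the capabilities above concrete, here is a rough sketch of running one untrusted snippet in a Modal Sandbox through the Python SDK. It assumes the `modal` package is installed and authenticated. The app name `hermes-sandbox-demo` and the helpers `clamp_timeout` and `run_untrusted` are hypothetical, and this is not how the Hermes backend itself is implemented.

```python
MAX_TIMEOUT_S = 24 * 60 * 60  # 24-hour ceiling described in this guide


def clamp_timeout(seconds):
    """Keep a requested timeout between 1 second and the 24-hour ceiling."""
    return max(1, min(int(seconds), MAX_TIMEOUT_S))


def run_untrusted(code, timeout_s=600):
    """Run a Python snippet in a fresh Modal Sandbox and return its stdout."""
    import modal  # imported lazily so clamp_timeout works without the SDK

    # "hermes-sandbox-demo" is a placeholder app name for this sketch.
    app = modal.App.lookup("hermes-sandbox-demo", create_if_missing=True)
    sb = modal.Sandbox.create(
        app=app,
        image=modal.Image.debian_slim(),
        timeout=clamp_timeout(timeout_s),
    )
    try:
        proc = sb.exec("python", "-c", code)
        output = proc.stdout.read()  # read everything the snippet printed
        proc.wait()
        return output
    finally:
        sb.terminate()  # always release the Sandbox, even on errors
```

Terminating in a `finally` block keeps a failed snippet from leaving a Sandbox running until its timeout expires.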

Security and Compliance

Modal has completed a SOC 2 Type II audit and supports HIPAA-compliant workloads on Enterprise plans via a BAA. The platform uses gVisor-based sandboxing for compute isolation, TLS 1.3 for public APIs, and encryption for data in transit and at rest.

Production-Proven Results

Modal powers production workloads for AI companies running agent infrastructure:

  • Lovable runs "tens of thousands of app creation sessions in an instant" using Modal Sandboxes
  • Ramp uses Modal to power background coding agents that generate code changes
  • The platform has executed over 1 billion sandboxes to date

Best For: Teams running Hermes agents that need native backend support, configurable session timeouts up to 24 hours with snapshot-based continuation, GPU access for ML workloads, and production-proven scale.

2. Daytona

Daytona pairs persistent development environments with native Hermes backend support and a focus on cold-start speed. The platform is configured as `terminal.backend: daytona` with `DAYTONA_API_KEY` authentication.

Core Capabilities

  • Native Hermes integration: Official backend support with sandboxes following the `hermes-{task_id}` naming pattern
  • Cold starts: Daytona advertises fast sandbox creation, from code to execution
  • Persistent state model: Sandboxes stop and resume instead of being deleted, preserving context across sessions
  • Isolated environments: Daytona provides isolated sandbox environments with persistent state and configurable resource limits
  • GPU support: Available for ML workloads alongside persistent storage
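
The `hermes-{task_id}` naming pattern makes it possible to tell Hermes-created sandboxes apart from others in the same account. A small sketch of building and parsing such names follows. Both helper names are hypothetical, and any extra sanitization Hermes may apply to task ids is not covered here.

```python
PREFIX = "hermes-"


def sandbox_name(task_id):
    """Build a sandbox name following the hermes-{task_id} pattern."""
    task_id = str(task_id)
    if not task_id:
        raise ValueError("task_id must be non-empty")
    return f"{PREFIX}{task_id}"


def task_id_from_name(name):
    """Return the task id, or None if `name` does not follow the pattern."""
    if name.startswith(PREFIX) and len(name) > len(PREFIX):
        return name[len(PREFIX):]
    return None
```

Filtering a sandbox listing with `task_id_from_name` is one way to find stopped Hermes sandboxes that can be resumed or cleaned up.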

Architecture Approach

Daytona focuses on workspace continuity rather than ephemeral execution. Sandboxes auto-stop after 15 minutes of inactivity by default but can be configured for indefinite runtime by disabling auto-stop, benefiting Hermes agents that need to preserve cached dependencies or intermediate results. Persistent filesystem state should not be conflated with uninterrupted live-process execution.

Best For: Teams building Hermes agents that require persistent development environments, fast cold starts, and workspace continuity across sessions.

3. E2B

E2B specializes in secure sandboxes for AI agents using Firecracker microVM isolation. The platform claims adoption by 94% of Fortune 100 companies for frontier agentic workflows, with customers including Perplexity, Hugging Face, and Groq.

Core Capabilities

  • Firecracker microVMs: Hardware-level isolation providing dedicated kernel per workload
  • Firecracker-based startup: Sandbox creation with pause/resume capability
  • Code Interpreter SDK: Purpose-built for Python, TypeScript, and JavaScript with Jupyter-based execution
  • Open-source option: Self-hosting available for organizations with data sovereignty requirements
  • AutoResume feature: Automatic reconnection on network interruption

Use Case Focus

E2B excels at ephemeral code execution with strong isolation guarantees. The platform supports up to 1,100 concurrent sandboxes on higher-tier plans. Sandboxes can run continuously for up to 24 hours on Pro plans and 1 hour on Base plans; longer workflows can use pause/resume, which resets the runtime window while preserving full state.

Best For: Teams building Hermes agents that prioritize kernel-level isolation over session duration, particularly those integrating with LangChain, OpenAI, or Anthropic tooling.

4. Northflank

Northflank provides production-grade AI infrastructure with multiple isolation technology options and self-serve bring-your-own-cloud (BYOC) deployment. Northflank documents support for Firecracker, Kata Containers, Cloud Hypervisor, and gVisor, plus BYOC deployments across major clouds and on-premises environments.

Core Capabilities

  • Multiple isolation options: Choose between Firecracker, Kata Containers, Cloud Hypervisor, and gVisor per workload
  • Self-serve BYOC: Deploy to AWS, GCP, Azure, Oracle, CoreWeave, Civo, bare-metal, or on-premises infrastructure
  • SOC 2 Type 2 certified: Production compliance certification for enterprise deployments
  • GPU support: H100 access alongside sandboxes for ML workloads
  • Full platform integration: Databases, APIs, workers, and GPUs alongside sandbox capabilities

Architecture Approach

Northflank's unique multi-isolation support allows teams to match security requirements to specific workloads. This flexibility benefits regulated industries where different isolation models may be required for different data sensitivity levels.

Best For: Enterprise teams building Hermes agents in regulated industries that need BYOC flexibility, compliance certifications, and configurable isolation technologies.

5. Vercel Sandbox

Vercel Sandbox provides isolated code execution environments in temporary Linux microVMs powered by Firecracker. Vercel Sandbox is generally available and integrates tightly with Vercel's deployment ecosystem.

Core Capabilities

  • Firecracker microVM isolation: Each environment runs with its own filesystem, network, and process space
  • Active CPU billing: Pay only when code is actively executing, not for idle time
  • Linux environment access: Sudo, package managers, and standard command-line workflows
  • State persistence options: Automatic filesystem state saving and restoration on resume
  • Cold starts: Sandboxes boot as Firecracker microVMs

Session Limits

Vercel Sandbox enforces 45-minute session limits on Hobby tier and up to 5 hours on Pro/Enterprise plans. This constraint requires Hermes workflows to be designed around session boundaries.

Best For: Teams already using Vercel's ecosystem that need isolated code execution for Hermes agents with shorter workflow durations.

6. Cloudflare Sandbox

Cloudflare Sandbox delivers code execution using Cloudflare Containers coordinated by Workers and Durable Objects, with each sandbox running in an isolated Linux container.

Core Capabilities

  • Isolated Linux containers: Each sandbox runs in a dedicated Linux container built on Cloudflare Workers, Durable Objects, and Containers
  • Workers and Containers integration: Programmatic code execution from Cloudflare's platform
  • Network placement: Sandbox placement is determined by the first request, and subsequent requests route to the same location on Cloudflare's network
  • TypeScript-first SDK: API for sandbox lifecycle management, command execution, and file operations
  • Cloudflare ecosystem integration: Works with Workers, R2, KV, and Workers AI

Session Constraints

Cloudflare Sandbox defaults to sleeping after 10 minutes of inactivity, configurable via `sleepAfter`, and can be kept alive with `keepAlive: true`; filesystem and process state are lost when the container stops. This default behavior suits Hermes agents executing short-lived tasks rather than long-running workflows.

Best For: Teams building Hermes agents optimized for latency-sensitive, short-duration tasks within the Cloudflare ecosystem.

7. Fly.io Sprites

Fly.io Sprites launched in January 2026 as persistent hardware-isolated environments specifically for AI coding agents. The platform uses Firecracker microVMs with a unique cost model that charges nothing when sandboxes are idle.

Core Capabilities

  • Firecracker microVM isolation: Hardware-level security with persistent ext4 filesystem
  • No charge when idle: Pay only for active compute while filesystem persists free
  • Checkpoint/restore: Warm Sprites resume from hibernation using copy-on-write, with durable state backed by object storage
  • Persistent filesystem: Fly.io Sprites provide a persistent ext4 filesystem backed by durable object storage, with NVMe used during active execution and cache paths, and 100GB of durable storage per Sprite. State is preserved across idle periods and resumed sessions
  • Automatic idle behavior: Compute stops automatically while storage remains available

Startup Characteristics

Fly positions new Sprite creation as a core capability. Warm Sprites resume from hibernation, making the platform suitable for intermittent Hermes workflows with natural pauses.

Best For: Teams building Hermes agents with intermittent execution patterns that benefit from persistent storage and cost optimization during idle periods.

Why Modal Stands Out for Hermes Agent Infrastructure

Native Hermes Backend Support

Among the managed cloud sandbox providers covered here, Modal is one of only two platforms documented by Hermes as a cloud/serverless `terminal.backend`, configured simply through environment variables. This native integration eliminates the workarounds required when using other managed platforms, reducing setup complexity and ensuring compatibility with Hermes updates.

Proven Scale for Agent Workloads

Modal has executed over 1 billion sandboxes and powers cloud infrastructure for over 10,000 teams. This production track record demonstrates the platform's ability to handle enterprise-scale Hermes deployments. Lovable, one of Modal's customers, runs "tens of thousands of app creation sessions in an instant" using Modal's sandbox infrastructure.

Configurable Session Duration up to 24 Hours

Modal Sandboxes support configurable runtimes up to 24 hours. For workflows that need to continue beyond a single Sandbox lifetime, Modal provides Filesystem Snapshots that preserve Sandbox filesystem state indefinitely until deleted, enabling stateful continuation across subsequent Sandbox runs.
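
A workflow longer than one Sandbox lifetime can be sketched as a chain of runs, each started from the previous run's filesystem snapshot. The example below assumes the Modal Python SDK's `snapshot_filesystem()` and `Sandbox.create` calls. The helpers `runs_needed` and `continue_from_snapshot` are hypothetical. A filesystem snapshot captures files only, so the agent must restart any running processes in the new Sandbox.

```python
import math

MAX_RUN_S = 24 * 60 * 60  # longest single Sandbox run described in this guide


def runs_needed(total_s, per_run_s=MAX_RUN_S):
    """Number of back-to-back Sandbox runs needed to cover `total_s` seconds."""
    if per_run_s <= 0:
        raise ValueError("per_run_s must be positive")
    per_run_s = min(per_run_s, MAX_RUN_S)
    return max(1, math.ceil(total_s / per_run_s))


def continue_from_snapshot(old_sandbox, app, timeout_s=MAX_RUN_S):
    """Snapshot the old Sandbox's filesystem and start a new Sandbox from it."""
    import modal  # imported lazily so runs_needed works without the SDK

    image = old_sandbox.snapshot_filesystem()  # returns an Image of the files
    old_sandbox.terminate()
    # Running processes and memory are not carried over, only the filesystem.
    return modal.Sandbox.create(app=app, image=image, timeout=timeout_s)
```

For example, a 36-hour job needs `runs_needed(36 * 3600)` runs, which is two, with one snapshot hand-off between them.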

On-Demand GPU Access

Modal supports a broad GPU catalog for GPU-accelerated workloads, including T4, L4, A10, L40S, A100 variants, RTX PRO 6000, H100/H100!, H200, and B200/B200+. Hermes agents can call upon these GPUs when workloads require acceleration for ML inference, code analysis, or model-based code generation.

Massive Concurrency with Fast Scheduling

Modal's AI-native container runtime supports 100k+ concurrent sandboxes with sub-second scheduling and strong cold-start performance on custom images. The platform's custom file system, container runtime, and scheduler are optimized specifically for the elastic scaling patterns that agent workloads demand.

Enterprise Security and Compliance

With a completed SOC 2 Type II audit, HIPAA support via a BAA on Enterprise plans, and gVisor-based compute isolation, Modal meets the security requirements that production Hermes deployments demand. The platform encrypts data in transit and at rest and uses TLS 1.3 for public APIs.

For teams deploying Hermes agents that need native backend support, configurable sessions up to 24 hours with snapshot-based continuation, GPU access, and production-proven infrastructure, Modal's combination of scale, security, and AI-native architecture makes it the clear choice.

Get started with Modal Sandboxes for your Hermes deployment.


Frequently asked questions

What makes a code execution sandbox effective for AI agents like Hermes?

Effective sandboxes provide secure isolation to run untrusted AI-generated code, fast cold starts for responsive agent interactions, and scalable infrastructure to handle concurrent sessions. Modal addresses all three with gVisor isolation, sub-second scheduling, and support for 100k+ concurrent sandboxes.

How do sandboxes protect against malicious AI-generated code?

Sandboxes isolate code execution from host systems using technologies like gVisor (Modal), Firecracker microVMs (E2B, Vercel Sandbox, Fly.io Sprites, and one of Northflank's options), or isolated Linux containers (Cloudflare). This isolation prevents AI-generated code from accessing unauthorized resources or affecting other workloads, which is critical when agents execute code autonomously.

Which sandbox platforms have native Hermes backend support?

Among the managed cloud sandbox providers covered here, Modal and Daytona are the only ones documented by Hermes as cloud/serverless `terminal.backend` options. Modal is configured with `terminal.backend: modal` using token-based authentication, while Daytona uses `terminal.backend: daytona` with API key authentication. Hermes also supports local, Docker, SSH, and Singularity/Apptainer backends.

What is the difference between gVisor and Firecracker isolation?

gVisor (used by Modal) implements a user-space kernel that intercepts system calls, providing strong isolation with container-like deployment simplicity. Firecracker (used by E2B, Vercel, Northflank, Fly.io) creates lightweight microVMs with hardware-level isolation. Both approaches protect against untrusted code execution with different performance and security characteristics.

How does session duration affect Hermes agent workflows?

Session limits determine how long an agent can run continuously. Modal Sandboxes support configurable timeouts up to 24 hours, with Filesystem Snapshots for stateful continuation beyond that boundary. Daytona auto-stops idle sandboxes after 15 minutes by default but can be configured for indefinite runtime. E2B runs up to 24 hours on Pro plans and 1 hour on Base plans, with pause/resume that resets the runtime window while preserving state. Cloudflare Sandbox defaults to sleeping after 10 minutes of inactivity and can be kept alive with `keepAlive: true`.
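
One way to apply these limits is to encode them as data and filter by the longest uninterrupted run a workflow needs. The sketch below uses only the figures quoted in this guide, including the Vercel Sandbox limits from its section above. The table and the `fits_single_session` helper are hypothetical. `None` means the guide states no fixed cap, not that the provider has none.

```python
# Maximum continuous runtime in minutes, per the figures quoted in this guide.
# None means no fixed cap is stated here (idle timeouts still apply by default).
MAX_RUN_MINUTES = {
    "Modal": 24 * 60,
    "Daytona": None,  # auto-stops after 15 idle minutes unless disabled
    "E2B Pro": 24 * 60,
    "E2B Base": 60,
    "Vercel Sandbox Hobby": 45,
    "Vercel Sandbox Pro/Enterprise": 5 * 60,
    "Cloudflare Sandbox": None,  # sleeps after 10 idle minutes unless keepAlive
}


def fits_single_session(required_minutes):
    """List the options whose stated cap covers `required_minutes` in one run."""
    return [
        name
        for name, cap in MAX_RUN_MINUTES.items()
        if cap is None or required_minutes <= cap
    ]
```

Options that drop out of the list are not ruled out. They require the workflow to be split across sessions using pause/resume or snapshots.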

Can Hermes agents access GPUs through sandbox platforms?

Modal, Daytona, and Northflank publicly market GPU-capable workflows in this category. Modal supports a broad GPU catalog including H100, H200, B200 and A100 variants. Vercel Sandbox and Cloudflare Sandbox documentation focuses on CPU and container execution.
