Pydantic AI is a type-safe Python framework for building agents and LLM applications. Its Code Mode, available through Pydantic AI Harness and powered by Monty, lets models write Python code that calls tools through a controlled execution layer rather than making sequential tool calls. Code execution can reduce LLM roundtrips and improve complex multi-step tool workflows, though performance and reliability gains are workload-dependent. Regardless, executing that untrusted code requires secure, isolated environments. Veracode's 2025 GenAI Code Security Report found vulnerabilities in 45% of tested AI-generated code tasks, making sandboxed execution essential for production deployments. Choosing the right code execution sandbox determines whether your Pydantic AI workflows can run generated code securely, scale without manual intervention, and access GPU acceleration when ML workloads require it. This guide examines seven sandbox platforms serving different Pydantic AI use cases in 2026, starting with Modal, a serverless compute platform that combines secure sandboxed execution with on-demand GPU access for AI workloads.
Modal delivers serverless compute for secure code execution at scale, with on-demand GPU access layered on top for workloads requiring acceleration. The platform takes your code, containerizes it, and executes it in the cloud with automatic scaling, all defined through code using Modal's SDKs in Python, TypeScript, and Go.
Modal maintains SOC 2 Type II certification and supports HIPAA-compliant workloads on Enterprise plans via a BAA. The platform uses gVisor-based sandboxing for compute isolation, TLS 1.3 for public APIs, and encryption for data in transit and at rest.
Modal's code-first approach aligns naturally with Pydantic AI's type-safe ecosystem. The platform supports 50,000+ concurrent sandbox sessions, enabling Pydantic AI Harness Code Mode workflows to execute generated code at massive scale while maintaining security isolation.
Best For: Teams building Pydantic AI Harness Code Mode workflows that need secure code execution at scale, with on-demand GPU access when workloads require ML inference, model fine-tuning, or compute-intensive analysis.
E2B specializes in secure sandboxes for AI agents, focusing on ephemeral code execution with Firecracker microVM isolation. The platform markets itself as used by Fortune 100 companies for agentic workflows.
E2B excels at ephemeral code execution, spinning up isolated environments for agents to run generated code, then tearing them down. The platform supports 24-hour maximum session duration on Pro tier plans.
Best For: Teams building Pydantic AI workflows focused on code execution and testing where GPU acceleration is not required, particularly those seeking strong SDK integrations with popular AI frameworks.
Pydantic Monty is an experimental Pydantic-maintained Python interpreter that powers Pydantic AI Harness Code Mode. Rather than spinning up external containers, Monty runs within the parent process with cold starts.
Monty takes a fundamentally different approach than cloud sandboxes. As Samuel Colvin, Pydantic founder, explains: "LLMs work faster, cheaper and more reliably when they write code instead of making sequential tool calls. Monty makes that possible without the complexity of a sandbox or risk of running code directly on the host."
Monty is marked as experimental in its own GitHub README and is not yet production-ready. It runs a limited Python subset rather than full CPython. Class definitions and some standard library features are not yet supported. Check the GitHub repository for current capabilities. The project has thousands of GitHub stars and is under active development.
Best For: Teams building Pydantic AI Harness Code Mode workflows that need the fastest possible startup times and want to eliminate external infrastructure dependencies, provided the Python subset meets their requirements and the experimental maturity of the project is acceptable.
Daytona provides persistent development environments with sandbox creation support. The platform offers both open-source and managed cloud options with GPU support and configurable runtime persistence.
Daytona has announced SOC 2 Type I and HIPAA certifications for enterprise deployments. The platform includes features like Git integration, LSP support, and Docker-in-Docker capabilities.
Best For: Teams building Pydantic AI workflows that require persistent development environments and prefer workspace continuity over ephemeral execution, particularly those wanting open-source transparency.
Northflank is a full workload runtime platform offering multiple isolation options and comprehensive deployment flexibility. The platform processes 2M+ isolated workloads monthly with enterprise-grade compliance.
Northflank states that it has run production workloads since 2021 and has operated secure sandboxing infrastructure since 2019. The platform maintains SOC 2 Type 2 certification.
Best For: Teams building Pydantic AI workflows that need specific isolation profiles or strict data residency requirements, particularly enterprises with existing multi-cloud infrastructure.
Blaxel is a sandbox platform built specifically for AI agents, with a focus on persistent "agent computers" that stay on standby and resume when needed. The platform supports resume from standby state.
Blaxel maintains SOC 2 Type II, ISO 27001, and HIPAA compliance, meeting enterprise security requirements for sensitive Pydantic AI deployments.
Best For: Teams building Pydantic AI workflows with sporadic execution patterns (such as waiting for human approval) where traditional sandboxes would delete state or continue billing during idle periods.
Fly.io Sprites provides persistent Linux microVMs with large local NVMe-backed storage. Sprites expose a large NVMe-backed cache per Sprite while durable filesystem state is stored through Fly's object-storage-backed storage architecture, giving agents ample space for large codebases and dependencies.
Sprites function as persistent development machines rather than ephemeral sandboxes. Sprite filesystem state persists across hibernation cycles. Memory and process state is preserved across warm suspension but dropped after a cold pause, which occurs after extended idle periods.
Best For: Teams building Pydantic AI workflows maintaining long-running projects (such as multi-day codebase refactoring) where substantial persistent local storage is essential.
Modal combines secure Sandboxes for AI-generated code with on-demand GPU access on a single serverless AI infrastructure platform. Other platforms in this comparison also document GPU-capable sandboxing, but their maturity, access constraints, and GPU catalogs differ. Modal offers broad, production-oriented GPU access spanning T4, L4, A10, L40S, A100, A100-40GB, A100-80GB, RTX PRO 6000, H100, H200, and B200 / B200+. For Pydantic AI Harness Code Mode workflows that also need to run ML inference, execute model fine-tuning, or perform compute-intensive analysis, this combination reduces the need to orchestrate multiple platforms.
Modal's code-first SDKs, available in Python, TypeScript, and Go, align naturally with Pydantic AI's type-safe ecosystem. Teams define compute requirements, container images, and scaling behavior directly in code, with no YAML configuration or infrastructure management required. This approach enables rapid iteration for Pydantic AI development workflows.
Modal supports 50,000+ concurrent sandbox sessions with fast cold starts and fine-grained observability for metrics, logs, and Sandbox status. For Pydantic AI Harness Code Mode workflows executing generated code at scale, this capacity ensures consistent performance without manual infrastructure provisioning.
With SOC 2 Type II certification, HIPAA support via BAA on Enterprise plans, and gVisor-based sandboxing for compute isolation, Modal meets the compliance requirements that enterprise Pydantic AI deployments demand. The platform uses TLS 1.3 for public APIs and encryption for data in transit and at rest.
Modal powers cloud infrastructure for over 10,000 teams, including production coding agents like Ramp's background agent, which uses Modal Sandboxes to generate code changes and write them back into commits and pull requests. Modal publishes customer examples involving large-scale AI workloads and sandboxed code execution. The platform's AI-native container runtime, custom scheduler, and optimized filesystem are built specifically for the demands of AI workloads.
For teams building Pydantic AI Harness Code Mode workflows that require secure code execution, production-grade reliability, and on-demand GPU access, Modal's combination of sandboxed execution at scale and comprehensive GPU support makes it the clear choice.
Explore the Modal documentation to get started with Pydantic AI sandbox integration.
A code execution sandbox is an isolated environment where untrusted code can run without accessing the host system, other workloads, or sensitive data. When using Pydantic AI Harness Code Mode, models generate and execute Python code through a controlled execution layer, making sandboxing critical to prevent malicious or buggy generated code from causing damage. Veracode's 2025 GenAI Code Security Report found vulnerabilities in 45% of tested AI-generated code tasks, making isolation essential for production deployments.
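To make the shape of a controlled execution layer concrete, here is a minimal sketch using only the Python standard library. It runs a generated snippet in a separate interpreter process with a time limit and captured output. The function name `run_in_child_process` is hypothetical, and a child process alone is an assumption-laden stand-in: it is not a security boundary and does not provide the isolation that gVisor, Firecracker, or the other platforms discussed here provide.

```python
import subprocess
import sys
from typing import Tuple


def run_in_child_process(code: str, timeout_s: float = 5.0) -> Tuple[int, str, str]:
    """Run `code` in a separate Python interpreter and return (exit_code, stdout, stderr).

    Illustrative only: a child process shares the host's filesystem and
    network, so this is NOT a sandbox. It only shows the interface a real
    sandbox exposes: submit code, enforce a time limit, collect output.
    """
    try:
        result = subprocess.run(
            # -I starts Python in isolated mode (ignores environment
            # variables and the user site directory).
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising on timeout.
        return (-1, "", f"timed out after {timeout_s} seconds")
    return (result.returncode, result.stdout, result.stderr)


if __name__ == "__main__":
    exit_code, out, err = run_in_child_process("print(2 + 2)")
    print(exit_code, out.strip(), err.strip())
```

A production sandbox keeps this same submit-and-collect interface but replaces the child process with an isolated environment that cannot reach the host, other workloads, or sensitive data.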
Modal uses gVisor-based sandboxing to isolate compute jobs, preventing AI-generated code from affecting other workloads or accessing unauthorized resources. The platform maintains SOC 2 Type II certification, uses TLS 1.3 for public APIs, and encrypts data in transit and at rest. Enterprise customers can access HIPAA-compliant workloads via a Business Associate Agreement.
Serverless sandboxes eliminate infrastructure management overhead. Instead of provisioning instances, configuring networking, and managing container orchestration, teams define everything in code using Modal's SDKs in Python, TypeScript, and Go. Modal Functions scale to zero when there are no inputs, so you pay only for the compute you use, and they scale automatically to thousands of containers. Modal Sandboxes have a default maximum lifetime of 5 minutes, can be configured up to 24 hours, and can be automatically terminated after inactivity by setting idle_timeout. This approach enables rapid iteration for Pydantic AI development workflows.
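The lifetime settings described above could be wired up roughly as follows. This is a sketch under stated assumptions, not Modal's reference usage: the helper names `sandbox_lifetime_kwargs` and `run_generated_code` and the app name `pydantic-ai-sandbox-demo` are hypothetical, both timeouts are assumed to be expressed in seconds, and the Modal calls (`modal.App.lookup`, `modal.Sandbox.create`, `exec`, `terminate`) should be checked against the current Modal SDK documentation before use.

```python
from __future__ import annotations

from typing import Any, Optional

DEFAULT_TIMEOUT_S = 5 * 60       # default maximum lifetime: 5 minutes
MAX_TIMEOUT_S = 24 * 60 * 60     # configurable ceiling: 24 hours


def sandbox_lifetime_kwargs(
    timeout_s: int = DEFAULT_TIMEOUT_S,
    idle_timeout_s: Optional[int] = None,
) -> dict[str, Any]:
    """Validate lifetime settings and return keyword arguments for Sandbox.create."""
    if not 0 < timeout_s <= MAX_TIMEOUT_S:
        raise ValueError(f"timeout must be between 1 and {MAX_TIMEOUT_S} seconds")
    kwargs: dict[str, Any] = {"timeout": timeout_s}
    if idle_timeout_s is not None:
        if idle_timeout_s <= 0:
            raise ValueError("idle_timeout must be positive")
        kwargs["idle_timeout"] = idle_timeout_s
    return kwargs


def run_generated_code(code: str, **lifetime: int) -> str:
    """Run a snippet in a Modal Sandbox and return its stdout.

    Requires the `modal` package and Modal credentials. The import is lazy
    so the validation helper above can be used without Modal installed.
    """
    import modal

    # Hypothetical app name, created on first use.
    app = modal.App.lookup("pydantic-ai-sandbox-demo", create_if_missing=True)
    sandbox = modal.Sandbox.create(app=app, **sandbox_lifetime_kwargs(**lifetime))
    try:
        process = sandbox.exec("python", "-c", code)
        output = process.stdout.read()
        process.wait()
        return output
    finally:
        # Terminate explicitly rather than waiting for the timeout to expire.
        sandbox.terminate()
```

For example, `run_generated_code("print(1 + 1)", timeout_s=3600, idle_timeout_s=120)` would request a one-hour ceiling with termination after two idle minutes, assuming the SDK calls behave as described.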
Yes, Modal supports HIPAA-compliant workloads on Enterprise plans through a Business Associate Agreement. Combined with SOC 2 Type II certification, gVisor isolation, and encryption controls, Modal meets the compliance requirements for Pydantic AI applications handling protected health information.
Performance varies significantly by platform architecture, and latency metrics across platforms measure different things and are not directly comparable. Pydantic's embedded Monty interpreter offers cold starts but is experimental, runs a limited Python subset, and is not yet production-ready. Cloud sandbox platforms report varying startup times depending on whether they measure initial sandbox creation or resume from a paused state. Daytona supports sandbox creation for cloud-based agent workloads. Blaxel supports resume from standby state. Choose based on your specific latency requirements and whether you need full Python support.
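Because published figures mix initial creation with resume-from-pause, one way to get comparable numbers is to time each phase yourself against your own workload. The sketch below is a generic standard-library harness. The names `time_phase` and `compare_phases` are hypothetical, and the callables passed in are placeholders you would replace with your chosen platform's real create and resume calls.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_phase(action: Callable[[], object], runs: int = 5) -> List[float]:
    """Call `action` `runs` times and return each wall-clock duration in seconds."""
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        action()
        samples.append(time.perf_counter() - start)
    return samples


def compare_phases(
    phases: Dict[str, Callable[[], object]], runs: int = 5
) -> Dict[str, float]:
    """Return the median duration per named phase so phases are reported separately."""
    return {
        name: statistics.median(time_phase(action, runs))
        for name, action in phases.items()
    }


if __name__ == "__main__":
    # Stand-in callables; swap in real SDK calls for the platform under test.
    medians = compare_phases(
        {
            "create": lambda: time.sleep(0.05),
            "resume": lambda: None,
        }
    )
    for name, seconds in medians.items():
        print(f"{name}: {seconds * 1000:.1f} ms (median)")
```

Reporting creation and resume as separate medians avoids comparing one vendor's cold-create number against another vendor's resume number.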
GPU access enables Pydantic AI Harness Code Mode workflows to run ML models for code generation, analysis, and understanding at production speeds. Modal combines secure Sandboxes for AI-generated code with on-demand GPU access spanning T4, L4, A10, L40S, A100, A100-40GB, A100-80GB, RTX PRO 6000, H100, H200, and B200 / B200+. This combination reduces the need to orchestrate multiple platforms when workflows need to execute both generated code and ML workloads.