Serve custom AI models at scale

Add one line of code to run any function in the cloud. Get instant autoscaling for ML inference, data jobs, and more.
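In code, that one line is a decorator. A minimal sketch (the app and function names are illustrative):

```python
import modal

app = modal.App("example")

@app.function()  # this single decorator is what moves `square` to the cloud
def square(x: int) -> int:
    return x * x

@app.local_entrypoint()
def main():
    # Run with: modal run example.py
    print(square.remote(42))  # executes in a cloud container, returns 1764
```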

Trusted by Ramp, Quora, Substack, Cartesia, Cursor, Suno, Mistral AI, and Contextual AI.
Sub-second container starts

We built a Rust-based container stack from scratch so you can iterate as quickly in the cloud as you can locally.

Features

Flexible Environments

Bring your own image or build one in Python, scale resources as needed, and leverage state-of-the-art GPUs like H100s & A100s for high-performance computing.
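A sketch of what this looks like (the packages and resource sizes here are just examples):

```python
import modal

# Build an image in Python, or bring your own via modal.Image.from_registry(...)
image = modal.Image.debian_slim().pip_install("torch")

app = modal.App("flexible-env", image=image)

@app.function(gpu="H100", memory=32768)  # request an H100 GPU and 32 GiB of RAM
def gpu_job():
    import torch
    return torch.cuda.get_device_name(0)
```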

Seamless Integrations

Export function logs to Datadog or any OpenTelemetry-compatible provider, and easily mount cloud storage from major providers (S3, R2, etc.).
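For instance, mounting an S3 bucket might look like this (the bucket and secret names are placeholders; R2 mounts work similarly):

```python
import modal

app = modal.App("integrations")

@app.function(
    volumes={
        "/bucket": modal.CloudBucketMount(
            "my-s3-bucket",                                    # placeholder bucket
            secret=modal.Secret.from_name("aws-credentials"),  # placeholder secret
        )
    }
)
def read_data() -> str:
    # The bucket's contents appear as ordinary files
    with open("/bucket/data.csv") as f:
        return f.read()
```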

Data Storage

Manage data effortlessly with built-in storage primitives: network volumes, key-value stores, and queues. Provision them and interact with them using familiar Python syntax.
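A rough sketch of the three primitives side by side (all names here are placeholders):

```python
import modal

vol = modal.Volume.from_name("model-cache", create_if_missing=True)   # network volume
kv = modal.Dict.from_name("run-metadata", create_if_missing=True)     # key-value store
jobs = modal.Queue.from_name("job-queue", create_if_missing=True)     # queue

app = modal.App("storage")

@app.function(volumes={"/cache": vol})
def worker():
    kv["last_run"] = "ok"   # dict-style key-value access
    jobs.put("next-item")   # enqueue work for another consumer
    with open("/cache/notes.txt", "w") as f:
        f.write("hello")
    vol.commit()            # persist writes to the shared volume
```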

Job Scheduling

Take control of your workloads with powerful scheduling. Set up cron jobs, retries, and timeouts, or use batching to optimize resource usage.
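For example (the schedule and batching parameters are illustrative):

```python
import modal

app = modal.App("scheduling")

# Run nightly at 02:00 UTC, retry failures up to 3 times, time out after 10 minutes
@app.function(schedule=modal.Cron("0 2 * * *"), retries=3, timeout=600)
def nightly_job():
    ...

# Dynamic batching: group up to 16 calls that arrive within 100 ms
@app.function()
@modal.batched(max_batch_size=16, wait_ms=100)
def embed(texts: list[str]) -> list[list[float]]:
    return [[0.0] for _ in texts]  # placeholder for a real batch computation
```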

Web Endpoints

Deploy and manage web services with ease. Create custom domains, set up streaming and websockets, and serve functions as secure HTTPS endpoints.
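A minimal sketch of a function served as an HTTPS endpoint (deploy it with `modal deploy`; custom domains, streaming, and websockets layer on top of this):

```python
import modal

app = modal.App("web")

@app.function()
@modal.fastapi_endpoint(method="POST")  # named modal.web_endpoint in older releases
def predict(item: dict):
    return {"echo": item}  # served at a generated HTTPS URL once deployed
```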

Built-In Debugging

Troubleshoot efficiently with built-in debugging tools. Use the modal shell for interactive debugging and set breakpoints to pinpoint issues quickly.
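A rough sketch of the workflow (the function is a stand-in; the CLI invocations are shown in comments):

```python
import modal

app = modal.App("debugging")

@app.function()
def flaky(x: int = 3) -> int:
    result = x * 2
    breakpoint()  # drops into pdb when run interactively: `modal run -i app.py`
    return result

# From a terminal, `modal shell app.py` opens an interactive shell
# inside the same container image for poking around.
```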

Use Cases

Generative AI inference that scales with you

Fast cold boots

Load gigabytes of weights in seconds with our optimized container file system.

Bring your own code

Deploy anything from custom models to popular frameworks.

Seamless autoscaling

Handle bursty and unpredictable load by scaling to thousands of GPUs and back down to zero.
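Put together, a deployed inference service can look roughly like this (the model and GPU choice are illustrative; Modal scales the number of containers with traffic):

```python
import modal

image = modal.Image.debian_slim().pip_install("transformers", "torch")
app = modal.App("inference", image=image)

@app.cls(gpu="A100")
class Generator:
    @modal.enter()  # runs once per container start: load weights before serving
    def load(self):
        from transformers import pipeline
        self.pipe = pipeline("text-generation", model="gpt2")  # placeholder model

    @modal.method()
    def generate(self, prompt: str) -> str:
        return self.pipe(prompt, max_new_tokens=32)[0]["generated_text"]
```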

Fine-tuning and training without managing infrastructure

Start training immediately

Provision NVIDIA A100 and H100 GPUs in seconds. Your drivers and custom packages are already there.

Never wait in line

Run as many experiments as you need to, in parallel. Stop paying for idle GPUs when you're done.

Cloud storage

Mount weights and data in distributed volumes, then access them wherever they're needed.
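A sketch of a parallel hyperparameter sweep along these lines (names and numbers are placeholders):

```python
import modal

checkpoints = modal.Volume.from_name("checkpoints", create_if_missing=True)
app = modal.App("finetune")

@app.function(gpu="H100", volumes={"/ckpt": checkpoints}, timeout=4 * 60 * 60)
def train(lr: float) -> float:
    ...  # a real training loop would write checkpoints under /ckpt
    return 0.0  # e.g. final validation loss

@app.local_entrypoint()
def main():
    # Sweep learning rates in parallel, one GPU container per experiment;
    # containers shut down (and billing stops) as soon as each run finishes.
    for loss in train.map([1e-5, 3e-5, 1e-4]):
        print(loss)
```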

Batch processing optimized for high-volume workloads

Supercomputing scale

Serverless, but for high-performance compute. Run things on massive amounts of CPU and memory.

Serverless pricing

Pay only for resources consumed, by the second, as you spin up containers.

Powerful compute primitives

Simple fan-out parallelism that scales to thousands of containers, with a single line of Python.
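For example (the resource sizes and workload are placeholders):

```python
import modal

app = modal.App("batch")

@app.function(cpu=8.0, memory=16384)  # 8 CPUs and 16 GiB of memory per container
def process(path: str) -> int:
    ...  # stand-in for real per-item work
    return len(path)

@app.local_entrypoint()
def main():
    paths = [f"item-{i}" for i in range(10_000)]
    results = list(process.map(paths))  # the one-line fan-out across containers
```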

Security and governance

Built on top of gVisor

The secure application kernel for containers, providing top-tier isolation in multi-tenant setups.

SOC 2 and HIPAA

Fully SOC 2 compliant, with support for HIPAA-compliant workloads and industry-standard controls for security, availability, and confidentiality.

Region support

Deploy across geographic regions to meet data-residency and compliance requirements.

SSO sign-in for enterprise

Enterprise-grade SSO for transparent, streamlined access management.

"Modal Sandboxes enable us to execute generated code securely and flexibly. We expedited the development of our code interpreter feature integrated into Le Chat."

Wendy Shang, AI Scientist

Ship your first app in minutes.

$30/month of free compute
