Serve custom AI models at scale

Add one line of code to run any function in the cloud. Get instant autoscaling for ML inference, data jobs, and more.

Sub-second container starts

We built a Rust-based container stack from scratch so you can iterate as quickly in the cloud as you can locally.

Zero config files

Easily define hardware and container requirements next to your Python functions.

Scale to hundreds of GPUs in seconds

Never worry about hitting rate limits again. We autoscale containers for your functions instantly.

Use Cases

Generative AI inference that scales with you

Fast cold boots

Load gigabytes of weights in seconds with our optimized container file system.

Bring your own code

Deploy anything from custom models to popular frameworks.

Seamless autoscaling

Handle bursty and unpredictable load by scaling to thousands of GPUs and back down to zero.

Fine-tuning and training without managing infrastructure

Start training immediately

Provision Nvidia A100 and H100 GPUs in seconds. Your drivers and custom packages are already there.

Never wait in line

Run as many experiments as you need to, in parallel. Stop paying for idle GPUs when you’re done.

Cloud storage

Mount weights and data in distributed volumes, then access them wherever they’re needed.

Batch processing optimized for high-volume workloads

Supercomputing scale

Serverless, but for high-performance compute. Run workloads across massive amounts of CPU and memory.

Serverless pricing

Pay only for the resources you consume, billed by the second, as you spin up containers.

Powerful compute primitives

Simple fan-out parallelism that scales to thousands of containers, with a single line of Python.

Build anything with Modal

Language Models

Image, Video, 3D

Audio Processing

Fine-Tuning

Batch Processing

Sandboxed Code

Computational Bio

Features

Flexible Environments

Bring your own image or build one in Python, scale resources as needed, and leverage state-of-the-art GPUs like H100s & A100s for high-performance computing.

Seamless Integrations

Export function logs to Datadog or any OpenTelemetry-compatible provider, and easily mount cloud storage from major providers (S3, R2, etc.).

Data Storage

Manage data effortlessly with storage solutions (network volumes, key-value stores and queues). Provision storage types and interact with them using familiar Python syntax.

Job Scheduling

Take control of your workloads with powerful scheduling. Set up cron jobs, retries, and timeouts, or use batching to optimize resource usage.

Web Endpoints

Deploy and manage web services with ease. Create custom domains, set up streaming and WebSockets, and serve functions as secure HTTPS endpoints.

Built-In Debugging

Troubleshoot efficiently with built-in debugging tools. Use the `modal shell` command for interactive debugging, and set breakpoints to pinpoint issues quickly.

Only pay when your code is running

Scale up to hundreds of nodes and down to zero within seconds. Pay only for the compute you actually use, billed by the second, with $30 of free compute every month.

Compute costs

GPU Tasks

Nvidia B200: $0.001736 / sec
Nvidia H200: $0.001261 / sec
Nvidia H100: $0.001097 / sec
Nvidia A100, 80 GB: $0.000694 / sec
Nvidia A100, 40 GB: $0.000583 / sec
Nvidia L40S: $0.000542 / sec
Nvidia A10G: $0.000306 / sec
Nvidia L4: $0.000222 / sec
Nvidia T4: $0.000164 / sec

CPU

Physical core (2 vCPU equivalent): $0.0000131 / core / sec*

*Minimum of 0.125 cores per container

Memory

$0.00000222 / GiB / sec
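Because every rate above is quoted per second, a job's cost is just its duration multiplied by the sum of the per-second rates it uses. The sketch below is a rough estimator under stated assumptions, not Modal's actual billing logic: it assumes GPU, CPU, and memory charges add linearly with no rounding, applies the 0.125-core minimum per container, ignores the monthly free credit, and uses hypothetical names (`estimate_cost`, the GPU dictionary keys) chosen for illustration.

```python
from typing import Optional

# Per-second prices (USD) copied from the list above.
# Assumption: these are current; check the pricing page before relying on them.
GPU_USD_PER_SEC = {
    "B200": 0.001736,
    "H200": 0.001261,
    "H100": 0.001097,
    "A100-80GB": 0.000694,
    "A100-40GB": 0.000583,
    "L40S": 0.000542,
    "A10G": 0.000306,
    "L4": 0.000222,
    "T4": 0.000164,
}
CPU_USD_PER_CORE_SEC = 0.0000131  # per physical core (2 vCPU equivalent)
MEM_USD_PER_GIB_SEC = 0.00000222
MIN_CORES = 0.125  # per-container minimum from the footnote


def estimate_cost(
    seconds: float,
    cores: float,
    mem_gib: float,
    gpu: Optional[str] = None,
    gpu_count: int = 1,
) -> float:
    """Hypothetical estimator for one container running for `seconds` seconds.

    Assumes GPU, CPU, and memory charges add linearly, with no rounding and
    no free-tier credit applied.
    """
    if min(seconds, cores, mem_gib, gpu_count) < 0:
        raise ValueError("inputs must be non-negative")
    billed_cores = max(cores, MIN_CORES)  # apply the 0.125-core floor
    per_sec = billed_cores * CPU_USD_PER_CORE_SEC + mem_gib * MEM_USD_PER_GIB_SEC
    if gpu is not None:
        # Raises KeyError for GPU names not in the table above.
        per_sec += gpu_count * GPU_USD_PER_SEC[gpu]
    return seconds * per_sec


if __name__ == "__main__":
    # One H100 with 4 cores and 16 GiB of memory, running for one hour.
    print(f"${estimate_cost(3600, cores=4, mem_gib=16, gpu='H100'):.2f}")
    # Hours of H100 time (GPU charge only) that $30 of credit would cover.
    print(f"{30 / GPU_USD_PER_SEC['H100'] / 3600:.1f} h")
```

Under these assumptions, the one-hour H100 container comes to about $4.27 (roughly $3.95 of which is the GPU), and $30 of monthly credit covers roughly 7.6 hours of H100 time before CPU and memory charges.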

For teams of all scales

Starter

For small teams and independent developers looking to level up.

Team

For startups and larger organizations looking to scale quickly.

Enterprise

For organizations prioritizing security, support, and reliability.

Security and governance

Built with Modal

Deploy an OpenAI-compatible LLM service

Run large language models with a drop-in replacement for the OpenAI API.

Custom pet art from Flux with Hugging Face and Gradio

Fine-tune an image generation model on pictures of your pet.

Run llama.cpp

Run DeepSeek-R1 and Phi-4 on llama.cpp.

Voice chat with LLMs

Build an interactive voice chat app.

Serve diffusion models

Serve Flux on Modal with a number of optimizations for blazingly fast inference.

Fold proteins with Chai-1

Predict molecular structures from sequences with state-of-the-art open-source models.

Serverless TensorRT-LLM (LLaMA 3 8B)

Run interactive language model applications.

Star in custom music videos

Fine-tune a Wan2.1 video model on your face and run it in parallel.

Create music

Turn prompts into music with MusicGen.

Sandbox a LangGraph agent's code

Run an LLM coding agent that runs its own language models.

RAG Chat with PDFs

Use ColBERT-style, multimodal embeddings with a Vision-Language Model to answer questions about documents.

Bring images to life

Prompt a generative video model to animate an image.

Fast podcast transcriptions

Build an end-to-end podcast transcription app that leverages dozens of containers for super-fast processing.

Build a protein folding dashboard

Serve a web UI for a protein model with ESM3, Molstar, and Gradio.

Deploy a Hacker News Slackbot

Periodically post new Hacker News posts to Slack.

Retrieval-Augmented Generation (RAG) for Q&A

Build a question-answering web endpoint that can cite its sources.

Document OCR job queue

Use Modal as an infinitely scalable job queue that can service async tasks from a web app.

Parallel processing of Parquet files on S3

Analyze data from the Taxi and Limousine Commission of NYC in parallel.

“Modal Sandboxes enable us to execute generated code securely and flexibly. We expedited the development of our code interpreter feature integrated into Le Chat.”

Wendy Shang, AI Scientist

“Modal makes it easy to write code that runs on 100s of GPUs in parallel, transcribing podcasts in a fraction of the time.”

Mike Cohen, Head of Data

“Tasks that would have taken days to complete take minutes instead. We’ve saved thousands of dollars deploying LLMs on Modal.”

Rahul Sengottuvelu, Head of Applied AI

“The beauty of Modal is that all you need to know is that you can scale your function calls in the cloud with a few lines of Python.”

Georg Kucsko, Co-founder and CTO

Join Modal's developer community

Modal Community Slack

Igor Kotua

Engineer, The Linux Foundation

If you building AI stuff with Python and haven't tried @modal_labs you are missing out big time

Daniel Rothenberg

Co-founder, Brightband

@modal_labs continues to be magical... 10 minutes of effort and the `joblib`-based parallelism I use to test on my local machine can trivially scale out on the cloud. Makes life so easy!

Erin Boyle

ML Engineer, Tesla

This tool is awesome. So empowering to have your infra needs met with just a couple decorators. Good people, too!

Jai Chopra

Product, LanceDB

Recently built an app on Lambda and just started to use @modal_labs, the difference is insane! Modal is amazing, virtually no cold start time, onboarding experience is great 🚀

Diego Fernandes

Co-founder & CTO, RocketSeat

Probably one of the best piece of software I'm using this year: modal.com

Adam Azzam

Product, Prefect

feels weird at this point to use anything else than @modal_labs for this — absolutely the GOAT of dynamic sandboxes

Rémi 📎

Co-founder & CEO, .txt

Nothing beats @modal_labs when it comes to deploying a quick POC

Matt Holden

Founder

Late to the party, but finally playing with @modal_labs to run some backend jobs. DX is sooo nice (compared to Docker, Cloud Run, Lambda, etc). Just decorate a Python function and deploy. And it's fast! Love it.

Caleb

ML Engineer, Hugging Face

Bullish on @modal_labs - Great Docs + Examples - Healthy Free Plan (30$ free compute / month) - Never have to worry about infra / just Python

@mattzcarey.com on Bluesky

AI Engineer, StackOne

@modal_labs has got a bunch of stuff just worked out this should be how you deploy python apps. wow

Aman Kishore

Research Engineer, Harvey

If you are still using AWS Lambda instead of @modal_labs you're not moving fast enough

Izzy Miller

DevRel, Hex

special shout out to @modal_labs and @_hex_tech for providing the crucial infrastructure to run this! Modal is the coolest tool I’ve tried in a really long time— cannnot say enough good things.

Mark Tenenholtz

Head of AI, PredeloHQ

I use @modal_labs because it brings me joy. There isn't much more to it.

Nick Schrock

Founder, Dagster Labs

I have tried @modal_labs and am now officially Modal-pilled. Great work @bernhardsson and team. Every hyperscalar should be trying this out and immediately pivoting their compute teams' roadmaps to match this DX.

Moin Nadeem

Co-founder, Phonic

I've realized @modal_labs is actually a great fit for ML training pipelines. If you're running model-based evals, why not just call a serverless Modal function and have it evaluate your model on a separate worker GPU? This makes evaluation during training really easy.
