Web endpoints

Modal gives you a few ways to expose functions as web endpoints. You can turn any Modal function into a web endpoint with a single line of code, or you can serve a full app using a framework like FastAPI, Django, or Flask.

All web endpoints on Modal have a limit of 300 requests per second (rps). Get in touch if you need higher limits.
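
If you call an endpoint from a script, you may want to pace your requests on the client side so that you stay under this limit. The following is a minimal sketch of one way to do that; the Throttle class and its parameters are hypothetical helpers, not part of Modal's client library.

```python
import time


class Throttle:
    """Space out calls so that at most `max_per_second` happen each second."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval


# Stay below the 300 rps limit mentioned above.
throttle = Throttle(max_per_second=300)
```

Call throttle.wait() immediately before each request. The clock and sleep arguments are injectable only to make the helper easy to test.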

Note that if you wish to invoke a Modal function from another Python application, you can deploy and invoke the function directly with our client library.


The easiest way to create a web endpoint from an existing function is to use the @modal.web_endpoint decorator.

from modal import Stub, web_endpoint

stub = Stub()

@stub.function()
@web_endpoint()
def f():
    return "Hello world!"

This decorator wraps the Modal function in a FastAPI application.

Developing with modal serve

You can run this code as an ephemeral app by running the command:

modal serve server_script.py

Here, server_script.py is the file name of your code. This creates an ephemeral app that lasts for the duration of your script (until you hit Ctrl-C to stop it), along with a temporary URL that you can use like any other REST endpoint. This URL is on the public internet.

The modal serve command will live-update an app when any of its supporting files change.

Live updating is particularly useful when working with apps containing web endpoints, as any changes made to web endpoint handlers will show up almost immediately, without requiring a manual restart of the app.

Deploying a web server

You can also deploy your app and create a persistent web endpoint in the cloud by running modal deploy:

modal deploy server_script.py

Passing arguments to web endpoints

When using @web_endpoint, you can use query parameters just as in FastAPI; they are passed to your function as arguments. For instance:

from modal import Stub, web_endpoint

stub = Stub()

@stub.function()
@web_endpoint()
def square(x: int):
    return {"square": x**2}

If you hit this endpoint with a URL-encoded query string that contains the “x” parameter, its value is passed to the function:

% curl 'https://modal-labs--web-endpoint-get-py-square-erikbern-dev.modal.run?x=42'
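
The same query string can be built from Python with the standard library. This is a minimal sketch; BASE_URL is a hypothetical placeholder and build_url is an illustrative helper, so substitute the URL that Modal prints for your own endpoint.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; use the URL printed for your endpoint instead.
BASE_URL = "https://example--square-dev.modal.run"


def build_url(base_url: str, **params) -> str:
    """Append URL-encoded query parameters to an endpoint URL."""
    return f"{base_url}?{urlencode(params)}"


url = build_url(BASE_URL, x=42)
print(url)
```

urlencode takes care of escaping characters such as spaces and ampersands, which matters once parameter values are no longer plain integers.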

If you want to use a POST request, you can use the method argument to @web_endpoint to set the HTTP verb. To accept any valid JSON, you can use Dict as your type annotation and FastAPI will handle the rest.

from typing import Dict

from modal import Stub, web_endpoint

stub = Stub()

@stub.function()
@web_endpoint(method="POST")
def square(item: Dict):
    return {"square": item['x']**2}

This creates an endpoint that accepts a JSON request body:

% curl 'https://modal-labs--web-endpoint-post-py-square-erikbern-dev.modal.run' -X POST -H 'Content-Type: application/json' -d '{"x": 42}'

This is often the easiest way to get started, but note that FastAPI recommends that you use typed Pydantic models in order to get automatic validation and documentation. FastAPI also lets you pass data to web endpoints in other ways, for instance as form data and file uploads.

How do web endpoints run in the cloud?

Note that web endpoints, like everything else on Modal, only run when they need to. When you hit the web endpoint the first time, it will boot up the container, which might take a few seconds. Modal keeps the container alive for a short period in case there are subsequent requests. If there are a lot of requests, Modal might create more containers running in parallel.

Under the hood, Modal wraps your function in a FastAPI application, and so functions you write need to follow the same request and response semantics. This also means you can use all of FastAPI’s powerful features, such as Pydantic models for automatic validation, typed query and path parameters, and response types.

For long-running web endpoints (taking more than 150s to complete), Modal by default uses chains of HTTP redirects to keep each request reasonably short-lived. For more information see Web endpoint timeouts.
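
In practice this means that clients of long-running endpoints need to follow redirects. The sketch below illustrates the general pattern with a local stand-in server, not with Modal itself: the 303 status code, the step query parameter, and the number of hops are all assumptions made for the demonstration, and the actual redirect format Modal uses is not described here.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

STEPS_NEEDED = 3  # hypothetical number of redirect hops before the result is ready


class RedirectingHandler(BaseHTTPRequestHandler):
    """Stand-in server: redirects a few times, then returns the result."""

    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        step = int(query.get("step", ["0"])[0])
        if step < STEPS_NEEDED:
            # Result not ready yet: point the client at the next hop.
            self.send_response(303)
            self.send_header("Location", f"/?step={step + 1}")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"done"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def fetch_following_redirects() -> str:
    """Start the stand-in server, request it once, and follow every redirect."""
    server = HTTPServer(("127.0.0.1", 0), RedirectingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/?step=0"
        # An opener without proxies; urllib follows redirects by default.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url, timeout=10) as response:
            return response.read().decode()
    finally:
        server.shutdown()
        server.server_close()


result = fetch_following_redirects()
```

Most HTTP clients, including curl with the -L flag and the requests library, can follow redirects in the same way. Clients configured not to follow them would only see the first hop.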

More complex example

Here’s everything together, combining Modal’s abilities to run functions in user-defined containers with the expressivity of FastAPI:

from pydantic import BaseModel
from fastapi.responses import HTMLResponse

from modal import Image, Stub, web_endpoint

image = Image.debian_slim().pip_install("boto3")
stub = Stub(image=image)

class Item(BaseModel):
    name: str
    qty: int = 42

@stub.function()
@web_endpoint(method="POST")
def f(item: Item):
    import boto3
    # do things with boto3...
    return HTMLResponse(f"<html>Hello, {item.name}!</html>")

This endpoint definition would be called like so:

curl -d '{"name": "Erik", "qty": 10}' \
    -H "Content-Type: application/json" \
    -X POST https://ecorp--web-demo-f-dev.modal.run

Or in Python with the requests library:

import requests

data = {"name": "Erik", "qty": 10}
requests.post("https://ecorp--web-demo-f-dev.modal.run", json=data, timeout=10.0)

Serving ASGI and WSGI apps

You can also serve any app written in an ASGI or WSGI compatible web application framework on Modal.

ASGI provides support for async web frameworks. WSGI provides support for synchronous web frameworks.


For ASGI apps, you can create a function decorated with @modal.asgi_app that returns a reference to your web app:

from fastapi import FastAPI, Request
from fastapi.responses import HTMLResponse

from modal import Image, Stub, asgi_app

web_app = FastAPI()
stub = Stub()

image = Image.debian_slim().pip_install("boto3")

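# Route paths below are assumed to mirror the handler names.
@web_app.post("/foo")
async def foo(request: Request):
    body = await request.json()
    return body

@web_app.get("/bar")
async def bar(arg="world"):
    return HTMLResponse(f"<h1>Hello Fast {arg}!</h1>")

@stub.function(image=image)
@asgi_app()
def fastapi_app():
    return web_app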

Now, as before, when you deploy this script as a Modal app, you get a URL for your app that you can use.


You can serve WSGI apps using the @modal.wsgi_app decorator:

from modal import Image, Stub, wsgi_app

stub = Stub()
image = Image.debian_slim().pip_install("flask")

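@stub.function(image=image)
@wsgi_app()
def flask_app():
    from flask import Flask, request

    web_app = Flask(__name__)

    # Route paths below are assumed for illustration.
    @web_app.get("/")
    def home():
        return "Hello Flask World!"

    @web_app.post("/echo")
    def echo():
        return request.json

    return web_app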

See Flask’s docs for more information on using Flask as a WSGI app.


WebSockets

Functions annotated with @web_endpoint, @asgi_app, or @wsgi_app also support the WebSocket protocol. Consult your web framework for appropriate documentation on how to use WebSockets with that library.

WebSockets on Modal maintain a single function call per connection, which can be useful for keeping state around. Most of the time, you will want to set your handler function to allow concurrent inputs, which allows multiple simultaneous WebSocket connections to be handled by the same container.

We support the full WebSocket protocol as per RFC 6455, but we do not yet have support for RFC 8441 (WebSockets over HTTP/2) or RFC 7692 (permessage-deflate extension). WebSocket messages can be up to 2 MiB each.
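
If your application sends large payloads, one way to respect the message size limit is to split them into several messages before sending. The helper below is a hypothetical sketch of that idea; split_for_websocket is not part of Modal or of any WebSocket library, and reassembling the pieces on the receiving side is left to your application.

```python
from typing import List

MAX_WS_MESSAGE_BYTES = 2 * 1024 * 1024  # 2 MiB, the per-message limit stated above


def split_for_websocket(payload: bytes, limit: int = MAX_WS_MESSAGE_BYTES) -> List[bytes]:
    """Split a payload into consecutive chunks of at most `limit` bytes each."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return [payload[i:i + limit] for i in range(0, len(payload), limit)]


# A 5 MiB payload becomes chunks of 2 MiB, 2 MiB, and 1 MiB.
chunks = split_for_websocket(b"x" * (5 * 1024 * 1024))
```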

Cold start performance

Consult the guide page on cold start performance for more information on when functions incur cold start penalties and how to mitigate their impact.


Authentication

Modal doesn’t have a first-class way to add authentication to web endpoints yet. However, we support standard techniques for securing web servers.

Token-based authentication

This is easy to implement in whichever framework you’re using. For example, if you’re using @modal.web_endpoint or @modal.asgi_app with FastAPI, you can validate a Bearer token like this:

from fastapi import Depends, HTTPException, status, Request
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials

from modal import Secret, Stub, web_endpoint

stub = Stub("auth-example")

auth_scheme = HTTPBearer()

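@stub.function(secret=Secret.from_name("my-web-auth-token"))
@web_endpoint()
async def f(request: Request, token: HTTPAuthorizationCredentials = Depends(auth_scheme)):
    import os

    # Reject the request unless the bearer token matches the secret's AUTH_TOKEN.
    if token.credentials != os.environ["AUTH_TOKEN"]:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Incorrect bearer token",
            headers={"WWW-Authenticate": "Bearer"},
        )

    # Function body
    return "success!"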

This assumes you have created a Modal secret named my-web-auth-token with contents {AUTH_TOKEN: secret-random-token}. Your endpoint will now return a 401 status code unless the request carries the correct Authorization header (note that the token must be prefixed with the word Bearer and a space):

curl --header "Authorization: Bearer secret-random-token" https://modal-labs--auth-example-f.modal.run
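
The equivalent request can be prepared from Python. This is a minimal sketch using only the standard library; build_authorized_request is a hypothetical helper, and the URL and token are the example values from the curl command above.

```python
import urllib.request


def build_authorized_request(url: str, token: str) -> urllib.request.Request:
    """Create a request carrying the token in a Bearer Authorization header."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


req = build_authorized_request(
    "https://modal-labs--auth-example-f.modal.run",
    "secret-random-token",
)
# To actually send it: urllib.request.urlopen(req, timeout=10)
```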

Client IP address

You can access the IP address of the client making the request. This can be used for geolocation, whitelists, blacklists, and rate limits.

from modal import Stub, web_endpoint
from fastapi import Request

stub = Stub()

@stub.function()
@web_endpoint()
def get_ip_address(request: Request):
    return f"Your IP address is {request.client.host}"
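
As an illustration of the whitelist use case, the client address can be checked against a set of allowed networks with the standard ipaddress module. This is a sketch under assumptions: ALLOWED_NETWORKS holds documentation-only example ranges and is_allowed is a hypothetical helper that you would call with request.client.host inside your handler.

```python
import ipaddress

# Example ranges reserved for documentation; replace with your own networks.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_allowed(client_host: str) -> bool:
    """Return True if the client address falls inside an allowed network."""
    try:
        address = ipaddress.ip_address(client_host)
    except ValueError:
        # Not a valid IP address string: treat it as not allowed.
        return False
    return any(address in network for network in ALLOWED_NETWORKS)
```

A handler could then raise an HTTPException with a 403 status code when is_allowed returns False.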