Web endpoints
Modal gives you a few ways to expose functions as web endpoints. You can either turn any Modal function into a webhook with a single line of code, or you can serve an entire ASGI-compatible app (including existing apps written with frameworks such as FastAPI or Flask).
@webhook
The easiest way to create a webhook out of an existing function is to use the @stub.webhook decorator.
import modal

stub = modal.Stub()

@stub.webhook
def f():
    return "Hello world!"
Developing with modal serve
You can run this code as an ephemeral app by running the command

modal serve server_script.py

where server_script.py is the file name of your code. This will create an ephemeral app for the duration of the command (until you hit Ctrl-C to stop it). It creates a temporary URL that you can use like any other REST endpoint. This URL is on the public internet.
The serve command (which you can also call programmatically using stub.serve()) will live-update an app whenever any of its supporting files change. Live updating is particularly useful when working with apps containing webhooks, as any changes made to webhook handlers show up almost immediately, without requiring a manual restart of the app.
Deploying a web server
You can also deploy your app and create a persistent webhook in the cloud by running modal deploy:

modal deploy server_script.py
Passing arguments to webhooks
When using @stub.webhook, you can use query parameters just like in FastAPI, and they will be passed to your function as arguments. For instance:
import modal

stub = modal.Stub()

@stub.webhook
def square(x: int):
    return {"square": x**2}
If you hit this endpoint with a URL-encoded query string containing the x parameter, its value will be passed to your function:
% curl 'https://modal-labs--webhook-get-py-square-erikbern-dev.modal.run?x=42'
{"square":1764}
If you want to use a POST request, you can define the request body just like in FastAPI:
from pydantic import BaseModel

import modal

stub = modal.Stub()

class Item(BaseModel):
    x: int = 123

@stub.webhook(method="POST")
def square(item: Item):
    return {"square": item.x**2}
This creates an endpoint that we can hit with a JSON request body:
% curl 'https://modal-labs--webhook-post-py-square-erikbern-dev.modal.run' -X POST -H 'Content-Type: application/json' -d '{"x": 42}'
{"square":1764}
FastAPI lets you pass data to webhooks in other ways too, for instance as form data, as JSON without a Pydantic model, or as file uploads.
More configuration
In addition to the keyword arguments supported by a regular stub.function, webhooks take an optional method argument to set the HTTP method of the REST endpoint (see the reference for a full list of supported arguments).
How do webhooks run in the cloud?
Note that webhooks, like everything else on Modal, only run when they need to. When you hit the webhook the first time, it will boot up the container, which might take a few seconds. Modal keeps the container alive for a short period (a minute, at most) in case there are subsequent requests. If there are a lot of requests, Modal might create more containers running in parallel.
Under the hood, Modal wraps your function in a FastAPI application, and so functions you write need to follow the same request and response semantics as FastAPI. This also means you can use all of FastAPI’s powerful features, such as Pydantic models for automatic validation, typed query and path parameters, and response types.
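To illustrate what that validation buys you, here is a small Pydantic sketch you can run locally (the Item model mirrors the earlier example; no Modal code is involved):

```python
from pydantic import BaseModel, ValidationError

class Item(BaseModel):
    x: int = 123

# Values are coerced to the declared type before your function runs.
assert Item(x="42").x == 42

# Omitted fields fall back to their declared defaults.
assert Item().x == 123

# Non-coercible values raise ValidationError, which the FastAPI layer
# turns into a 422 response before your function is ever called.
try:
    Item(x="not a number")
except ValidationError:
    print("validation error")
```

This is why the earlier POST example can assume item.x is already an int inside the function body.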
For long-running webhooks (taking more than 150 seconds to complete), Modal by default uses chains of HTTP redirects to keep each request reasonably short-lived. For more information, see Webhook timeouts.
More complex example
Here’s everything together, combining Modal’s abilities to run functions in user-defined containers with the expressivity of FastAPI:
from pydantic import BaseModel
from fastapi.responses import HTMLResponse

import modal

stub = modal.Stub()

class Item(BaseModel):
    name: str
    qty: int = 42

image = modal.Image.debian_slim().pip_install("boto3")

@stub.webhook(method="POST", image=image)
def f(item: Item):
    import boto3  # importable because boto3 is installed in the image

    # do things with boto3
    return HTMLResponse(f"<html>Hello, {item.name}!</html>")
Serving ASGI and WSGI apps
You can also serve any app written in an ASGI- or WSGI-compatible web application framework on Modal. For ASGI apps, you can create a function decorated with @stub.asgi that returns a reference to your web app:
from fastapi import FastAPI, Request
from fastapi.responses import HTMLResponse

import modal

web_app = FastAPI()
stub = modal.Stub()
image = modal.Image.debian_slim().pip_install("boto3")

@web_app.post("/foo")
async def foo(request: Request):
    body = await request.json()
    return body

@web_app.get("/bar")
async def bar(arg="world"):
    return HTMLResponse(f"<h1>Hello Fast {arg}!</h1>")

@stub.asgi(image=image)
def fastapi_app():
    return web_app
Now, as before, when you deploy this script as a Modal app, you get a URL for your app that you can use.
WSGI
You can serve WSGI apps using the @stub.wsgi decorator:
import modal

stub = modal.Stub()
image = modal.Image.debian_slim().pip_install("flask")

@stub.wsgi(image=image)
def flask_app():
    from flask import Flask, request

    web_app = Flask(__name__)

    @web_app.get("/")
    def home():
        return "Hello Flask World!"

    @web_app.post("/foo")
    def foo():
        return request.json

    return web_app
See Flask’s docs for more information.
Latency and keep_warm
Modal will spin up as many containers as needed to handle the current number of concurrent requests. Starting up containers incurs a cold-start time of 1-2s (depending on the size of the image). After the cold-start, subsequent requests to the same container will see lower response latency (~50-200ms). The container is shut down after a period of inactivity, currently fixed at 1 minute.
If you're building a latency-sensitive application and wish to avoid cold-start times, you can set the keep_warm flag on the @stub.webhook, @stub.asgi, or @stub.wsgi decorators. This provisions a warm pool of containers that don't expire due to inactivity, and thus avoids incurring the cold-start penalty. At present, the warm pool has a fixed size of 2 (but we'll spawn more containers when you need them, as usual).
import modal

stub = modal.Stub()

@stub.webhook(keep_warm=True)
def f():
    return {"hello": "world"}
Functions with slow start-up and keep_warm
The guarantee that keep_warm provides is that there are always at least 2 containers up that have finished starting. If your function does expensive or slow initialization the first time it receives an input (e.g. if you use a pre-trained model that needs to be loaded into memory on first use), you'd observe that those function calls are still slow.
To avoid this, you can use a container enter method to perform the expensive initialization. This will ensure that the initialization is performed before the container is deemed ready for the warm pool.
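The effect of an enter method can be sketched in plain Python, with no Modal imports (the ModelWrapper class and its fake load step are illustrative stand-ins, not Modal's API): the expensive work runs once at container start, so later calls are fast.

```python
class ModelWrapper:
    def __enter__(self):
        # Expensive one-time setup: stands in for loading model weights.
        # Modal runs the enter method once per container, before the
        # container joins the warm pool or serves any requests.
        self.model = lambda x: x * 2
        return self

    def predict(self, x):
        # Per-request work is cheap: the model is already in memory.
        return self.model(x)

wrapper = ModelWrapper().__enter__()  # Modal invokes this at container boot
print(wrapper.predict(21))  # 42
```

Because the warm pool only counts containers whose enter method has completed, requests that reach a warm container skip the initialization cost entirely.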