Introducing: WebSockets on Modal

Here at Modal, we’re constantly cranking on complex infrastructure projects. We want to start highlighting some of the heftier features we’ve released recently. On the docket today: Modal now supports WebSocket connections.

WebSocket is a communication protocol for real-time, bidirectional transfer of data between a client and server. Unlike HTTP’s request/response model, in which every exchange is initiated by the client, a WebSocket connection stays open so that either side can send a message at any time. This is advantageous for applications that require low latency and real-time updates.
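For the curious: a WebSocket connection starts life as an ordinary HTTP request that the server agrees to upgrade. As part of that handshake, the server proves it understood the request by hashing the client’s Sec-WebSocket-Key header together with a fixed GUID from RFC 6455. The sketch below shows just that one step; the function name accept_key is our own, and you never need to write this yourself because libraries like FastAPI handle it for you.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept header value for a client's key."""
    # SHA-1 over the client's key concatenated with the GUID, then base64.
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```

Once the server replies with this value and a 101 Switching Protocols status, the same TCP connection stays open and both sides exchange WebSocket frames instead of HTTP messages.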

How to set up a WebSocket server

There’s nothing special you have to do to accept WebSocket connections in a Modal function; use your library of choice as you normally would. Here’s a minimal example using FastAPI:

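import modal

# Container image with FastAPI and its WebSocket dependency installed.
image = modal.Image.debian_slim().pip_install("fastapi", "websockets")
app = modal.App("my-app", image=image)


@app.function()
@modal.asgi_app()
def endpoint():
    from fastapi import FastAPI, WebSocket, WebSocketDisconnect

    # Named web_app so it doesn't shadow the module-level Modal app.
    web_app = FastAPI()

    @web_app.websocket("/ws")
    async def websocket_handler(websocket: WebSocket) -> None:
        await websocket.accept()
        try:
            # Echo every message back until the client disconnects.
            while True:
                data = await websocket.receive_text()
                await websocket.send_text(f"Message text was: {data}")
        except WebSocketDisconnect:
            pass  # The client closed the connection; end the handler quietly.

    return web_app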

Save the code above to a file called main.py, and deploy it with modal deploy main.py.
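To check that the deployed handler works, you can talk to it from a small client. Here is a minimal sketch using the third-party websockets package (installed locally with pip). The helper names to_ws_url and echo_once are our own, and the URL you pass in is whatever modal deploy prints for your endpoint; the one in the note below is a placeholder, not a real address.

```python
from urllib.parse import urlsplit, urlunsplit


def to_ws_url(http_url: str, path: str = "/ws") -> str:
    """Turn the HTTP(S) URL of a deployed endpoint into its WebSocket URL."""
    parts = urlsplit(http_url)
    # https maps to wss (TLS); plain http maps to ws.
    scheme = "wss" if parts.scheme == "https" else "ws"
    return urlunsplit((scheme, parts.netloc, path, "", ""))


async def echo_once(http_url: str, message: str) -> str:
    """Send one message to the /ws handler and return the server's reply."""
    import websockets  # third-party client library: pip install websockets

    async with websockets.connect(to_ws_url(http_url)) as ws:
        await ws.send(message)
        return await ws.recv()
```

With the server above deployed, running asyncio.run(echo_once("https://your-workspace--my-app-endpoint.modal.run", "hello")) should come back with "Message text was: hello".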

Modal treats each WebSocket connection as a single input, so unless your handler is CPU- or GPU-bound, you will want to configure your function to accept concurrent inputs. Otherwise, Modal will spin up a new container for each WebSocket connection. Please see our WebSocket documentation for more info.
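To see why this matters, here is a back-of-the-envelope model of how many containers are needed to hold a given number of open connections. It is a simplification for illustration, not Modal’s actual autoscaling logic, and the parameter name max_concurrent_inputs is hypothetical rather than a Modal setting.

```python
import math


def min_containers(connections: int, max_concurrent_inputs: int = 1) -> int:
    """Lower bound on containers needed to hold `connections` open WebSockets.

    Each connection counts as one input, and each container can serve at most
    `max_concurrent_inputs` inputs at a time (1 means no concurrency).
    """
    if connections < 0 or max_concurrent_inputs < 1:
        raise ValueError("connections must be >= 0 and max_concurrent_inputs >= 1")
    return math.ceil(connections / max_concurrent_inputs)
```

In this model, 200 connected clients need 200 containers without concurrency, but only 4 if each container can serve 50 connections at once.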

One of Modal’s primary benefits is automatic scaling based on the volume of inputs your functions are receiving. This applies to WebSocket handlers as well, which makes it easy to build applications that can handle variable request volumes.

For example, let’s say you want to launch a real-time speech-to-text app that will be able to handle many users at once. You can deploy both your WebSocket server and transcription model as Modal Functions. Modal will auto-scale containers for both Functions, without you having to write any of the scaling logic.

Diagram of clients connecting to a Modal WebSocket server

Use cases

At Modal, we’ve heard users ask for WebSocket support to facilitate a few different use cases.

1. Real-time streaming responses

This use case is most common for those building features around audio streaming. A speech-to-text application running on Whisper, for example, may need to stream live, continuous transcription back to end users as audio input comes in. We’ve also had users ask for this in the context of text-to-speech and text-to-image features that require real-time responses to continuous inputs.
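The core of a streaming handler like this is a loop that buffers incoming audio and emits a partial transcript whenever enough has arrived. Below is a minimal, framework-agnostic sketch of that loop. Everything in it is an assumption for illustration: stream_transcripts, the transcribe callback, and the fixed-size window are our own stand-ins, and a real speech model would likely want smarter chunking than a byte count.

```python
from typing import AsyncIterator, Awaitable, Callable


async def stream_transcripts(
    chunks: AsyncIterator[bytes],
    transcribe: Callable[[bytes], Awaitable[str]],
    window_bytes: int,
) -> AsyncIterator[str]:
    """Yield one transcript per full window of audio as chunks arrive."""
    buffer = bytearray()
    async for chunk in chunks:
        buffer.extend(chunk)
        # Emit a transcript for every complete window currently buffered.
        while len(buffer) >= window_bytes:
            window = bytes(buffer[:window_bytes])
            del buffer[:window_bytes]
            yield await transcribe(window)
    # The client stopped sending; flush whatever audio is left over.
    if buffer:
        yield await transcribe(bytes(buffer))
```

In a FastAPI handler, chunks could be websocket.iter_bytes(), transcribe could wrap a call into your transcription Function, and each yielded string would go back to the client with websocket.send_text.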

2. Status updates on long-running tasks

This use case is especially relevant for workloads that have a long processing time, such as prompting an LLM to pull insights from a very large body of text. You may want to send progress indicators to the end user for these long-running tasks. WebSockets come into play here because the server can push intermediate updates to the client over the persistent connection, without the client having to poll.
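One way to structure this is to process the work in pieces and send a small JSON message after each one. The sketch below assumes a hypothetical process_page coroutine for the slow step and takes the send function as a parameter, so it is not tied to any framework; the message shape (type, done, total) is our own invention.

```python
import json
from typing import Awaitable, Callable, Iterable


async def run_with_progress(
    pages: Iterable[str],
    process_page: Callable[[str], Awaitable[str]],
    send_text: Callable[[str], Awaitable[None]],
) -> list[str]:
    """Process pages one by one, sending a progress update after each."""
    pages = list(pages)
    results = []
    for done, page in enumerate(pages, start=1):
        results.append(await process_page(page))
        # Intermediate update: the client can render a progress bar from this.
        await send_text(json.dumps({"type": "progress", "done": done, "total": len(pages)}))
    # Final message carries the actual results.
    await send_text(json.dumps({"type": "result", "items": results}))
    return results
```

Inside a FastAPI WebSocket handler, you would pass websocket.send_text as the send_text argument.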

3. Hosting open-source frameworks out-of-the-box

Several popular frameworks like ComfyUI, Streamlit, and Gradio require WebSocket connections in order to be deployed. Many of our users are using ComfyUI’s GUI to build out Stable Diffusion pipelines; others are running Streamlit and Gradio to prototype new ML features or build mini-apps. These frameworks rely on WebSockets to power interactive visualizations in the client and surface real-time updates when underlying data changes.

Check out our examples of how to run ComfyUI and Streamlit on Modal.

Got questions? Please reach out and join our community Slack. For those curious, we’ll be publishing a technical deep dive on Modal’s web endpoint system later this week.