Run a FastRTC app on Modal

FastRTC is a Python library for real-time communication on the web. This example demonstrates how to run a simple FastRTC app in the cloud on Modal.

It’s intended to help you get up and running with real-time streaming applications on Modal as quickly as possible. If you’re interested in running a production-grade WebRTC app on Modal, see this example.

In this example, we stream webcam video from a browser to a container on Modal, where the video is flipped, annotated, and sent back with under 100ms of delay. You can try it out here or just dive straight into the code to run it yourself.

Set up FastRTC on Modal

First, we import the modal SDK and use it to define a container image with FastRTC and related dependencies.

import modal

web_image = modal.Image.debian_slim(python_version="3.12").pip_install(
    "fastapi[standard]==0.115.4",
    "fastrtc==0.0.23",
    "gradio==5.7.1",
    "opencv-python-headless==4.11.0.86",
)

Then, we set that as the default Image on our Modal App.

app = modal.App("fastrtc-flip-webcam", image=web_image)

Configure WebRTC streaming on Modal

Under the hood, FastRTC uses the WebRTC APIs and protocols.

WebRTC provides low latency (“real-time”) peer-to-peer communication for Web applications, focusing on audio and video. Considering that the Web is a platform originally designed for high-latency, client-server communication of text and images, that’s no mean feat!

In addition to protocols that implement this communication, WebRTC includes APIs for describing and manipulating audio/video streams. In this demo, we set a few simple parameters, like the direction of the webcam and the minimum frame rate. See the MDN Web Docs for MediaTrackConstraints for more.

TRACK_CONSTRAINTS = {
    "width": {"exact": 640},
    "height": {"exact": 480},
    "frameRate": {"min": 30},
    "facingMode": {  # https://developer.mozilla.org/en-US/docs/Web/API/MediaTrackSettings/facingMode
        "ideal": "user"
    },
}
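The `exact` and `min` entries above are hard requirements: a browser whose camera cannot satisfy them will typically refuse to start the stream rather than fall back to something close. As a rough sketch, assuming you would rather accept whatever the camera can offer, the same settings can be written as `ideal` preferences. The name `RELAXED_TRACK_CONSTRAINTS` is hypothetical and is not part of the original example.

```python
# Hypothetical alternative to TRACK_CONSTRAINTS: every entry is a preference
# ("ideal") rather than a hard requirement ("exact" / "min").
RELAXED_TRACK_CONSTRAINTS = {
    "width": {"ideal": 640},
    "height": {"ideal": 480},
    "frameRate": {"ideal": 30},
    "facingMode": {"ideal": "user"},
}

# Sanity check: no hard requirements remain.
hard_keys = {"exact", "min", "max"}
has_hard_requirement = any(
    hard_keys & set(constraint) for constraint in RELAXED_TRACK_CONSTRAINTS.values()
)
print(has_hard_requirement)
```

The trade-off is that frames may arrive at a different resolution or frame rate than requested, so downstream code should not assume 640x480.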

In theory, the Internet is designed for peer-to-peer communication all the way down to its heart, the Internet Protocol (IP): just send packets between IP addresses. In practice, peer-to-peer communication on the contemporary Internet is fraught with difficulties, from restrictive firewalls to finicky work-arounds for the exhaustion of IPv4 addresses, like Carrier-Grade Network Address Translation (CGNAT).

So establishing peer-to-peer connections can be quite involved. The protocol for doing so is called Interactive Connectivity Establishment (ICE). It is described in this RFC.

ICE involves the peers exchanging a list of connections that might be used. We use a fairly simple setup here, where our peer on Modal uses the Session Traversal Utilities for NAT (STUN) server provided by Google. A STUN server basically just reflects back to a client what their IP address and port number appear to be when they talk to it. The peer on Modal communicates that information to the other peer trying to connect to it — in this case, a browser trying to share a webcam feed. Note the use of stun and port 19302 in the URL in place of something more familiar, like http and port 80.

RTC_CONFIG = {"iceServers": [{"urls": "stun:stun.l.google.com:19302"}]}
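To make the "reflects back your address" idea more concrete, here is a minimal, offline sketch of the two halves of a STUN exchange: building a Binding Request and decoding the XOR-MAPPED-ADDRESS attribute of a response. It is not what FastRTC does internally; the function names are hypothetical, and the message layout follows the STUN specification (RFC 5389) as we understand it, for IPv4 only.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442  # fixed value in every STUN header
BINDING_REQUEST = 0x0001  # message type: "what is my address?"
XOR_MAPPED_ADDRESS = 0x0020  # attribute type carrying the reflected address


def build_binding_request(transaction_id=None):
    """Build a 20-byte STUN Binding Request with no attributes."""
    transaction_id = transaction_id or os.urandom(12)
    # header: type (2 bytes), body length (2 bytes), magic cookie (4 bytes)
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + transaction_id


def parse_xor_mapped_address(message):
    """Return (ip, port) from a STUN response, or None if no IPv4 address is found."""
    offset = 20  # skip the fixed-size header
    while offset + 4 <= len(message):
        attr_type, attr_len = struct.unpack_from("!HH", message, offset)
        value = message[offset + 4 : offset + 4 + attr_len]
        is_ipv4 = len(value) >= 8 and value[1] == 0x01
        if attr_type == XOR_MAPPED_ADDRESS and is_ipv4:
            xport, xaddr = struct.unpack_from("!HI", value, 2)
            # port and address are XOR-ed with the magic cookie on the wire
            port = xport ^ (MAGIC_COOKIE >> 16)
            ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return ip, port
        offset += 4 + attr_len + (-attr_len % 4)  # attributes are 4-byte aligned
    return None


# Hand-built response for a made-up public address (documentation IP range).
fake_ip = struct.unpack("!I", socket.inet_aton("203.0.113.7"))[0]
attribute = struct.pack(
    "!HHBBHI", XOR_MAPPED_ADDRESS, 8, 0, 0x01, 54321 ^ 0x2112, fake_ip ^ MAGIC_COOKIE
)
fake_response = (
    struct.pack("!HHI", 0x0101, len(attribute), MAGIC_COOKIE) + bytes(12) + attribute
)

request = build_binding_request()
reflected = parse_xor_mapped_address(fake_response)
print(len(request), reflected)
```

In a real exchange, the request would be sent over UDP to the server named in `RTC_CONFIG` and the reply passed to the parser; the sketch uses a hand-built response so it runs without network access.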

Running a FastRTC app on Modal

FastRTC builds on top of the Gradio library for defining Web UIs in Python. Gradio in turn is compatible with the Asynchronous Server Gateway Interface (ASGI) protocol for asynchronous Python web servers, like FastAPI, so we can host it on Modal’s cloud platform by applying the modal.asgi_app decorator to a Modal Function.

But before we do that, we need to consider limits: on how many peers can connect to one instance on Modal and on how long they can stay connected. We picked some sensible defaults to show how they interact with the deployment parameters of the Modal Function. You’ll want to tune these for your application!

MAX_CONCURRENT_STREAMS = 10  # number of peers per instance on Modal

MINUTES = 60  # seconds
TIME_LIMIT = 10 * MINUTES  # time limit


@app.function(
    # gradio requires sticky sessions
    # so we limit the number of concurrent containers to 1
    # and allow that container to handle concurrent streams
    max_containers=1,
    scaledown_window=TIME_LIMIT + 1 * MINUTES,  # add a small buffer to time limit
)
@modal.concurrent(max_inputs=MAX_CONCURRENT_STREAMS)  # inputs per container
@modal.asgi_app()  # ASGI on Modal
def ui():
    import fastrtc  # WebRTC in Gradio
    import gradio as gr  # WebUIs in Python
    from fastapi import FastAPI  # asynchronous ASGI server framework
    from gradio.routes import mount_gradio_app  # connects Gradio and FastAPI

    with gr.Blocks() as blocks:  # block-wise UI definition
        gr.HTML(  # simple HTML header
            "<h1 style='text-align: center'>"
            "Streaming Video Processing with Modal and FastRTC"
            "</h1>"
        )

        with gr.Column():  # a column of UI elements
            fastrtc.Stream(  # high-level media streaming UI element
                modality="video",
                mode="send-receive",
                handler=flip_vertically,  # handler -- handle incoming frame, produce outgoing frame
                ui_args={"title": "Click 'Record' to flip your webcam in the cloud"},
                rtc_configuration=RTC_CONFIG,
                track_constraints=TRACK_CONSTRAINTS,
                concurrency_limit=MAX_CONCURRENT_STREAMS,  # limit simultaneous connections
                time_limit=TIME_LIMIT,  # limit time per connection
            )

    return mount_gradio_app(app=FastAPI(), blocks=blocks, path="/")
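The way these limits combine can be spelled out with a little arithmetic. The sketch below is only an illustration of the numbers used above; `deployment_limits` is a hypothetical helper, not part of Modal or FastRTC.

```python
MINUTES = 60  # seconds


def deployment_limits(
    max_containers, max_concurrent_streams, time_limit_s, buffer_s=1 * MINUTES
):
    """Hypothetical helper: summarize how the limits in this example interact."""
    return {
        # each container accepts this many streams, so total peers scale linearly
        "max_peers": max_containers * max_concurrent_streams,
        # keep an idle container around slightly longer than the longest connection
        "scaledown_window_s": time_limit_s + buffer_s,
    }


limits = deployment_limits(
    max_containers=1, max_concurrent_streams=10, time_limit_s=10 * MINUTES
)
print(limits)  # {'max_peers': 10, 'scaledown_window_s': 660}
```

With one container and ten streams per container, at most ten peers can be connected at once, and the container stays warm for eleven minutes after its last input.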

To try this out for yourself, run

modal serve 07_web_endpoints/fastrtc_flip_webcam.py

and head to the modal.run URL that appears in your terminal. You can also check on the application’s dashboard via the modal.com URL that appears below it.

The modal serve command produces a hot-reloading development server — try editing the title in the ui_args above and watch the server redeploy.

This temporary deployment is tied to your terminal session. To deploy permanently, run

modal deploy 07_web_endpoints/fastrtc_flip_webcam.py

Note that Modal is a serverless platform with usage-based pricing, so this application will spin down and cost you nothing when it is not in use.

Addenda

This FastRTC app is very much the “hello world” or “echo server” of FastRTC: it just flips the incoming webcam stream and adds a “hello” message. That logic appears below.

def flip_vertically(image):
    import cv2
    import numpy as np

    # check for a missing frame before touching it
    if image is None:
        print("failed to decode image")
        return

    image = image.astype(np.uint8)

    # flip vertically and caption to show video was processed on Modal
    image = cv2.flip(image, 0)
    lines = ["Hello from Modal!"]
    caption_image(image, lines)

    return image


def caption_image(
    img, lines, font_scale=0.8, thickness=2, margin=10, font=None, color=None
):
    import cv2

    if font is None:
        font = cv2.FONT_HERSHEY_SIMPLEX
    if color is None:
        color = (127, 238, 100, 128)  # Modal Green

    # get text sizes
    sizes = [cv2.getTextSize(line, font, font_scale, thickness)[0] for line in lines]
    if not sizes:
        return

    # position text in bottom right
    pos_xs = [img.shape[1] - size[0] - margin for size in sizes]

    pos_ys = [img.shape[0] - margin]
    for _width, height in reversed(sizes[:-1]):
        next_pos = pos_ys[-1] - 2 * height
        pos_ys.append(next_pos)

    for line, pos in zip(lines, zip(pos_xs, reversed(pos_ys))):
        cv2.putText(img, line, pos, font, font_scale, color, thickness)
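The positioning arithmetic in caption_image is easier to follow with concrete numbers. The sketch below restates it in pure Python, with made-up text sizes standing in for the results of cv2.getTextSize; `caption_positions` is a hypothetical name used only for this illustration.

```python
def caption_positions(frame_shape, sizes, margin=10):
    """Return one (x, y) text origin per line, stacked in the bottom-right corner.

    frame_shape is (height, width, ...) as in a NumPy image;
    sizes is a list of (text_width, text_height) pairs, one per line.
    """
    frame_height, frame_width = frame_shape[:2]
    if not sizes:
        return []

    # right-align each line against the margin
    xs = [frame_width - width - margin for width, _height in sizes]

    # the last line sits on the bottom margin; earlier lines stack upward
    ys = [frame_height - margin]
    for _width, height in reversed(sizes[:-1]):
        ys.append(ys[-1] - 2 * height)

    return list(zip(xs, reversed(ys)))


# Two lines of text on a 640x480 frame, with invented text sizes.
sizes = [(100, 20), (200, 20)]
positions = caption_positions((480, 640, 3), sizes)
print(positions)  # [(530, 430), (430, 470)]
```

The first line lands higher on the frame (smaller y) and the last line sits just above the bottom margin, which matches the order in which caption_image draws them.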