
If you are looking to build a real-time voice or video application, you can’t just use HTTP. It’s too slow. Traditional HTTP is request-response based, creating overhead for each interaction. Establishing new TCP connections and handshaking also creates additional latency.
Instead, you should be using technologies like WebRTC. WebRTC is purpose-built for peer-to-peer audio/video streaming and data sharing without requiring plugins or additional software.
But WebRTC is complex. It’s not easy to get right. You often have to write thousands of lines of boilerplate to handle signalling, media capture, peer connections, ICE candidates, STUN/TURN servers, and so on.
That’s why LiveKit has become so popular. LiveKit is an open-source platform that abstracts away the complexity of working with WebRTC. Rather than writing all the boilerplate yourself, you just use LiveKit’s SDKs.
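To give a sense of how much the SDK hides, here’s a rough sketch of joining a room with the LiveKit Python SDK (the server URL and access token are placeholders; in practice you generate a token with your API key and secret):

import asyncio
from livekit import rtc

async def main():
    room = rtc.Room()
    # connect() handles the signalling, ICE negotiation, and media transport for you
    await room.connect("wss://your-project.livekit.cloud", "<access-token>")
    print("Connected to room:", room.name)

asyncio.run(main())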
LiveKit Agents
LiveKit recently launched LiveKit Agents, a framework for building real-time voice assistants.
It allows you to define an AI agent that will join as a participant in a LiveKit room.
This guide will walk you through deploying LiveKit agents on Modal using Python. We’ll cover the LiveKit agent setup, the different configuration options within LiveKit, and how to actually deploy on Modal.
LiveKit Agent Lifecycle
Here’s a high-level overview of the agent lifecycle:
- Worker registration: Your agent connects to the LiveKit server via a WebSocket and registers as a "worker" (sketched below).
- Agent dispatch: When a user connects to a room, the LiveKit server selects an available worker, which then instantiates your program and joins the room. A worker can run multiple agent instances in separate processes.
- Your program: Here, you use the LiveKit Python SDK and can leverage plugins for processing voice and video data.
- Room close: The room closes automatically when the last non-agent participant leaves, and any remaining agents are then disconnected.
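To make the worker registration step concrete: when an agent runs as a standalone process, registration typically looks roughly like the sketch below, assuming LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET are set in the environment. Later in this guide we construct the worker directly instead of using the CLI helper.

from livekit.agents import JobContext, WorkerOptions, cli

async def entrypoint(ctx: JobContext):
    # The agent joins the room it was dispatched to
    await ctx.connect()

if __name__ == "__main__":
    # Registers this process as a worker with the LiveKit server over a WebSocket
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))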
Why Deploy LiveKit Agents on Modal?
You can also deploy LiveKit Agents on Render, Kubernetes, and other cloud providers, but we think that Modal is the best option. Modal is a serverless cloud platform and Python library. With Modal, you can write a Python function, add a Modal decorator, and deploy your application in a container in the cloud in seconds.
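As a rough illustration of that workflow (a minimal, hypothetical example unrelated to LiveKit):

import modal

app = modal.App("hello-modal")

@app.function()
def hello(name: str) -> str:
    # This function runs in a container in Modal's cloud
    return f"Hello, {name}!"

@app.local_entrypoint()
def main():
    # .remote() executes the function remotely and returns the result
    print(hello.remote("world"))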
✅ No Infrastructure Management
Modal removes the complexity of managing Kubernetes clusters or provisioning cloud instances. Your LiveKit agents run in a fully managed environment with zero operational overhead.
✅ Automatic Scaling
With Modal, you can scale your LiveKit workloads dynamically based on demand. Modal’s serverless execution model ensures you only pay for what you use.
✅ Optimized GPU Execution
If your agent needs to run deep learning models, Modal supports running your workloads on GPUs like NVIDIA H100s.
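Requesting a GPU is a one-line change to the function decorator. A minimal sketch (the app name and function here are hypothetical):

import modal

app = modal.App("gpu-example")

@app.function(gpu="H100")
def run_model():
    # Deep learning workloads (e.g. a local STT or TTS model) run on an H100 here
    ...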
Prerequisites
To run the following code, you will need:
- A LiveKit account
- A Modal account
- Accounts with the AI API providers you want to use (OpenAI, Cartesia, Deepgram, etc.), along with their API keys
- Run pip install modal to install the Modal Python package
- Run modal setup to authenticate (if this doesn't work, try python -m modal setup)
- Copy the code below into a file called app.py
- Run modal deploy app.py to deploy it (see Step 7)
Setting Up LiveKit Agents on Modal
Step 1: Adding Secrets in Modal Dashboard
Before deploying your LiveKit agent, you need to add your API keys and secrets to the Modal Dashboard to securely store and access them.
Navigate to the Secrets section in the Modal dashboard and create a secret named livekit-voice-agent (the name referenced in the code below) containing the following keys. For example purposes, we're using OpenAI, Cartesia, and Deepgram:
- LIVEKIT_URL - Your LiveKit WebRTC server URL
- LIVEKIT_API_KEY - API key for authenticating LiveKit requests
- LIVEKIT_API_SECRET - API secret for LiveKit authentication

You can find your LiveKit URL and API keys under Settings > Project and Settings > Keys in the LiveKit dashboard.

- OPENAI_API_KEY - API key for OpenAI's GPT-based processing
- CARTESIA_API_KEY - API key for Cartesia's TTS services
- DEEPGRAM_API_KEY - API key for Deepgram's STT services
Once added, you can reference these secrets in your Modal functions.
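For example, a function that attaches the livekit-voice-agent secret (the name we use later in this guide) can read its keys as environment variables. A minimal sketch:

import os
import modal

app = modal.App("secrets-example")

@app.function(secrets=[modal.Secret.from_name("livekit-voice-agent")])
def show_livekit_url():
    # Each key in the secret is injected as an environment variable
    print(os.environ["LIVEKIT_URL"])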
Step 2: Define the Modal Application
We define a Modal App with a lightweight Debian-based container image, then install the necessary Python packages.
We also pre-import the libraries that our Modal functions will use inside that image, using the with image.imports() context manager.
from modal import App, Image, Secret, fastapi_endpoint, FunctionCall, Dict
import asyncio

image = Image.debian_slim().pip_install(
    "livekit>=0.19.1",
    "livekit-agents>=0.12.11",
    "livekit-plugins-openai>=0.10.17",
    "livekit-plugins-silero>=0.7.4",
    "livekit-plugins-cartesia==0.4.7",
    "livekit-plugins-deepgram==0.6.19",
    "python-dotenv~=1.0",
    "cartesia==2.0.0a0",
    "fastapi[standard]",
    "aiohttp",
)

app = App("livekit-example", image=image)

# Create a persisted Dict - the data is retained between app runs
room_dict = Dict.from_name("room-dict", create_if_missing=True)

with image.imports():
    from livekit import rtc
    from livekit.agents import AutoSubscribe, JobContext, llm
    from livekit.agents.worker import Worker, WorkerOptions
    from livekit.agents.pipeline import VoicePipelineAgent
    from livekit.plugins import openai, deepgram, silero, cartesia
Step 3: LiveKit Agent Entrypoint
Define the entrypoint function that connects the agent to a LiveKit room:
async def livekit_entrypoint(ctx: JobContext):
    print("Connecting to room", ctx.room.name)
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    participant = await ctx.wait_for_participant()
    run_multimodal_agent(ctx, participant)
This function:
- Connects to a LiveKit room
- Subscribes to audio-only streams
- Waits for a participant to join
- Starts a multimodal AI-powered agent
Step 4: Running the Multimodal AI Agent
Next, we define a multimodal agent that uses Deepgram for speech-to-text (STT), OpenAI's GPT-4o-mini as the large language model (LLM), and Cartesia for text-to-speech (TTS) to process voice interactions. You can also use other LLM, STT, and TTS providers - LiveKit supports a wide range of plugins (see the sketch after the code below).
def run_multimodal_agent(ctx: JobContext, participant: rtc.RemoteParticipant):
    print("Starting multimodal agent")

    initial_ctx = llm.ChatContext().append(
        role="system",
        text="You are a voice assistant created by Modal. You answer questions and help with tasks.",
    )

    agent = VoicePipelineAgent(
        vad=silero.VAD.load(),
        stt=deepgram.STT(model="nova-2-general"),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=cartesia.TTS(),
        chat_ctx=initial_ctx,
    )

    agent.start(ctx.room, participant)
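Swapping providers is just a matter of changing the plugin passed to the corresponding argument. A sketch, assuming the relevant plugin package and API key are set up (here using OpenAI's TTS in place of Cartesia):

agent = VoicePipelineAgent(
    vad=silero.VAD.load(),
    stt=deepgram.STT(model="nova-2-general"),
    llm=openai.LLM(model="gpt-4o-mini"),
    tts=openai.TTS(),  # OpenAI TTS instead of cartesia.TTS()
    chat_ctx=initial_ctx,
)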
Step 5: Handle LiveKit Webhook Events on Room Creation and Deletion
LiveKit can be configured to send webhooks upon different events, like when a room is started or finished.
To handle these events, we use Modal's @fastapi_endpoint decorator to create a FastAPI endpoint that listens for them. Upon room creation, we spawn a container in Modal to run the LiveKit worker. Upon room completion, the function call is cancelled and the container is spun down. This means you are only charged while a room is actually open and running.
@app.function(image=image)
@fastapi_endpoint(method="POST")
async def run_livekit_agent(request: dict):
    from fastapi import Response

    room_name = request["room"]["sid"]

    # Check whether this room already has a worker running
    if room_name in room_dict and request["event"] == "room_started":
        print(
            f"Received webhook event for room {room_name} that already has a worker running"
        )
        return Response(status_code=200)

    if request["event"] == "room_started":
        call = run_agent_worker.spawn(room_name)
        room_dict[room_name] = call.object_id
        print(f"Worker for room {room_name} spawned")
    elif request["event"] == "room_finished":
        if room_name in room_dict:
            function_call = FunctionCall.from_id(room_dict[room_name])
            # Spin down the Modal function running the worker
            function_call.cancel()
            # Delete the room from the room_dict
            del room_dict[room_name]
            print(f"Worker for room {room_name} spun down")

    return Response(status_code=200)
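For reference, the handler above only relies on a couple of fields from LiveKit's webhook payload. An illustrative (made-up) example of what it reads looks roughly like this; see LiveKit's webhook documentation for the full schema:

example_event = {
    "event": "room_started",  # or "room_finished"
    "room": {
        "sid": "RM_hypothetical123",  # used as the key into room_dict above
        "name": "my-room",
    },
}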
Step 6: Running the LiveKit Worker
Next, we define a Modal function that runs the LiveKit worker. We specify that we want to run this function (i.e. the LiveKit worker) with a GPU. We also want to handle the case where the worker is cancelled, whereupon it will receive a cancellation signal and clean up.
@app.function(
    gpu="A100", timeout=3000, secrets=[Secret.from_name("livekit-voice-agent")]
)
async def run_agent_worker(room_name: str):
    import os

    print("Running worker")
    worker = Worker(
        WorkerOptions(
            entrypoint_fnc=livekit_entrypoint,
            ws_url=os.environ.get("LIVEKIT_URL"),
        )
    )

    try:
        await worker.run()  # Wait for the worker to finish
    except asyncio.CancelledError:
        print(f"Worker for room {room_name} was cancelled. Cleaning up...")
        raise  # Re-raise to propagate the cancellation
    finally:
        # Perform cleanup before termination
        await worker.drain()
        await worker.aclose()
        print(f"Worker for room {room_name} shutdown complete.")
Step 7: Deploy the Modal App
With all this code in an app.py file, we can deploy both the Modal function and the FastAPI endpoint by running modal deploy app.py.
In stdout, you’ll see the URL of the FastAPI endpoint, which you need to copy and add to the LiveKit dashboard as the webhook URL.
Step 8: Spinning up a LiveKit frontend
LiveKit provides a frontend Sandbox that you can use to test your agent.
Go to the LiveKit dashboard > Sandbox > Voice assistant. You should be able to instantiate a voice assistant frontend sandbox. Since you have deployed your agent with the appropriate LIVEKIT_URL, the frontend sandbox will automatically connect to your agent.
Conclusion
LiveKit Agents allows developers to build real-time voice assistants with minimal effort.
And the best way to deploy is with Modal!