February 24, 2025
5 minute read
How to deploy LiveKit Agents on Modal
Yiren Lu
Solutions Engineer

If you are looking to build a real-time voice or video application, you can’t just use HTTP. It’s too slow. Traditional HTTP is request-response based, creating overhead for each interaction. Establishing new TCP connections and handshaking also creates additional latency.

Instead, you should be using technologies like WebRTC. WebRTC is purpose-built for peer-to-peer audio/video streaming and data sharing without requiring plugins or additional software.

But WebRTC is complex, and it’s not easy to get right. You often have to write thousands of lines of boilerplate code to handle signaling, media capture, peer connections, ICE candidates, STUN/TURN servers, and more.

That’s why LiveKit has become so popular. LiveKit is an open-source library that abstracts away the complexity of working with WebRTC. Rather than having to deal with all the boilerplate yourself, you just use LiveKit’s SDK.

LiveKit Agents

LiveKit recently launched LiveKit Agents, a framework for building real-time voice assistants.

It allows you to define an AI agent that will join as a participant in a LiveKit room.

This guide will walk you through deploying LiveKit agents on Modal using Python. We’ll cover the LiveKit agent setup, the different configuration options within LiveKit, and how to actually deploy on Modal.

LiveKit Agent Lifecycle

Here’s a high-level overview of the agent lifecycle:

  • Worker registration: Your agent connects to the LiveKit server, registering as a “worker” via a WebSocket.

  • Agent dispatch: When a user connects to a room, the LiveKit server selects an available worker, which then instantiates your program and joins the room. A worker can run multiple agent instances in separate processes.

  • Your program: Here, you utilize the LiveKit Python SDK and can leverage plugins for processing voice and video data.

  • Room close: The room closes automatically when the last non-agent participant leaves; any remaining agents are then disconnected.

Why Deploy LiveKit Agents on Modal?

You can deploy LiveKit Agents on Render, Kubernetes, or other cloud platforms, but we think Modal is the best option. Modal is a serverless cloud platform and Python library: you write a Python function, add a Modal decorator, and deploy your application in a container in the cloud in seconds.

No Infrastructure Management

Modal removes the complexity of managing Kubernetes clusters or provisioning cloud instances. Your LiveKit agents run in a fully managed environment with zero operational overhead.

Automatic Scaling

With Modal, you can scale your LiveKit workloads dynamically based on demand. Modal’s serverless execution model ensures you only pay for what you use.
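
For example, you can set autoscaling bounds directly on the function decorator. A minimal sketch, assuming the min_containers/max_containers parameter names from recent Modal releases (older releases call these keep_warm and concurrency_limit):

@app.function(
    min_containers=1,   # keep one warm worker to reduce cold starts
    max_containers=10,  # cap concurrent workers to bound spend
)
async def run_agent_worker():
    ...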

Optimized GPU Execution

If your agent needs to run deep learning models, Modal supports running your workloads on GPUs like NVIDIA H100s.
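
Attaching a GPU is a single argument on the function decorator; for instance (the function name here is just illustrative):

@app.function(gpu="H100")
def run_model_inference():
    ...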

Prerequisites

To run the following code, you will need:

  1. A LiveKit account
  2. A Modal account
  3. Accounts with the AI API providers you want to use (OpenAI, Cartesia, Deepgram, etc.), along with their API keys
  4. Run pip install modal to install the modal Python package
  5. Run modal setup to authenticate (if this doesn’t work, try python -m modal setup)
  6. Copy the code below into a file called app.py
  7. Run modal run app.py

Setting Up LiveKit Agents on Modal

Step 1: Adding Secrets in Modal Dashboard

Before deploying your LiveKit agent, you need to add your API keys and secrets to the Modal Dashboard to securely store and access them.

Navigate to the Secrets section in the Modal dashboard and add the following secrets (for example purposes, we’re using OpenAI, Cartesia, and Deepgram):

Modal Secrets

  • LIVEKIT_URL - Your LiveKit WebRTC server URL
  • LIVEKIT_API_KEY - API key for authenticating LiveKit requests
  • LIVEKIT_API_SECRET - API secret for LiveKit authentication

You can find your LiveKit URL and API keys under Settings > Project and Settings > Keys in the LiveKit dashboard.

  • OPENAI_API_KEY - API key for OpenAI’s GPT-based processing
  • CARTESIA_API_KEY - API key for Cartesia’s TTS services
  • DEEPGRAM_API_KEY - API key for Deepgram’s STT services

Once added, you can reference these secrets in your Modal functions.
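
For example, attaching a secret to a function exposes each of its keys as an environment variable inside the container. A minimal sketch, assuming you grouped the keys above into a single secret named livekit-voice-agent (the name our deployment code uses later):

from modal import App, Secret

app = App("livekit-example")

@app.function(secrets=[Secret.from_name("livekit-voice-agent")])
def check_secrets():
    import os
    # Each key stored in the secret is available as an environment variable.
    print(os.environ["LIVEKIT_URL"])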

Step 2: Define the Modal Application

We define a Modal App with a lightweight Debian-based container image, then install the necessary Python packages.

We also pre-import the libraries our Modal functions will use with the image.imports() context manager, which defers these imports so they only run inside the container where the packages are installed.

from modal import App, Image, Secret

image = Image.debian_slim().pip_install(
    "livekit>=0.19.1",
    "livekit-agents>=0.12.11",
    "livekit-plugins-openai>=0.10.17",
    "livekit-plugins-silero>=0.7.4",
    "livekit-plugins-cartesia==0.4.7",
    "livekit-plugins-deepgram==0.6.19",
    "python-dotenv~=1.0",
    "cartesia==2.0.0a0",
)

app = App("livekit-example", image=image)

with image.imports():
    from livekit import rtc
    from livekit.agents import AutoSubscribe, JobContext
    from livekit.agents.worker import Worker, WorkerOptions

    from livekit.agents import llm
    from livekit.agents.pipeline import VoicePipelineAgent
    from livekit.plugins import openai, deepgram, silero, cartesia

Step 3: LiveKit Agent Entrypoint

Define the entrypoint function that connects the agent to a LiveKit room:

async def livekit_entrypoint(ctx: JobContext):
    print("Connecting to room", ctx.room.name)
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
    participant = await ctx.wait_for_participant()
    run_multimodal_agent(ctx, participant)
    print("Agent started")

This function:

  • Connects to a LiveKit room
  • Subscribes to audio-only streams
  • Waits for a participant to join
  • Starts a multimodal AI-powered agent

Step 4: Running the Multimodal AI Agent

Next, we define a multimodal agent that uses Deepgram for speech-to-text (STT), OpenAI’s GPT-4o-mini as the large language model (LLM), and Cartesia for text-to-speech (TTS). You can also use other LLMs and TTS services; LiveKit supports a wide range of plugins (see the one-line swap after the code below).

def run_multimodal_agent(ctx: JobContext, participant: rtc.RemoteParticipant):
    print("Starting multimodal agent")

    initial_ctx = llm.ChatContext().append(
        role="system",
        text="You are a voice assistant created by Modal. You answer questions and help with tasks."
    )

    agent = VoicePipelineAgent(
        vad=silero.VAD.load(),
        stt=deepgram.STT(model="nova-2-general"),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=cartesia.TTS(),
        chat_ctx=initial_ctx,
    )
    agent.start(ctx.room, participant)
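
For example, swapping Cartesia out for another provider is a one-line change. A sketch, assuming the openai.TTS plugin from livekit-plugins-openai (already installed in our image):

agent = VoicePipelineAgent(
    vad=silero.VAD.load(),
    stt=deepgram.STT(model="nova-2-general"),
    llm=openai.LLM(model="gpt-4o-mini"),
    tts=openai.TTS(),  # swap in OpenAI text-to-speech instead of Cartesia
    chat_ctx=initial_ctx,
)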

Step 5: Running the LiveKit Worker

Finally, we define a Modal function that runs the LiveKit worker. We request a GPU for this function (i.e., for the LiveKit worker) and set a long timeout, in seconds, so that the worker stays alive for the duration of the room.

@app.function(
    gpu="A100", timeout=3000, secrets=[Secret.from_name("livekit-voice-agent")]
)
async def run_agent_worker():
    import os
    print("Running worker")

    worker = Worker(
        WorkerOptions(
            entrypoint_fnc=livekit_entrypoint,
            ws_url=os.environ.get("LIVEKIT_URL"),
        )
    )
    await worker.run()

@app.local_entrypoint()
def main():
    run_agent_worker.remote()

With all of this code in a file called app.py, we can now start the worker with modal run app.py.

Step 6: Spinning up a LiveKit frontend

LiveKit provides a frontend Sandbox that you can use to test your agent.

LiveKit Sandbox

Go to the LiveKit dashboard > Sandbox > Voice assistant. You should be able to instantiate a voice assistant frontend sandbox. Since you have deployed your agent with the appropriate LIVEKIT_URL, the frontend sandbox will automatically connect to your agent.
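
If you later want to build your own frontend instead of using the Sandbox, you can mint room access tokens with LiveKit’s server API. A minimal sketch, assuming the livekit-api package (pulled in as a dependency of livekit-agents); your frontend exchanges this JWT to join the room:

import os
from livekit import api

token = (
    api.AccessToken(os.environ["LIVEKIT_API_KEY"], os.environ["LIVEKIT_API_SECRET"])
    .with_identity("user-123")  # unique identity for this participant
    .with_grants(api.VideoGrants(room_join=True, room="my-room"))
    .to_jwt()
)
print(token)  # hand this to your frontend's LiveKit client to join the room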

Conclusion

LiveKit Agents allows developers to build real-time voice assistants with minimal effort.

And the best way to deploy is with Modal!
