Seamless Chat: Run multilingual chat rooms with SeamlessM4T-V2

Chat with friends from around the world without speaking the same language!

Introducing Seamless Chat, a speech-to-speech chat app that lets you chat with all your friends in their language at the same time.

Seamless Chat is powered by Meta’s SeamlessM4T-V2, a state-of-the-art multilingual speech-to-speech model that supports text and speech translation for over 20 languages. Thanks to Modal’s WebSocket support and distributed object stores, we’re able to create a scalable, low-latency chat interface.

Try out Seamless Chat yourself here!

[Screenshots: Seamless Chat as seen by Bob and by Alice]

Code Overview

Seamless Chat’s frontend and backend are both deployed entirely on Modal. The frontend is a standard SvelteKit app, and the backend is a FastAPI server that handles WebSockets for chat connections and runs GPU-accelerated inference. Let’s take a look at each of these in more detail!

Chat Backend - SeamlessM4T on GPUs

The code that powers Seamless Chat’s backend is defined with Modal’s class syntax and the @app.cls decorator. We define methods on that class for managing model loading, as described below. This lets a single container serve multiple WebSocket connections, which amortizes the overhead of starting new containers. The allow_concurrent_inputs parameter sets the maximum number of concurrent inputs (here, connections) that each container handles.

@app.cls(
    gpu="H100",
    allow_concurrent_inputs=10,
    # and more
)
class SeamlessM4T:
    ...

With the @modal.build() decorator, we download the model into our container image, and with the @modal.enter() decorator, we load the model into memory once the container is instantiated.

@modal.build()
def build(self):
    snapshot_download("facebook/seamless-m4t-v2-large")

@modal.enter()
def enter(self):
    self.processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
    self.model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large").to("cuda")

Chat Backend - FastAPI Server

Modal makes it easy to define an ASGI server: just apply the @modal.asgi_app() decorator to a function that returns a FastAPI app.

The main component of our server is our WebSocket endpoint for handling chat connections. Each socket connection needs to listen for incoming messages from the user along with outgoing messages from other users in the room. Using asyncio, we can handle both of these tasks concurrently, while also gracefully handling disconnections and errors.

import asyncio

from fastapi import WebSocket, WebSocketDisconnect

# `app` here refers to the FastAPI app, not the Modal app
@app.websocket("/chat")
async def chat(websocket: WebSocket):
    await websocket.accept()

    async def recv_loop():
        while True:
            # fetch incoming messages from user's websocket
            message = await websocket.receive_json()
            ...

    async def send_loop():
        while True:
            # fetch outgoing messages from other users in the room
            ...

    # create the tasks before the try block so `finally` can always see them
    tasks = [
        asyncio.create_task(send_loop()),
        asyncio.create_task(recv_loop()),
    ]
    try:
        await asyncio.gather(*tasks)
    except WebSocketDisconnect:
        # user_id is assumed to be set in code elided from this snippet
        print(f"Socket disconnected: {user_id}")
        await websocket.close(code=1000)
    except Exception as e:
        print(f"Socket error: {e}")
        await websocket.close(code=1011)
    finally:
        # stop whichever loop is still running, then wait for both to finish
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
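
To make the concurrency pattern easier to experiment with, here is a minimal sketch that runs outside Modal and FastAPI. It swaps the WebSocket for two asyncio.Queue objects, so the names run_connection, Disconnected, incoming, outgoing, and sent are hypothetical stand-ins rather than part of Seamless Chat. It shows the same shape as the endpoint: two loops run concurrently, the first error ends the connection, and the remaining task is cancelled during cleanup.

```python
import asyncio


class Disconnected(Exception):
    """Hypothetical stand-in for a WebSocket disconnect error."""


async def run_connection(incoming: asyncio.Queue, outgoing: asyncio.Queue, sent: list) -> str:
    async def recv_loop():
        while True:
            message = await incoming.get()
            if message is None:  # sentinel meaning "the client went away"
                raise Disconnected()
            # a real handler would forward the message to the room here

    async def send_loop():
        while True:
            message = await outgoing.get()
            sent.append(message)  # stands in for websocket.send_json(...)

    # Created before the try block so the cleanup can always reach them.
    tasks = [asyncio.create_task(send_loop()), asyncio.create_task(recv_loop())]
    try:
        await asyncio.gather(*tasks)
        return "finished"
    except Disconnected:
        return "disconnected"
    finally:
        # Cancel the loop that is still waiting, then let both tasks settle.
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)


async def demo():
    incoming, outgoing, sent = asyncio.Queue(), asyncio.Queue(), []
    await outgoing.put("hola")   # a message waiting to be delivered
    await incoming.put("hello")  # a message from the user
    await incoming.put(None)     # then the user disconnects
    status = await run_connection(incoming, outgoing, sent)
    return status, sent


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The second gather call uses return_exceptions=True so that the CancelledError from the cancelled loop is collected rather than raised out of the cleanup.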

Chat Backend - Distributed Queues

As our application scales up, we may have multiple socket connections across different containers, so we need some way to send and synchronize messages between users. To handle this, we can take advantage of Modal’s distributed queues.

We define a Modal Queue to store messages, with a separate FIFO partition for each user. When a user sends a message, we append it to the partition of each member of the room. Each connection’s send_loop repeatedly fetches new messages from its user’s partition; in an asynchronous context, we add the .aio suffix to the queue method and await the result. We then pass each message through the translation model to translate it into the user’s target language and send the result back over the WebSocket.

message_queue = modal.Queue.from_name("seamless-message-queue")

async def send_loop():
    while True:
        message = await message_queue.get.aio(partition=user_id)

        text, audio_array = self.translate(message, src_lang, tgt_lang)

        message_data = {
            "messageId": message["message_id"],
            "userId": message["user_id"],
            "userName": message["user_name"],
            "lang": src_lang,
            "text": text,
            "audio": audio_array.tolist(),
        }

        await websocket.send_json(message_data)
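
The fan-out half of this design (one message appended to every room member’s partition) is not shown above, so here is a rough, single-process sketch of the idea. PartitionedQueue and broadcast are hypothetical names invented for illustration; the class is an in-memory stand-in built on asyncio.Queue, not Modal’s Queue, and it assumes the sender also receives a copy of their own message, which the original does not specify.

```python
import asyncio
from collections import defaultdict


class PartitionedQueue:
    """In-memory stand-in for a queue with one FIFO partition per key.

    Hypothetical helper for illustration only: it is not Modal's Queue
    and only works inside a single process.
    """

    def __init__(self):
        # Each partition is created lazily the first time it is used.
        self._partitions = defaultdict(asyncio.Queue)

    async def put(self, item, partition: str) -> None:
        await self._partitions[partition].put(item)

    async def get(self, partition: str):
        return await self._partitions[partition].get()


async def broadcast(queue: PartitionedQueue, room_members, message) -> None:
    """Append one message to every room member's partition."""
    for member_id in room_members:
        await queue.put(message, partition=member_id)


async def demo():
    queue = PartitionedQueue()
    room = ["alice", "bob"]
    await broadcast(queue, room, {"user_id": "alice", "text": "hello"})
    await broadcast(queue, room, {"user_id": "bob", "text": "bonjour"})
    # Each member drains only their own partition, in FIFO order.
    return [(await queue.get(partition="bob"))["text"] for _ in range(2)]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because every user reads from their own partition, each connection can translate into its own target language independently, regardless of which container it lands on.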

Chat Frontend - SvelteKit

Seamless Chat’s frontend is a simple static SvelteKit application. We serve it from a Modal function that is exposed as a web endpoint with the same @modal.asgi_app() decorator. The function serves the static files that npm run build writes to the frontend/build directory.

@app.function(
    mounts=[modal.Mount.from_local_dir("some/static/path", remote_path="/assets")],
    # and more
)
@modal.asgi_app(custom_domains=["seamless.modal.chat"])
def frontend():
    web_app = FastAPI()
    web_app.mount("/", StaticFiles(directory="/assets", html=True))

    return web_app

Deploy

To deploy Seamless Chat, clone the repository, build the frontend assets, and run modal deploy. Make sure you have the latest versions of npm and the modal client installed.

git clone https://github.com/modal-labs/seamless-chat
cd seamless-chat

cd frontend
npm install
npm run build
cd ..

modal deploy seamless.py