September 15, 2024 (5 minute read)

How to run XTTS

Author: Yiren Lu (@YirenLu), Solutions Engineer

How to Run XTTS: A Step-by-Step Guide

XTTS, developed by Coqui, is one of the best open-source text-to-speech models available today, offering high-quality multilingual speech synthesis. In this guide, we’ll walk through running XTTS on Modal, a serverless cloud computing platform.

Prerequisites

Before we begin, complete the following steps:

  1. Create an account at modal.com
  2. Install the Modal Python package: pip install modal
  3. Authenticate your Modal account: modal setup
    (If this doesn’t work, try python -m modal setup)

Setting Up the XTTS Environment

We’ll be using a single Python file to set up and run XTTS. Let’s break down the code and explain each part:

First, we import the necessary libraries and set up the Modal app:

Next, we define the image that will be used to run our XTTS model:

This image is based on Debian Slim, installs Git, and sets up the TTS package from the Coqui repository. Note that we’re agreeing to Coqui’s terms of service by setting the COQUI_TOS_AGREED environment variable.

Implementing the XTTS Class

Now, let’s create the XTTS class that will handle the text-to-speech conversion:

This class does the following:

  1. Loads the XTTS-v2 model when the container starts.
  2. Provides a speak method that converts text to speech.

Running XTTS

Finally, we define an entrypoint to run our XTTS model:

This entrypoint function takes a text input, runs the XTTS model, and saves the output as a WAV file.
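
The local half of that flow can be sketched in plain Python. The helper name synthesize_to_file and its parameters are hypothetical, introduced only for illustration. It assumes the remote method returns the audio as WAV bytes.

```python
from pathlib import Path
from typing import Callable


def synthesize_to_file(
    text: str,
    synthesize: Callable[[str], bytes],
    out_path: str = "output.wav",
) -> Path:
    """Run `synthesize` on `text` and write the returned WAV bytes to `out_path`."""
    audio = synthesize(text)
    path = Path(out_path)
    path.write_bytes(audio)
    return path


# In the Modal file, the entrypoint would wrap this roughly as follows
# (app and XTTS come from the earlier steps):
#
#   @app.local_entrypoint()
#   def main(text: str = "Hello from Modal!"):
#       synthesize_to_file(text, XTTS().speak.remote)
```

Passing the synthesis function as an argument keeps the file-writing logic testable without a GPU or a Modal account.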

How to Use the XTTS Script

To use this script:

  1. Save the entire code into a file, for example, xtts_modal.py.
  2. Run the script using Modal: modal run xtts_modal.py

This will generate an output.wav file in your current directory containing the synthesized speech.

Conclusion

By following this guide, you’ve learned how to run XTTS using Modal. This setup allows you to leverage powerful GPU resources in the cloud for high-quality text-to-speech conversion. You can easily modify the script to support different languages or speakers.

For the full code and more details, you can check out the complete gist here.
