Connecting Modal to your Vercel account

You can use the Modal + Vercel integration to access Modal’s Instant Endpoints from Vercel projects. You’ll find the Modal Vercel integration available for install in the Vercel AI marketplace.

What this integration does

This integration allows you to:

  1. Easily synchronize your Modal API keys to one or more Vercel projects
  2. Call Modal’s Instant Endpoints over HTTP in connected Vercel projects

Authentication

The integration sets the following environment variables on your selected Vercel projects:

  • MODAL_TOKEN_ID (starts with ak-*)
  • MODAL_TOKEN_SECRET (starts with as-*)

The environment variables will be set in the “preview” and “production” project targets. You can read more about environment variables within Vercel in the documentation.
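
Once the integration has run, a connected project can read these variables from process.env and build the Token authorization header used by the Instant Endpoint examples later in this document. Here is a minimal sketch:

// Credentials set on the project by the Modal Vercel integration.
const tokenId = process.env.MODAL_TOKEN_ID; // starts with ak-*
const tokenSecret = process.env.MODAL_TOKEN_SECRET; // starts with as-*

if (!tokenId || !tokenSecret) {
  throw new Error("Modal credentials are missing; check the project's environment variables.");
}

// Authorization header format used when calling Instant Endpoints.
const authorization = `Token ${tokenId}:${tokenSecret}`;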

Installing the integration

  1. Click “Add integration” on the Vercel integrations page
  2. Select the Vercel account you want to connect with
  3. (If logged out) Sign into an existing Modal workspace, or create a new Modal workspace
  4. Select the Vercel projects that you wish to connect to your Modal workspace
  5. Click “Continue”
  6. Back in your Vercel dashboard, confirm the environment variables were added by going to your Vercel project > "Settings" > "Environment variables"

Uninstalling the integration

The Modal Vercel integration is managed from the “Integrations” tab of your Vercel dashboard. From there, you can remove the integration installation from your Vercel account.

Important: removing an integration will delete the corresponding API token set by Modal in your Vercel project(s).


Instant Endpoints

Instant Endpoints are a fast and scalable API for integrating open-source AI models into your Vercel app.

All available endpoints are listed below, along with example code suitable for use with the JavaScript fetch API.

Stable Diffusion XL

https://modal-labs--instant-stable-diffusion-xl.modal.run

Stable Diffusion is a latent text-to-image diffusion model able to generate photo-realistic images given any text prompt.

This endpoint uses a fast version of Stable Diffusion XL to create images at variable sizes, up to 1024 × 1024 pixels.

Example code

// pages/api/modal.ts
const requestData = {
  prompt: "need for speed supercar. unreal engine",
  width: 768,
  height: 768,
  num_outputs: 1,
};
const result = await fetch(
  "https://modal-labs--instant-stable-diffusion-xl.modal.run/v1/inference",
  {
    headers: {
      Authorization: `Token ${process.env.MODAL_TOKEN_ID}:${process.env.MODAL_TOKEN_SECRET}`,
      "Content-Type": "application/json",
    },
    method: "POST",
    body: JSON.stringify(requestData),
  },
);
const imageData = await result.blob();
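
If you are calling the endpoint from a Next.js API route (as the pages/api/modal.ts comment suggests), one way to return the generated image to the browser is to forward the PNG bytes directly. The following is a minimal sketch, not part of the integration itself; the route shape and query parameter are assumptions:

// pages/api/modal.ts
import type { NextApiRequest, NextApiResponse } from "next";

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  // Take the prompt from the query string, falling back to the example prompt.
  const prompt =
    typeof req.query.prompt === "string"
      ? req.query.prompt
      : "need for speed supercar. unreal engine";

  const result = await fetch(
    "https://modal-labs--instant-stable-diffusion-xl.modal.run/v1/inference",
    {
      headers: {
        Authorization: `Token ${process.env.MODAL_TOKEN_ID}:${process.env.MODAL_TOKEN_SECRET}`,
        "Content-Type": "application/json",
      },
      method: "POST",
      body: JSON.stringify({ prompt, width: 768, height: 768 }),
    },
  );

  // Forward the PNG bytes returned by the Instant Endpoint to the client.
  res.setHeader("Content-Type", "image/png");
  res.status(200).send(Buffer.from(await result.arrayBuffer()));
}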

Input schema

  • prompt string
    • Input prompt
  • height integer
    • Height of generated image in pixels. Needs to be a multiple of 64
    • One of: 64, 128, 192, 256, 320, 384, 448, 512, 576, 640, 704, 768, 832, 896, 960, 1024
    • Default: 768
  • width integer
    • Width of generated image in pixels. Needs to be a multiple of 64
    • One of: 64, 128, 192, 256, 320, 384, 448, 512, 576, 640, 704, 768, 832, 896, 960, 1024
    • Default: 768
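
If you want the allowed sizes checked at compile time, the schema above can be written down as a TypeScript type. This is a sketch for illustration only; the type names are ours, not part of the API:

// Allowed image dimensions: multiples of 64, from 64 up to 1024.
type SdxlSize =
  | 64 | 128 | 192 | 256 | 320 | 384 | 448 | 512
  | 576 | 640 | 704 | 768 | 832 | 896 | 960 | 1024;

interface SdxlRequest {
  prompt: string;    // Input prompt
  height?: SdxlSize; // Default: 768
  width?: SdxlSize;  // Default: 768
}

const requestData: SdxlRequest = {
  prompt: "need for speed supercar. unreal engine",
  width: 768,
  height: 768,
};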

Output schema

This endpoint outputs bytes for a single image with media type "image/png".
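
If you want to render the image in a page rather than save it, one option is to base64-encode the bytes into a data URL. A minimal sketch, reusing the imageData Blob from the example above and assuming a Node.js runtime:

// Convert the PNG bytes into a data URL usable as an <img> src.
const pngBytes = Buffer.from(await imageData.arrayBuffer());
const imageSrc = `data:image/png;base64,${pngBytes.toString("base64")}`;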

Pricing

Requests to this endpoint use duration-based pricing, billed at 1ms granularity. The exact cost per millisecond depends on the underlying GPU hardware. This endpoint uses a single NVIDIA A10G device to serve each request.

See our pricing page for current GPU prices.

Inferences usually complete within 15-30 seconds.

Transcribe speech with vaibhavs10/insanely-fast-whisper

This endpoint hosts vaibhavs10/insanely-fast-whisper to transcribe and diarize audio.

Example code

// pages/api/modal.ts
const requestData = {
  audio: dataUrl,
  diarize_audio: false,
};

const response = await fetch("https://modal-labs--instant-whisper.modal.run", {
  headers: {
    Authorization: `Token ${process.env.MODAL_TOKEN_ID}:${process.env.MODAL_TOKEN_SECRET}`,
    "Content-Type": "application/json",
  },
  method: "POST",
  body: JSON.stringify(requestData),
});

const output = await response.json();
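
The example above assumes dataUrl already contains the audio to send. One way to produce such a value is to base64-encode an audio file into a data URL; the following is a sketch only, and the file name and MIME type are placeholders:

import { readFile } from "node:fs/promises";

// Read a local audio file and encode it as a base64 data URL.
// "audio/mpeg" is an assumed MIME type for an .mp3 file.
const audioBytes = await readFile("speech.mp3");
const dataUrl = `data:audio/mpeg;base64,${audioBytes.toString("base64")}`;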

Input schema

  • audio string
  • language string
    • Language of the input text. Whisper auto-detects the language if not provided. See the full list of options here
    • Default: "" (empty string; auto-detect)
  • diarize_audio boolean
    • Whether to diarize the audio.
    • Default: false
  • batch_size integer
    • Number of parallel batches.
    • Default: 24

Output schema

This endpoint outputs a JSON object with two fields:

  • text string
  • chunks Chunk[]

Here, Chunk is a JSON object with the following fields:

  • speaker string
    • [Optional] only present if diarize_audio is true
  • text string
  • timestamp [float, float]
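
Written as TypeScript types (the names WhisperOutput and Chunk are ours, not part of the API), the response from the example above looks like this:

interface Chunk {
  speaker?: string; // only present if diarize_audio is true
  text: string;
  timestamp: [number, number]; // [start, end] pair of floats
}

interface WhisperOutput {
  text: string;
  chunks: Chunk[];
}

const output: WhisperOutput = await response.json();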

Stream text-to-speech with coqui-ai/TTS

XTTS v2 is a fast and high-quality text-to-speech model.

This endpoint uses a streaming version of coqui-ai/TTS that streams wav audio back as it’s generated in real-time.

Example code

// pages/api/modal.ts
const requestData = {
  text: "It is a mistake to think you can solve any major problems just with potatoes.",
  language: "en",
};
const result = await fetch("https://modal-labs--instant-xtts-v2.modal.run", {
  headers: {
    Authorization: `Token ${process.env.MODAL_TOKEN_ID}:${process.env.MODAL_TOKEN_SECRET}`,
    "Content-Type": "application/json",
  },
  method: "POST",
  body: JSON.stringify(requestData),
});
const audioBuffer = await result.arrayBuffer();
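
As an alternative to the final line above (which buffers the whole file), the streamed response body can be consumed incrementally as it arrives. A minimal sketch using the web ReadableStream reader on the result response from the example:

// Consume the wav stream chunk by chunk instead of buffering it all at once.
const reader = result.body!.getReader();
const chunks: Uint8Array[] = [];

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  chunks.push(value); // each `value` is a Uint8Array of wav bytes
  // ...forward `value` to the client here for streaming playback
}

const audioBytes = Buffer.concat(chunks);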

Input schema

  • text string
    • Input text
  • language string
    • Language of the input text
    • One of: en, es, fr, de, it, pt, pl, tr, ru, nl, cs, ar, zh, hu, ko, hi
    • Default: en

Output schema

This endpoint streams bytes for a single audio file with media type "audio/wav".

Want more?

If a popular open-source AI model API is not listed here, you can either implement it in Python and host it on Modal or ask us in Slack to add it as an Instant Endpoint!