Connecting Modal to your Vercel account
You can use the Modal + Vercel integration to access Modal’s Instant Endpoints from Vercel projects. You’ll find the Modal Vercel integration available for install in the Vercel AI marketplace.
What this integration does
This integration allows you to:
- Easily synchronize your Modal API keys to one or more Vercel projects
- Call Modal’s Instant Endpoints over HTTP in connected Vercel projects
Authentication
The integration will set the following environment variables in the user's selected Vercel projects:
- MODAL_TOKEN_ID (starts with ak-*)
- MODAL_TOKEN_SECRET (starts with as-*)
The environment variables will be set in the “preview” and “production” project targets. You can read more about environment variables within Vercel in the documentation.
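Once these variables are synced, you can read them like any other Vercel environment variable. As a minimal sketch (the helper name below is hypothetical and not part of the integration), this is one way to assemble the Authorization header used by the Instant Endpoint examples later on this page:
// Hypothetical helper: builds the Authorization header value for Instant Endpoints
// from the environment variables set by the integration.
function modalAuthHeader(): string {
  const tokenId = process.env.MODAL_TOKEN_ID; // starts with ak-*
  const tokenSecret = process.env.MODAL_TOKEN_SECRET; // starts with as-*
  if (!tokenId || !tokenSecret) {
    throw new Error("Modal credentials are not configured for this Vercel project");
  }
  return `Token ${tokenId}:${tokenSecret}`;
}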
Installing the integration
- Click “Add integration” on the Vercel integrations page
- Select the Vercel account you want to connect with
- (If logged out) Sign into an existing Modal workspace, or create a new Modal workspace
- Select the Vercel projects that you wish to connect to your Modal workspace
- Click “Continue”
- Back in your Vercel dashboard, confirm the environment variables were added by going to your Vercel project > "Settings" > "Environment variables"
Uninstalling the integration
The Modal Vercel integration is managed from the "Integrations" tab of your Vercel dashboard. From there you can remove the integration installation from your Vercel account.
Important: removing an integration will delete the corresponding API token set by Modal in your Vercel project(s).
Modal Instant Endpoints
Instant Endpoints are a fast and scalable API for integrating open-source AI models into your Vercel app.
All available endpoints are listed below, along with example code suitable for use with the JavaScript fetch API.
Stable Diffusion XL
Stable Diffusion is a latent text-to-image diffusion model able to generate photo-realistic images given any text prompt.
This endpoint uses a fast version of Stable Diffusion XL to create images of variable size, up to 1024x1024 pixels.
Example code
// pages/api/modal.ts
const requestData = {
  prompt: "need for speed supercar. unreal engine",
  width: 768,
  height: 768,
  num_outputs: 1,
};
const result = await fetch(
  "https://modal-labs--instant-stable-diffusion-xl.modal.run/v1/inference",
  {
    headers: {
      Authorization: `Token ${process.env.MODAL_TOKEN_ID}:${process.env.MODAL_TOKEN_SECRET}`,
      "Content-Type": "application/json",
    },
    method: "POST",
    body: JSON.stringify(requestData),
  },
);
const imageData = await result.blob();
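The response body contains the raw PNG bytes. As a sketch, assuming this runs in a Next.js "pages" API route with a res response object (not shown in the example above), you could forward the image to the client like this:
// Hypothetical continuation of pages/api/modal.ts: return the generated PNG to the client.
res.setHeader("Content-Type", "image/png");
res.status(200).send(Buffer.from(await imageData.arrayBuffer()));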
Input schema
- prompt (string): Input prompt
- height (integer): Height of generated image in pixels. Needs to be a multiple of 64.
  - One of: 64, 128, 192, 256, 320, 384, 448, 512, 576, 640, 704, 768, 832, 896, 960, 1024
  - Default: 768
- width (integer): Width of generated image in pixels. Needs to be a multiple of 64.
  - One of: 64, 128, 192, 256, 320, 384, 448, 512, 576, 640, 704, 768, 832, 896, 960, 1024
  - Default: 768
Output schema
This endpoint outputs bytes for a single image with media type "image/png".
Pricing
Requests to this endpoint use duration-based pricing, billed at 1 ms granularity. The exact cost per millisecond depends on the underlying GPU hardware. This endpoint uses a single NVIDIA A10G device to serve each request.
See our pricing page for current GPU prices.
Inferences usually complete within 15-30 seconds.
Transcribe speech with vaibhavs10/insanely-fast-whisper
This endpoint hosts vaibhavs10/insanely-fast-whisper to transcribe and diarize audio.
Example code
// pages/api/modal.ts
// dataUrl contains the input audio file encoded as a Data URL (see the input schema below)
const requestData = {
  audio: dataUrl,
  diarize_audio: false,
};
const response = await fetch("https://modal-labs--instant-whisper.modal.run", {
  headers: {
    Authorization: `Token ${process.env.MODAL_TOKEN_ID}:${process.env.MODAL_TOKEN_SECRET}`,
    "Content-Type": "application/json",
  },
  method: "POST",
  body: JSON.stringify(requestData),
});
const output = await response.json();
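The dataUrl value in the example above is assumed to already exist. A minimal sketch of constructing it from a local file (the file name and the "audio/mpeg" media type are assumptions for illustration):
import fs from "node:fs/promises";

// Hypothetical: read an MP3 from disk and encode it as a Data URL for the audio field.
const audioBytes = await fs.readFile("speech.mp3");
const dataUrl = `data:audio/mpeg;base64,${audioBytes.toString("base64")}`;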
Input schema
- audio (string): Input audio file as a Data URL.
- language (string): Language of the input audio. Whisper auto-detects the language if not provided. See the full list of options here.
  - Default: "" (auto-detect)
- diarize_audio (boolean): Whether to diarize the audio.
  - Default: false
- batch_size (integer): Number of parallel batches.
  - Default: 24
Output schema
This endpoint outputs a JSON object with two fields:
- text (string)
- chunks (Chunk[])
Here, Chunk is a JSON object with the following fields:
- speaker (string): Optional; only present if diarize_audio is true
- text (string)
- timestamp ([float, float])
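For example, here is a quick sketch of turning the parsed output from the example above into a speaker-labelled transcript (the Chunk type simply mirrors the schema above):
// Sketch: format the transcription chunks into a plain-text transcript.
type Chunk = { speaker?: string; text: string; timestamp: [number, number] };

const transcript = (output.chunks as Chunk[])
  .map((c) => `${c.speaker ?? "speaker"} [${c.timestamp[0]} - ${c.timestamp[1]}]: ${c.text}`)
  .join("\n");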
Stream text-to-speech with coqui-ai/TTS
XTTS v2 is a fast and high-quality text-to-speech model.
This endpoint uses a streaming version of coqui-ai/TTS that streams wav audio back as it’s generated in real-time.
Example code
// pages/api/modal.ts
const requestData = {
  text: "It is a mistake to think you can solve any major problems just with potatoes.",
  language: "en",
};
const result = await fetch("https://modal-labs--instant-xtts-v2.modal.run", {
  headers: {
    Authorization: `Token ${process.env.MODAL_TOKEN_ID}:${process.env.MODAL_TOKEN_SECRET}`,
    "Content-Type": "application/json",
  },
  method: "POST",
  body: JSON.stringify(requestData),
});
const audioBuffer = Buffer.from(await result.arrayBuffer());
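Because the endpoint streams audio back, you can also consume the response incrementally instead of buffering the whole file. A minimal sketch (what you do with each chunk is up to your application):
// Sketch: read the streamed WAV bytes chunk by chunk as they arrive.
const reader = result.body!.getReader();
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  // e.g. forward value (a Uint8Array) to the client or an audio pipeline
}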
Input schema
- text (string): Input text
- language (string): Language of the input text
  - One of: en, es, fr, de, it, pt, pl, tr, ru, nl, cs, ar, zh, hu, ko, hi
  - Default: en
Output schema
This endpoint streams bytes for a single audio file with media type "audio/wav".
Want more?
If a popular open-source AI model API is not listed here, you can either implement it in Python and host it on Modal or ask us in Slack to add it as an Instant Endpoint!