Language model inference

Run the latest open-source LLM and embedding models with Modal's serverless GPUs.

“Modal's user-friendly interface and efficient tools have truly empowered our team to navigate data-intensive tasks with ease, enabling us to achieve our project goals more efficiently.”

Karim Atiyeh, Co-Founder & CTO

“Switched to Modal for our LLM inference instead of Azure. 1/4 the price for GPUs and so much simpler to set up/scale. Big fan.”

Alex Reichenbach, CEO

“Using Modal for inference is like having an extra infra team - it’s reliable, scalable, and fast - meaning I can get back to training models”

Vik Paruchuri, Founder

GPUs on demand

Blazing-fast performance

Best-in-class developer experience

Ship your first app in minutes.

$30 / month free compute