Featured Examples

Deploy an OpenAI-compatible LLM service

Run large language models with a drop-in replacement for the OpenAI API.

Custom pet art from Flux with Hugging Face and Gradio

Fine-tune an image generation model on pictures of your pet.

Run llama.cpp

Run DeepSeek-R1 and Phi-4 on llama.cpp.

Voice chat with LLMs

Build an interactive voice chat app.

Serve diffusion models

Serve Flux on Modal with a number of optimizations for blazingly fast inference.

Fold proteins with Chai-1

Predict molecular structures from sequences with state-of-the-art open-source models.

Create music

Turn prompts into music with MusicGen.

Sandbox a LangGraph agent's code

Run an LLM coding agent that runs its own language models.

RAG Chat with PDFs

Use ColBERT-style, multimodal embeddings with a Vision-Language Model to answer questions about documents.

Bring images to life

Prompt a generative video model to animate an image.

Fast podcast transcriptions

Build an end-to-end podcast transcription app that leverages dozens of containers for super-fast processing.

Build a protein folding dashboard

Serve a web UI for a protein model with ESM3, Molstar, and Gradio.

Deploy a Hacker News Slackbot

Periodically post new Hacker News posts to Slack.

Retrieval-Augmented Generation (RAG) for Q&A

Build a question-answering web endpoint that can cite its sources.

Document OCR job queue

Use Modal as an infinitely scalable job queue that can service async tasks from a web app.

Parallel processing of Parquet files on S3

Analyze data from the NYC Taxi and Limousine Commission in parallel.