Featured examples

Run large language models with a drop-in replacement for the OpenAI API.

Fine-tune an image generation model on pictures of your pet.

Run DeepSeek-R1 and Phi-4 on llama.cpp.

Build an interactive voice chat app.

Serve Flux on Modal with a number of optimizations for blazingly fast inference.

Predict molecular structures from sequences with state-of-the-art open-source models.

Run interactive language model applications.

Fine-tune a Wan2.1 video model on your face and run it in parallel.

Turn prompts into music with MusicGen.

Run an LLM coding agent that runs its own language models.

Use ColBERT-style, multimodal embeddings with a Vision-Language Model to answer questions about documents.

Prompt a generative video model to animate an image.

Build an end-to-end podcast transcription app that leverages dozens of containers for super-fast processing.

Serve a web UI for a protein model with ESM3, Molstar, and Gradio.

Periodically post new Hacker News posts to Slack.

Build a question-answering web endpoint that can cite its sources.

Use Modal as a scalable job queue that services async tasks from a web app.

Analyze data from the Taxi and Limousine Commission of NYC in parallel.