Featured examples

Run large language models with a drop-in replacement for the OpenAI API

Fine-tune an image generation model on pictures of your pet

Run DeepSeek-R1 and Phi-4 on llama.cpp

Run an LLM coding agent that runs its own language models

Turn audio bytes into text at scale

Build an interactive voice chat app

Transform images with SotA diffusion models

Predict molecular structures and binding affinities from sequences with SotA open-source models

Stream YOLO detections on webcam footage in real time

Serve Flux on Modal with optimizations for blazingly fast inference

Run interactive language model applications

Stream transcripts at the speed of speech

Fine-tune a Wan2.1 video model on your face and run it in parallel

Turn prompts into music with MusicGen

Use ColBERT-style multimodal embeddings with a vision-language model to answer questions about documents

Prompt a generative video model to animate an image

Build an end-to-end podcast transcription app that leverages dozens of containers for super-fast processing

Serve a web UI for a protein model with ESM3, Molstar, and Gradio

Periodically send new Hacker News posts to Slack

Predict molecular structures from sequences with SotA open-source models

Build a question-answering web endpoint that can cite its sources

Use Modal as an infinitely scalable job queue that can service async tasks from a web app

Analyze data from the NYC Taxi and Limousine Commission in parallel