Featured Examples
Deploy an OpenAI-compatible LLM service
Run large language models with a drop-in replacement for the OpenAI API
Custom pet art from Flux with Hugging Face and Gradio
Fine-tune an image generation model on pictures of your pet
Run llama.cpp
Run DeepSeek-R1 and Phi-4 on llama.cpp
Sandbox a LangGraph agent's code
Run an LLM coding agent that runs its own language models
Transcribe speech in batches with Whisper
Turn audio bytes into text at scale
Voice chat with LLMs
Build an interactive voice chat app
Edit images with Flux Kontext
Transform images with state-of-the-art diffusion models
Fold proteins with Boltz-2
Predict molecular structures and binding affinities from sequences with state-of-the-art open-source models
Serverless WebRTC
Stream YOLO detections on webcam footage in real time
Serve diffusion models
Serve Flux on Modal with optimizations for blazingly fast inference
Serverless TensorRT-LLM (LLaMA 3 8B)
Run interactive language model applications
Transcribe speech with Kyutai STT
Stream transcripts at the speed of speech
Star in custom music videos
Fine-tune a Wan2.1 video model on your face and run it in parallel
Create music
Turn prompts into music with MusicGen
RAG Chat with PDFs
Use ColBERT-style, multimodal embeddings with a Vision-Language Model to answer questions about documents
Bring images to life
Prompt a generative video model to animate an image
Fast podcast transcriptions
Build an end-to-end podcast transcription app that leverages dozens of containers for super-fast processing
Build a protein folding dashboard
Serve a web UI for a protein model with ESM3, Molstar, and Gradio
Deploy a Hacker News Slackbot
Periodically share new Hacker News posts in Slack
Fold proteins with Chai-1
Predict molecular structures from sequences with state-of-the-art open-source models
Retrieval-Augmented Generation (RAG) for Q&A
Build a question-answering web endpoint that can cite its sources
Document OCR job queue
Use Modal as an infinitely scalable job queue that can service async tasks from a web app
Parallel processing of Parquet files on S3
Analyze data from the Taxi and Limousine Commission of NYC in parallel