Featured Examples

Deploy an OpenAI-compatible LLM service

Run large language models with a drop-in replacement for the OpenAI API

Custom pet art from Flux with Hugging Face and Gradio

Fine-tune an image generation model on pictures of your pet

Run llama.cpp

Run DeepSeek-R1 and Phi-4 on llama.cpp

Sandbox a LangGraph agent's code

Run an LLM coding agent that runs its own language models

Transcribe speech in batches with Whisper

Turn audio bytes into text at scale

Voice chat with LLMs

Build an interactive voice chat app

Edit images with Flux Kontext

Transform images with SotA diffusion models

Fold proteins with Boltz-2

Predict molecular structures and binding affinities from sequences with SotA open-source models

Serverless WebRTC

Stream YOLO detections on webcam footage in real time

Serve diffusion models

Serve Flux on Modal with optimizations for blazingly fast inference

Serverless TensorRT-LLM (Llama 3 8B)

Run interactive language model applications

Transcribe speech with Kyutai STT

Stream transcripts at the speed of speech

Star in custom music videos

Fine-tune a Wan2.1 video model on your face and run it in parallel

Create music

Turn prompts into music with MusicGen

RAG chat with PDFs

Use ColBERT-style multimodal embeddings with a vision-language model to answer questions about documents

Bring images to life

Prompt a generative video model to animate an image

Fast podcast transcriptions

Build an end-to-end podcast transcription app that leverages dozens of containers for super-fast processing

Build a protein folding dashboard

Serve a web UI for a protein model with ESM3, Molstar, and Gradio

Deploy a Hacker News Slackbot

Periodically share new Hacker News posts in Slack

Fold proteins with Chai-1

Predict molecular structures from sequences with SotA open-source models

Retrieval-Augmented Generation (RAG) for Q&A

Build a question-answering web endpoint that can cite its sources

Document OCR job queue

Use Modal as a highly scalable job queue that services async tasks from a web app

Parallel processing of Parquet files on S3

Analyze data from the NYC Taxi and Limousine Commission in parallel