Modal Training

Train more, configure less

Launch more experiments and training jobs. Spin up single-node experiments or scale to multi-node GPU training instantly.

“Modal lets us deploy new ML models in hours rather than weeks. We use it across spam detection, recommendations, audio transcription, and video pipelines, and it’s helped us move faster with far less complexity.”

Mike Cohen, Head of AI & ML Engineering

“Modal's user-friendly interface and efficient tools have truly empowered our team to navigate data-intensive tasks with ease, enabling us to achieve our project goals more efficiently.”

Karim Atiyeh, Co-Founder & CTO

Where researchers can run experiments, not ops

Define in code

Define your training function with Modal’s SDK. Easily keep ML dependencies and GPU requirements in sync with application code.

image = (
    modal.Image.from_registry(f"nvidia/cuda:{tag}")
    .uv_pip_install("accelerate", "torch")
)

@app.function(gpu="B200:8", image=image)
@modal.clustered(size=4, rdma=True)
def train_multi_node():
    ...

Native storage

Ingest training data from anywhere: Modal’s distributed Volumes, cloud buckets, or your local filesystem.

volume = modal.Volume.from_name("training_data_vol")

@app.function(
    volumes={
        "/my-s3-mount": modal.CloudBucketMount(
            "training_data_s3",
            secret=secret,
        ),
        "/my-volume": volume,
    }
)
def train():
    ...
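
Inside the function, the mount points from the snippet above behave like ordinary directories, so data loading can rely on plain filesystem calls. A minimal sketch, assuming the training data is stored as `.parquet` shards; the file extension and the helper name `list_shards` are hypothetical and not part of Modal's SDK.

```python
from pathlib import Path


def list_shards(mount_points, pattern="*.parquet"):
    """Collect shard files found under each mounted directory.

    `mount_points` would be paths such as "/my-s3-mount" and "/my-volume";
    the "*.parquet" pattern is an assumption made for illustration.
    """
    shards = []
    for root in mount_points:
        root = Path(root)
        if not root.is_dir():
            continue  # skip mounts that are not attached in this environment
        # rglob walks subdirectories too; sort for a deterministic order
        shards.extend(sorted(root.rglob(pattern)))
    return shards


if __name__ == "__main__":
    for shard in list_shards(["/my-s3-mount", "/my-volume"]):
        print(shard)
```

Because the helper only touches paths, it can be tried locally against any directory before being called from `train()`.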

Sub-second startup

Modal’s container stack launches GPUs for your function in < 1s. Fan out experiments to accelerate your research.
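
Fanning out usually means running one training function over a grid of configurations. The sketch below shows that pattern with the standard library's thread pool as a local stand-in for remote containers; `run_experiment`, `sweep`, and the hyperparameter values are hypothetical placeholders, not Modal APIs.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def run_experiment(config):
    """Placeholder for a single training run; returns only a run label."""
    lr, batch_size = config
    return f"lr={lr}-bs={batch_size}"


def sweep(learning_rates, batch_sizes, max_workers=4):
    """Run every (lr, batch_size) combination concurrently."""
    configs = list(product(learning_rates, batch_sizes))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as `configs`
        return list(pool.map(run_experiment, configs))


if __name__ == "__main__":
    for label in sweep([1e-3, 1e-4], [16, 32]):
        print(label)
```

On Modal, the same shape would likely go through the function's `.map` method instead of a local thread pool, with each configuration running in its own container; consult the docs for the exact usage.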

Speed up training jobs by going multi-node

Scale from 1 GPU to 64 with just one line of code

Spin up a cluster in a second with no minimum commitments

B200, H200, and H100 clusters equipped with InfiniBand and private networking
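
The GPU count follows from two numbers in the earlier snippet: GPUs per container (`gpu="B200:8"`) and containers per cluster (`size=4`). A small sketch of that arithmetic, plus the rank numbering that distributed training frameworks commonly use; the function names are hypothetical, and 8 GPUs per node is an assumption carried over from that snippet.

```python
def world_size(num_nodes, gpus_per_node):
    """Total number of GPU workers in the cluster."""
    return num_nodes * gpus_per_node


def global_rank(node_rank, local_rank, gpus_per_node):
    """Cluster-wide index of one GPU, given its node and its slot on that node."""
    return node_rank * gpus_per_node + local_rank


if __name__ == "__main__":
    print(world_size(4, 8))      # size=4 with 8 GPUs each
    print(world_size(8, 8))      # 8 nodes with 8 GPUs each
    print(global_rank(3, 7, 8))  # last GPU on the fourth node
```

Under these assumptions, going from 32 to 64 GPUs means changing only the cluster size from 4 to 8.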

No black boxes. You control the training logic.

Any base model

Use Qwen, Flux, Whisper, your own custom model, or train from scratch.

Any training framework

PyTorch, Axolotl, Unsloth, Hugging Face TRL, and more.

Any MLOps framework

Weights & Biases, TensorBoard, and more.

Built with Modal

Custom pet art from Flux with Hugging Face and Gradio

Fine-tune an image generation model on pictures of your pet

Star in custom music videos

Fine-tune a Wan2.1 video model on your face and run it in parallel

Fine-Tuning and Inference for Computer Vision with YOLO

Customize and deploy lightning-fast object detection models

Fine-tune Flux with LoRA

Fine-tune diffusion models like Flux using Low-Rank Adaptation (LoRA)

Fine-tune Whisper on domain vocab

Improve Whisper transcription accuracy on specialized vocabularies with fine-tuning

Fine-tune Qwen3 with Unsloth

Use Unsloth to fine-tune the Qwen3 language model efficiently

Fine-tune Llama 3.1 with torchtune

Customize Llama 3.1 with torchtune for your downstream applications

Train an LLM with GRPO

Apply Group Relative Policy Optimization (GRPO) to train large language models

Parallelize a hyperparameter sweep

Run large-scale hyperparameter sweeps in parallel on Modal

Your end-to-end ML lifecycle in one place

Seamlessly integrate data pre-processing, training, and serving.

Ship your first app in minutes.

$30 / month free compute