GPU acceleration

If you have code or use libraries that are GPU-accelerated, you can attach the first available GPU to your function by passing the gpu="any" argument to the @stub.function decorator:

import modal

stub = modal.Stub()

@stub.function(gpu="any")
def my_function():
    # code here will be executed on a machine with an available GPU
    ...

Specifying GPU type

When gpu="any" is specified, your function runs in a container with access to a supported GPU. Currently this gives you NVIDIA Tesla T4 or A10G instances. If you need more control, you can pick a specific GPU type by changing this argument:

@stub.function(gpu="T4")
def my_t4_function():
    ...

@stub.function(gpu="A10G")
def my_a10g_function():
    ...

Specifying GPU count

You may also specify the number of GPUs to attach to your function by passing the count parameter to the object form of the gpu argument for your desired GPU type:

@stub.function(gpu=modal.gpu.A10G(count=4))
def my_a10g_function():
    ...

Currently A10G and T4 instances both support up to 4 GPUs, but we are working on supporting more options in the future.

A100 GPUs

Modal also supports A100 GPUs, NVIDIA’s flagship data center chip. They have beefier hardware and more GPU memory. However, while A100s are in the limited availability phase, you may occasionally run into longer queue times to get access to them.

To request an A100 with 40 GB of GPU memory, replace the gpu="any" argument with gpu="A100":

@stub.function(gpu="A100")
def my_a100_function():
    ...

A100 (20 GB VRAM)

A100s are also available in 20 GB variants, which can be used by creating a modal.gpu.A100 object with the memory parameter. These are cheaper than the 40 GB variant.

@stub.function(gpu=modal.gpu.A100(memory=20))
def my_a100_function():
    ...

Cloud provider and availability

In addition to longer queue times, there are some things to keep in mind when using A100s:

  • Modal A100 workers currently run on a different cloud provider (GCP) than the rest of Modal’s infrastructure. This means that the first time you start up an image with an A100 GPU, there will be an additional latency cost as we transfer files between cloud providers. However, subsequent runs of that image (including cold starts) will be just as fast as any other Modal function.
  • A100 workers are in high demand and may experience an increased rate of preemption compared with other GPU and CPU-only workers. If this happens, in-progress inputs will be rescheduled. See the Preemption guide for more details.
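Because rescheduled inputs are re-run from the start, it can help to make long-running work resumable. Here is a minimal, framework-agnostic sketch of checkpointed processing; the checkpoint_store dict and all function names are illustrative and not part of Modal’s API (in practice, progress would be persisted to durable storage such as a shared volume):

```python
# Illustrative sketch: record progress after each step so a rescheduled
# (preempted) run resumes where it left off instead of redoing work.
# checkpoint_store stands in for durable storage; it is NOT a Modal API.
checkpoint_store = {}


def process_batches(job_id, batches):
    # Resume from the last checkpointed position (0 on a fresh run).
    start = checkpoint_store.get((job_id, "next"), 0)
    results = checkpoint_store.get((job_id, "results"), [])
    for i in range(start, len(batches)):
        results.append(batches[i] * 2)  # placeholder for real GPU work
        # Checkpoint after each completed step.
        checkpoint_store[(job_id, "next")] = i + 1
        checkpoint_store[(job_id, "results")] = results
    return results
```

If the worker is preempted partway through, a rerun with the same job_id skips the already-completed steps rather than duplicating them.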

We’re actively working on addressing these constraints, so stay tuned!

Co-locating CPU-only functions

Note that any SharedVolumes attached to functions with A100s will be created in GCP (if they don’t already exist in AWS). It might be desirable to run your CPU-only functions in the same cloud region to avoid latency and bandwidth costs. This is currently supported by setting the cloud="gcp" parameter in the @stub.function decorator:

@stub.function(cloud="gcp")
def my_cpu_function():
    ...


Take a look at some of our examples that use GPUs.