GPU acceleration
If you have code or use libraries that are GPU-accelerated, you can attach the first available GPU to your function by passing the gpu="any" argument to the @stub.function decorator:
import modal

stub = modal.Stub()

@stub.function(gpu="any")
def my_function():
    # code here will be executed on a machine with an available GPU
    ...
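To sanity-check that a GPU really is attached, you can shell out to nvidia-smi from inside the function. The sketch below is illustrative: the function name is made up, and the stub.run() / .call() invocation is an assumption about the client API of this era rather than part of the example above.

import subprocess

import modal

stub = modal.Stub()

@stub.function(gpu="any")
def check_gpu():
    # nvidia-smi ships with the GPU driver, so it is present in
    # containers that have a GPU attached; this prints the device
    # name, memory, and driver version.
    subprocess.run(["nvidia-smi"], check=True)

if __name__ == "__main__":
    with stub.run():
        check_gpu.call()  # assumed remote-invocation method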
Specifying GPU type
When gpu="any" is specified, your function runs in a container with access to a supported GPU. Currently this gives you NVIDIA Tesla T4 or A10G instances. If you need more control, you can pick a specific GPU type by changing this argument.
@stub.function(gpu="T4")
def my_t4_function():
    ...

@stub.function(gpu="A10G")
def my_a10g_function():
    ...
For information on all valid values for the gpu parameter, see the modal.gpu reference page.
Specifying GPU count
You may also specify the number of GPUs to attach to your function by using the object form of the gpu parameter for your desired GPU:
@stub.function(gpu=modal.gpu.A10G(count=2))
def my_a10g_function():
    ...
Currently, A100, A10G, and T4 instances all support up to 4 GPUs, but we are working on supporting more options in the future. Note that functions requesting more than 2 GPUs will have increased startup times.
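For example, to attach the maximum of four GPUs, use the same object form with a higher count (a sketch; the function name is illustrative):

@stub.function(gpu=modal.gpu.A10G(count=4))
def my_four_gpu_function():
    # All four devices are exposed to the container and will show up
    # in tools like nvidia-smi.
    ...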
A100 GPUs
Modal also supports A100 GPUs, which are NVIDIA’s flagship data center chip. They have beefier hardware and more GPU memory. However, while in the limited availability phase, you may occasionally run into larger queue times to get access to them.
To request an A100 with 40 GB of GPU memory, replace the gpu="any" argument with gpu="A100":
@stub.function(gpu="A100")
def my_a100_function():
    ...
A100 (20 GB VRAM)
A100s are also available in 20 GB variants, which can be used by creating a modal.gpu.A100 object with the memory parameter. These are cheaper than the 40 GB VRAM variant.
@stub.function(gpu=modal.gpu.A100(memory=20))
def my_a100_function():
    ...
Cloud provider and availability
In addition to longer queue times, there are some things to keep in mind when using A100s:
- Modal A100 workers currently run on a separate cloud provider (GCP) vs the rest of Modal’s infrastructure. This means that the first time you start up an image with an A100 GPU, there will be an additional latency cost as we transfer files between cloud providers. However, subsequent runs for that image (including cold starts) will be just as fast as any other Modal function.
- A100 workers are in high demand and may experience an increased rate of preemption compared with other GPU and CPU-only workers. If this happens, in-progress inputs will be rescheduled. See the Preemption guide for more details.
We’re actively working on addressing these constraints, so stay tuned!
Co-locating CPU-only functions
Note that any SharedVolumes attached to functions with A100s will be created in GCP (if they don’t already exist in AWS). It might be desirable to run your CPU-only functions in the same cloud region to avoid latency and bandwidth costs. This is currently supported by setting the cloud parameter to gcp in the @stub.function decorator.
@stub.function(cloud="gcp")
def my_cpu_function():
    ...
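For instance, a CPU-only function can be pinned to GCP so that it shares a region (and a SharedVolume) with an A100 function. This is a sketch with illustrative function names and mount paths; the shared_volumes mount syntax is assumed to match the SharedVolume guide.

volume = modal.SharedVolume()

@stub.function(gpu="A100", shared_volumes={"/cache": volume})
def generate_embeddings():
    # Runs on a GCP A100 worker; writes results to the shared /cache mount.
    ...

@stub.function(cloud="gcp", shared_volumes={"/cache": volume})
def postprocess():
    # CPU-only, but pinned to GCP so reads from /cache stay in-region.
    ...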
Examples
Take a look at some of our examples that use GPUs: