Scaling out

Modal has a few different tools that help increase the performance of your applications.

Parallel execution of inputs

If your code runs the same function repeatedly with different, independent inputs (e.g., a grid search), the easiest way to increase performance is to run those function calls in parallel using Modal’s Function.map() method.

Here is an example, assuming a function evaluate_model that takes a single argument:

import modal

app = modal.App()

@app.function()
def evaluate_model(x):
    ...

@app.local_entrypoint()
def main():
    inputs = list(range(100))
    for result in evaluate_model.map(inputs):  # runs many inputs in parallel
        ...

In this example, evaluate_model will be called with each of the 100 inputs (the numbers 0 - 99 in this case) roughly in parallel, and the results are returned as an iterable in the same order as the inputs.
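
The ordering guarantee can be tried out locally without Modal. The sketch below is only an analogy: it uses the standard library's ThreadPoolExecutor.map as a stand-in for a parallel map, and the body of evaluate_model (a random short sleep followed by squaring the input) is a placeholder invented for illustration.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def evaluate_model(x):
    # Placeholder body (an assumption): sleep for a random short time so that
    # calls finish out of order, then return a value derived from the input.
    time.sleep(random.uniform(0, 0.01))
    return x * x


inputs = list(range(100))

# Local stand-in for a parallel map: calls run concurrently in a thread pool,
# but results are yielded in input order regardless of completion order.
with ThreadPoolExecutor(max_workers=16) as pool:
    results = list(pool.map(evaluate_model, inputs))
```

Even though individual calls complete in an unpredictable order, results[i] always corresponds to inputs[i].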


By default, if any of the function calls raises an exception, the exception will be propagated. To treat exceptions as successful results and aggregate them in the results list, pass in return_exceptions=True.

@app.function()
def my_func(a):
    if a == 2:
        raise Exception("ohno")
    return a ** 2

@app.local_entrypoint()
def main():
    print(list(my_func.map(range(3), return_exceptions=True)))
    # [0, 1, UserCodeException(Exception('ohno'))]
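
When exceptions are returned alongside ordinary results, the caller has to separate the two. A minimal local sketch of that pattern follows. It does not use Modal: ThreadPoolExecutor stands in for the parallel map, and call_and_capture is a hypothetical helper that mimics the effect of return_exceptions=True. Unlike the Modal output above, this sketch returns the raw exception rather than a wrapped one.

```python
from concurrent.futures import ThreadPoolExecutor


def my_func(a):
    if a == 2:
        raise Exception("ohno")
    return a ** 2


def call_and_capture(a):
    # Hypothetical helper: return the exception as a value instead of raising.
    try:
        return my_func(a)
    except Exception as exc:
        return exc


with ThreadPoolExecutor() as pool:
    results = list(pool.map(call_and_capture, range(3)))

# Split the aggregated list into successful values and captured exceptions.
successes = [r for r in results if not isinstance(r, Exception)]
failures = [r for r in results if isinstance(r, Exception)]
```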


If your function takes multiple arguments, you can either use Function.map() with one input iterator per argument, or Function.starmap() with a single input iterator containing sequences (like tuples) that can be spread over the arguments. This works similarly to Python’s built-in map and itertools.starmap.

@app.function()
def my_func(a, b):
    return a + b

@app.local_entrypoint()
def main():
    assert list(my_func.starmap([(1, 2), (3, 4)])) == [3, 7]
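
For comparison, the two calling conventions look like this with Python's own map and itertools.starmap. This runs locally and sequentially, and it only illustrates how the arguments are shaped; the function name add is chosen for illustration.

```python
from itertools import starmap


def add(a, b):
    return a + b


# One iterator per argument, as with the built-in map.
via_map = list(map(add, [1, 3], [2, 4]))

# A single iterator of argument tuples, as with itertools.starmap.
via_starmap = list(starmap(add, [(1, 2), (3, 4)]))
```

Both calls pair the arguments as (1, 2) and (3, 4), so they produce the same results.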


Note that .map() is a method on the Modal function object itself, so you don’t call the function explicitly.

Incorrect usage:

results = evaluate_model(inputs).map()

Modal’s map is also not the same as Python’s built-in map(). While the following will technically work, it will execute all inputs in sequence rather than in parallel.

Incorrect usage:

results = map(evaluate_model, inputs)

Asynchronous usage

All Modal APIs are available in both blocking and asynchronous variants. If you are comfortable with asynchronous programming, you can use it to create arbitrary parallel execution patterns, with the added benefit that any Modal functions will be executed remotely. See the async guide or the examples for more information about asynchronous usage.
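
As an illustration of an "arbitrary parallel execution pattern", the sketch below runs several two-stage pipelines concurrently with asyncio. It is a local analogy only: preprocess and score are hypothetical coroutines standing in for awaited remote calls, and nothing here uses Modal's API.

```python
import asyncio


async def preprocess(x):
    # Hypothetical stand-in for an awaited remote call.
    await asyncio.sleep(0.01)
    return x + 1


async def score(x):
    # Second hypothetical stage, depends on the output of preprocess.
    await asyncio.sleep(0.01)
    return x * 2


async def pipeline(x):
    # The two stages of one pipeline run in sequence.
    return await score(await preprocess(x))


async def run_all():
    # Independent pipelines run concurrently; gather preserves input order.
    return await asyncio.gather(*(pipeline(x) for x in range(5)))


results = asyncio.run(run_all())
```

Each pipeline waits on its own stages, while the event loop interleaves the five pipelines instead of running them one after another.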

GPU acceleration

Sometimes you can speed up your applications by using GPU acceleration. See the GPU section for more information.

Limiting concurrency

If you want to limit concurrency, you can use the concurrency_limit argument to app.function. For instance:

app = modal.App()

@app.function(concurrency_limit=5)
def f(x):
    ...

With this, Modal will spin up at most 5 containers at any point.
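
The effect of such a cap can be observed in a local analogy. The sketch below does not use Modal: a thread pool with five workers stands in for the five containers, and the stats dictionary is a hypothetical instrument added only to record the peak number of simultaneous calls.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

LIMIT = 5  # mirrors the limit of 5 used above
stats = {"active": 0, "peak": 0}
lock = threading.Lock()


def f(x):
    # Record how many calls are in flight at the same time.
    with lock:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
    time.sleep(0.01)  # placeholder for real work
    with lock:
        stats["active"] -= 1
    return x


# The pool never runs more than LIMIT calls at once; the remaining inputs
# wait in a queue until a worker becomes free.
with ThreadPoolExecutor(max_workers=LIMIT) as pool:
    results = list(pool.map(f, range(20)))
```

All 20 inputs are still processed, but the peak number of simultaneous calls never exceeds the limit.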