Scaling out

Modal has a few different tools that help increase the performance of your applications.

Parallel execution of inputs

If your code runs the same function repeatedly with different independent inputs (e.g., a grid search), the easiest way to increase performance is to run those function calls in parallel using Modal’s Function.map() method.

Here is an example, assuming a function evaluate_model that takes a single argument:

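import modal

stub = modal.Stub()

@stub.function
def evaluate_model(x):
    ...  # evaluate the model on a single input

if __name__ == "__main__":
    with stub.run():
        inputs = list(range(100))
        for result in evaluate_model.map(inputs):  # runs many inputs in parallel
            print(result)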

In this example, evaluate_model is called with each of the 100 inputs (the numbers 0 through 99) roughly in parallel. The results are returned as an iterable, in the same order as the inputs.
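To make the ordering guarantee concrete, here is a local analogy using Python’s standard-library concurrent.futures rather than Modal: ThreadPoolExecutor.map also runs calls concurrently and yields results in input order, even when later calls finish first. The evaluate_model body and run_grid helper are hypothetical stand-ins, and local threads only approximate Modal’s remote containers.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def evaluate_model(x):
    # Hypothetical stand-in for real model evaluation: sleep a random
    # amount so that calls finish in an unpredictable order.
    time.sleep(random.uniform(0, 0.05))
    return x * x


def run_grid(inputs):
    # executor.map starts the calls concurrently but yields results in the
    # same order as `inputs`, regardless of which call finishes first.
    with ThreadPoolExecutor(max_workers=16) as executor:
        return list(executor.map(evaluate_model, inputs))


if __name__ == "__main__":
    results = run_grid(range(100))
    print(results[:5])  # [0, 1, 4, 9, 16]
```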

Out-of-order results and flat map

Besides Modal functions, you can also use .map() on a Modal generator (created with stub.generator instead of stub.function). One generator is created per input, and each output from the generators is returned as soon as it is produced. This means the outputs do not necessarily come in the same order as the inputs. Since a generator can yield zero or more results, the number of outputs need not match the number of inputs either, like a “flat map”.
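The flat-map behavior can be sketched locally with threads and a queue; this is an analogy, not Modal’s implementation. One generator runs per input, every output is yielded as soon as any generator produces it, and an empty input contributes no outputs. The names split_words and flat_map_unordered are hypothetical.

```python
import queue
import threading
import time


def split_words(line):
    # Hypothetical generator: yields zero or more outputs per input.
    for word in line.split():
        time.sleep(0.01)  # simulate work between outputs
        yield word


def flat_map_unordered(gen_fn, inputs):
    # Run one generator per input in its own thread and yield each output
    # as soon as it arrives, in whatever order the threads produce them.
    results = queue.Queue()
    done = object()  # sentinel: one generator has finished

    def worker(item):
        try:
            for output in gen_fn(item):
                results.put(output)
        finally:
            results.put(done)  # always signal completion, even on error

    threads = [threading.Thread(target=worker, args=(item,)) for item in inputs]
    for t in threads:
        t.start()

    remaining = len(threads)
    while remaining:
        output = results.get()
        if output is done:
            remaining -= 1
        else:
            yield output

    for t in threads:
        t.join()


if __name__ == "__main__":
    lines = ["a b c", "", "d e"]  # 3 inputs produce 5 outputs
    print(list(flat_map_unordered(split_words, lines)))
```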


If your function takes multiple variable arguments, you can either use Function.map() with one input iterator per argument, or Function.starmap() with a single input iterator containing sequences (such as tuples) that are unpacked into the arguments. This works like Python’s built-in map and itertools.starmap.
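Since the two calling conventions mirror Python’s built-in map and itertools.starmap, they can be compared with those built-ins directly. This sketch runs locally and sequentially; the power function is a hypothetical example, and with Modal you would call .map() or .starmap() on the function object instead.

```python
from itertools import starmap


def power(base, exponent):
    return base ** exponent


bases = [2, 3, 4]
exponents = [3, 2, 1]

# map: one iterable per argument, consumed in lockstep.
by_map = list(map(power, bases, exponents))

# starmap: a single iterable of argument tuples, each unpacked into the call.
by_starmap = list(starmap(power, [(2, 3), (3, 2), (4, 1)]))

print(by_map)      # [8, 9, 4]
print(by_starmap)  # [8, 9, 4]
```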


Note that .map() is a method on the Modal function object itself, so you pass the inputs to .map() instead of calling the function yourself.

Incorrect usage:

results = evaluate_model(inputs).map()

Modal’s map is also not the same as Python’s built-in map(). While the following technically works, it executes all inputs in sequence rather than in parallel.

Incorrect usage:

results = map(evaluate_model, inputs)
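The cost of sequential execution can be seen locally with a small timing sketch. Built-in map runs five 0.1-second calls one after another, while a thread pool, standing in here for Modal’s parallel map, runs them at the same time. The slow_square function is a hypothetical placeholder for real work.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(x):
    time.sleep(0.1)  # simulate a slow, independent call
    return x * x


inputs = list(range(5))

# Built-in map: each call starts only after the previous one returns.
start = time.perf_counter()
sequential = list(map(slow_square, inputs))
sequential_time = time.perf_counter() - start

# Thread pool: all calls run concurrently, results stay in input order.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(inputs)) as executor:
    parallel = list(executor.map(slow_square, inputs))
parallel_time = time.perf_counter() - start

print(f"sequential: {sequential_time:.2f}s, parallel: {parallel_time:.2f}s")
```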

Asynchronous usage

All Modal APIs are available in both blocking and asynchronous variants. If you are comfortable with asynchronous programming, you can use it to create arbitrary parallel execution patterns, with the added benefit that any Modal functions will be executed remotely. See the async guide or the examples for more information about asynchronous usage.
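As a rough illustration of such patterns, the sketch below uses plain asyncio locally: asyncio.gather fans out many calls at once, and the results of one stage feed the next. The evaluate coroutine is a hypothetical stand-in; with Modal it would be an awaited call to the asynchronous variant of a Modal function.

```python
import asyncio


async def evaluate(x):
    # Hypothetical async task standing in for a remote call.
    await asyncio.sleep(0.01)
    return x + 1


async def main():
    # Launch all calls concurrently; gather returns results in input order.
    first_batch = await asyncio.gather(*(evaluate(x) for x in range(3)))
    # Feed the first stage's results into a second concurrent stage.
    second_batch = await asyncio.gather(*(evaluate(y) for y in first_batch))
    return first_batch, second_batch


first, second = asyncio.run(main())
print(first, second)  # [1, 2, 3] [2, 3, 4]
```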

GPU acceleration

Some applications can be sped up with GPU acceleration. See the GPU section for more information.

Limiting concurrency

If you want to limit concurrency, you can use the concurrency_limit argument to stub.function. For instance:

import modal

stub = modal.Stub()

@stub.function(concurrency_limit=5)
def f(x):
    ...  # work on a single input

With this, Modal will run at most 5 calls of f concurrently at any point.
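The effect of such a cap can be sketched locally with asyncio.Semaphore; this is an analogy, since Modal enforces concurrency_limit on its side. However many inputs are queued, at most five calls are active at once. MAX_CONCURRENCY, run_limited, and the active/peak counters are hypothetical names used only for illustration.

```python
import asyncio

MAX_CONCURRENCY = 5  # plays the role of concurrency_limit=5


async def run_limited(inputs):
    semaphore = asyncio.Semaphore(MAX_CONCURRENCY)
    active = 0
    peak = 0

    async def f(x):
        nonlocal active, peak
        async with semaphore:  # waits while 5 calls are already running
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # simulated work
            active -= 1
            return x

    results = await asyncio.gather(*(f(x) for x in inputs))
    return results, peak


results, peak = asyncio.run(run_limited(range(20)))
print(peak)  # 5
```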