Concurrency and rate limits

Modal has two mechanisms for limiting function execution: a concurrency limit and a rate limit.

Concurrency limit

You can limit the number of concurrent executions for a function:

# At most 3 containers run this function at the same time.
@stub.function(concurrency_limit=3)
def my_concurrency_limited_function():
    pass

This caps the number of Modal containers that can run simultaneously for this function. Because each Modal container handles a single input at a time, the limit also caps how many inputs the function processes at once.
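As a local analogy only, and not a description of how Modal schedules containers, the effect of a concurrency limit of 3 can be sketched with a bounded semaphore in plain Python. The names `CONCURRENCY_LIMIT`, `handle_input`, and `state` are hypothetical and exist only for this illustration; each thread stands in for one container handling one input.

```python
import threading
import time

CONCURRENCY_LIMIT = 3  # hypothetical stand-in for concurrency_limit=3

# Each acquired slot stands in for one running container.
slots = threading.BoundedSemaphore(CONCURRENCY_LIMIT)
lock = threading.Lock()
state = {"running": 0, "peak": 0}


def handle_input(i: int) -> None:
    """Simulate one input being handled by one container."""
    with slots:  # blocks while CONCURRENCY_LIMIT inputs are in flight
        with lock:
            state["running"] += 1
            state["peak"] = max(state["peak"], state["running"])
        time.sleep(0.05)  # pretend to do some work
        with lock:
            state["running"] -= 1


threads = [threading.Thread(target=handle_input, args=(i,)) for i in range(9)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The number of inputs in flight never exceeded the limit.
print("peak concurrent executions:", state["peak"])
```

All nine inputs are eventually handled, but no more than three are in flight at any moment; the rest wait for a free slot.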

Rate limit

You can also set a rate limit on a Modal function, which limits how many times the function will execute in a given time period. The rate limit counter is reset at the beginning of each time period, so the limit applies to fixed windows rather than to a sliding interval. For example, if you set a rate limit of 10 requests per minute, you can successfully call the function 10 times at 18:00:59 and then 10 more times at 18:01:00.
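The reset behavior described above matches what is commonly called a fixed-window counter. The following is a minimal sketch of that idea in plain Python, reproducing the 18:00:59 / 18:01:00 example; it is an illustration under that assumption, not Modal's implementation, and `FixedWindowRateLimiter`, `allow`, and `seconds` are hypothetical names.

```python
class FixedWindowRateLimiter:
    """Toy fixed-window counter (illustrative; not Modal's implementation)."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.current_window = None  # index of the window seen most recently
        self.count = 0              # calls accepted in that window

    def allow(self, timestamp: float) -> bool:
        """Return True if a call at `timestamp` (in seconds) is within the limit."""
        window = int(timestamp // self.window_seconds)
        if window != self.current_window:
            # A new time period has started: reset the counter.
            self.current_window = window
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


def seconds(hh: int, mm: int, ss: int) -> int:
    """Convert a clock time to seconds since midnight."""
    return hh * 3600 + mm * 60 + ss


limiter = FixedWindowRateLimiter(limit=10, window_seconds=60)

# Eleven calls at 18:00:59: the first ten pass, the eleventh is rejected.
before = [limiter.allow(seconds(18, 0, 59)) for _ in range(11)]

# One second later a new minute begins, so ten more calls pass.
after = [limiter.allow(seconds(18, 1, 0)) for _ in range(10)]

print(before.count(True), after.count(True))
```

A consequence of fixed windows is that up to twice the limit can be accepted within a short span that straddles a window boundary, as the two bursts one second apart show.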

Currently, per_second and per_minute are the two interval lengths supported:

@stub.function(rate_limit=modal.RateLimit(per_second=2))
def per_second_limit():
    pass

@stub.function(rate_limit=modal.RateLimit(per_minute=5))
def per_minute_limit():
    pass