modal.concurrent

def concurrent(
    *,
    max_inputs: Optional[int] = None,  # Hard limit on each container's input concurrency
    target_inputs: Optional[int] = None,  # Input concurrency that Modal's autoscaler should target
) -> Callable[
    [Union[Callable[P, ReturnType], _PartialFunction[P, ReturnType, ReturnType]]],
    _PartialFunction[P, ReturnType, ReturnType],
]:

Decorator that allows individual containers to handle multiple inputs concurrently.

The concurrency mechanism depends on whether the function is async or not:

Async functions will run inputs on a single thread as asyncio tasks.
Synchronous functions will use multi-threading. The code must be thread-safe.

Input concurrency will be most useful for workflows that are IO-bound (e.g., making network requests) or when running an inference server that supports dynamic batching.
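
To see why IO-bound work benefits, here is a minimal asyncio sketch, again without Modal. `fetch` and `handle_many` are hypothetical names, and `asyncio.sleep` stands in for a network request.

```python
import asyncio

async def fetch(item: int) -> int:
    # asyncio.sleep stands in for a network request; while one task waits,
    # the event loop is free to run the others on the same thread.
    await asyncio.sleep(0.05)
    return item + 1

async def handle_many(items):
    # Schedule every input as a task and wait for all of them together.
    return await asyncio.gather(*(fetch(i) for i in items))

results = asyncio.run(handle_many(range(10)))
```

Because the waits overlap, the ten simulated requests take roughly as long as one rather than ten times as long. CPU-bound work would not see the same gain.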

When target_inputs is set, Modal’s autoscaler will try to provision resources such that each container is running that many inputs concurrently, rather than autoscaling based on max_inputs. Containers may burst up to max_inputs if resources are insufficient to remain at the target concurrency, e.g., when the arrival rate of inputs increases. This trades a small increase in average latency for the avoidance of larger tail latencies caused by input queuing.
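
The effect of target_inputs can be pictured with some back-of-the-envelope arithmetic. This is a deliberately simplified model, not Modal's actual autoscaling algorithm: `containers_needed` is a hypothetical helper that assumes containers are counted by dividing in-flight inputs by the per-container concurrency.

```python
import math

def containers_needed(in_flight: int, per_container: int) -> int:
    # Simplified assumption: enough containers so that each one runs
    # at most `per_container` inputs at once.
    return math.ceil(in_flight / per_container)

in_flight = 400  # hypothetical number of inputs being processed
at_target = containers_needed(in_flight, 80)   # sizing for target_inputs=80
at_max = containers_needed(in_flight, 100)     # sizing for max_inputs=100
```

Under this model, sizing for the target yields one more container than sizing for the hard limit, which leaves each container headroom to absorb a burst before inputs start to queue.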

Examples:

import modal

app = modal.App("concurrency-example")  # placeholder app name

# Stack the decorator under `@app.function()` to enable input concurrency
@app.function()
@modal.concurrent(max_inputs=100)
async def f(data):
    # Async function; will be scheduled as asyncio task
    ...

# With `@app.cls()`, apply the decorator at the class level, not on individual methods
@app.cls()
@modal.concurrent(max_inputs=100, target_inputs=80)
class C:
    @modal.method()
    def f(self, data):
        # Sync function; must be thread-safe
        ...

Added in v0.73.148: This decorator replaces the allow_concurrent_inputs parameter in @app.function() and @app.cls().