modal.concurrent
    def concurrent(
        _warn_parentheses_missing=None,
        *,
        max_inputs: int,  # Hard limit on each container's input concurrency
        target_inputs: Optional[int] = None,  # Input concurrency that Modal's autoscaler should target
    ) -> Callable[[Union[Callable[..., Any], _PartialFunction]], _PartialFunction]:
Decorator that allows individual containers to handle multiple inputs concurrently.
The concurrency mechanism depends on whether the function is async or not:
- Async functions will run inputs on a single thread as asyncio tasks.
- Synchronous functions will use multi-threading. The code must be thread-safe.
Input concurrency will be most useful for workflows that are IO-bound (e.g., making network requests) or when running an inference server that supports dynamic batching.
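The thread-safety requirement for synchronous functions can be sketched independently of Modal. In this hypothetical example, a plain `ThreadPoolExecutor` stands in for concurrent inputs, and shared mutable state is guarded with a lock:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class Handler:
    # Shared mutable state must be guarded when inputs run on multiple threads.
    def __init__(self):
        self._lock = threading.Lock()
        self.processed = 0

    def f(self, data):
        # Per-input work would go here; the shared counter is
        # updated under the lock to avoid lost increments.
        with self._lock:
            self.processed += 1
        return data * 2

h = Handler()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(h.f, range(100)))
```

Without the lock, concurrent `+=` updates could interleave and drop increments; async functions avoid this class of problem because inputs share a single thread.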
When `target_inputs` is set, Modal's autoscaler will try to provision resources
such that each container is running that many inputs concurrently, rather than
autoscaling based on `max_inputs`. Containers may burst up to `max_inputs`
if resources are insufficient to remain at the target concurrency, e.g. when the
arrival rate of inputs increases. This can trade off a small increase in average
latency to avoid larger tail latencies from input queuing.
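As a rough, hypothetical illustration of the headroom between the two settings (the numbers are invented, and Modal's actual autoscaling heuristics are more involved):

```python
import math

# Hypothetical settings and load, for illustration only.
target_inputs, max_inputs = 80, 100
inflight = 800  # inputs currently in flight

# In steady state the autoscaler aims for the target concurrency per container.
containers_at_target = math.ceil(inflight / target_inputs)  # 10 containers

# During a burst, containers can absorb the same load at max concurrency,
# so fewer containers suffice while new ones spin up.
containers_at_burst = math.ceil(inflight / max_inputs)  # 8 containers
```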
Examples:
    # Stack the decorator under `@app.function()` to enable input concurrency
    @app.function()
    @modal.concurrent(max_inputs=100)
    async def f(data):
        # Async function; will be scheduled as asyncio task
        ...
    # With `@app.cls()`, apply the decorator at the class level, not on individual methods
    @app.cls()
    @modal.concurrent(max_inputs=100, target_inputs=80)
    class C:
        @modal.method()
        def f(self, data):
            # Sync function; must be thread-safe
            ...