modal.concurrent
Decorator that allows individual containers to handle multiple inputs concurrently.
The concurrency mechanism depends on whether the function is async or not:
- Async functions will run inputs on a single thread as asyncio tasks.
- Synchronous functions will use multi-threading. The code must be thread-safe.
Input concurrency will be most useful for workflows that are IO-bound (e.g., making network requests) or when running an inference server that supports dynamic batching.
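The two mechanisms described above can be illustrated without Modal at all. The following is a minimal standard-library sketch, not Modal's implementation: the handler names and the 0.05 s sleeps are hypothetical stand-ins for IO-bound work. It shows async inputs sharing one thread as asyncio tasks, while synchronous inputs are spread across worker threads.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor


async def handle_async(x):
    # Awaiting yields control, so other tasks can progress on the same thread.
    await asyncio.sleep(0.05)
    return x * 2, threading.get_ident()


def handle_sync(x):
    # A blocking call; concurrency only comes from running in several threads,
    # so any shared state touched here would need to be thread-safe.
    time.sleep(0.05)
    return x * 2, threading.get_ident()


async def run_async_inputs(inputs):
    # All inputs run as tasks on the single event-loop thread.
    return await asyncio.gather(*(handle_async(x) for x in inputs))


def run_sync_inputs(inputs, max_workers=4):
    # Inputs are dispatched to a pool of worker threads.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(handle_sync, inputs))


async_results = asyncio.run(run_async_inputs(range(4)))
sync_results = run_sync_inputs(range(4))

print("async thread ids:", {tid for _, tid in async_results})
print("sync thread ids: ", {tid for _, tid in sync_results})
```

In the async case every result reports the same thread id, which is why async handlers must avoid blocking calls: one blocking call stalls every input in the container.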
When target_inputs is set, Modal’s autoscaler will try to provision resources
such that each container is running that many inputs concurrently, rather than
autoscaling based on max_inputs. Containers may burst up to max_inputs if
resources are insufficient to remain at the target concurrency, e.g. when the
arrival rate of inputs increases. This can trade off a small increase in average
latency to avoid larger tail latencies from input queuing.
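The arithmetic behind this trade-off can be sketched with a toy model. This is an assumption-laden illustration, not Modal's actual autoscaling algorithm: the function toy_plan and its available_containers parameter are hypothetical, and the real autoscaler's behavior is not specified here.

```python
import math


def toy_plan(inflight, target_inputs, max_inputs, available_containers):
    """Toy model only (hypothetical); NOT Modal's real autoscaler."""
    if inflight == 0 or available_containers == 0:
        return {"containers": 0, "per_container": 0, "queued": inflight}
    # Ideal case: enough containers for each to sit at the target concurrency.
    desired = math.ceil(inflight / target_inputs)
    containers = min(desired, available_containers)
    # With too few containers, each one bursts above target, capped at max_inputs.
    per_container = min(max_inputs, math.ceil(inflight / containers))
    # Anything beyond the burst capacity has to wait in the queue.
    queued = max(0, inflight - containers * max_inputs)
    return {"containers": containers, "per_container": per_container, "queued": queued}


enough = toy_plan(100, target_inputs=10, max_inputs=20, available_containers=10)
burst = toy_plan(100, target_inputs=10, max_inputs=20, available_containers=6)
saturated = toy_plan(100, target_inputs=10, max_inputs=20, available_containers=4)
```

Under these assumptions, 100 in-flight inputs fit at the target of 10 per container when 10 containers are available. With only 6 containers, each bursts to 17 inputs and nothing queues. With only 4, each is pinned at max_inputs and 20 inputs wait.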
Examples:
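A minimal usage sketch follows, assuming the modal package is installed. The app name, function names, bodies, and the specific limits are hypothetical placeholders; only the decorator and its max_inputs and target_inputs parameters come from the description above.

```python
import asyncio
import time

import modal

app = modal.App("concurrent-example")  # hypothetical app name


@app.function()
@modal.concurrent(max_inputs=100, target_inputs=80)
async def async_function(x: int) -> int:
    # Async: inputs run as asyncio tasks on a single thread.
    await asyncio.sleep(1)  # stand-in for IO-bound work
    return x * 2


@app.function()
@modal.concurrent(max_inputs=8)
def sync_function(x: int) -> int:
    # Sync: inputs run in separate threads, so this body must be thread-safe.
    time.sleep(1)  # stand-in for IO-bound work
    return x * 2
```

In the first function, the autoscaler aims for 80 concurrent inputs per container and lets containers burst to 100. In the second, only the hard limit of 8 is set.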
Added in v0.73.148: This decorator replaces the allow_concurrent_inputs parameter
in @app.function() and @app.cls().