Cold start performance
For a deployed function or web endpoint, Modal will spin up as many containers as needed to handle the current number of concurrent requests. Starting up a container incurs a cold-start time of ~1s. Any logic in global scope (such as imports) and the container __enter__ method is then executed. If you load large models from the image, this step can take a few seconds depending on the size of the model, because the files are copied over the network to the worker running your job.
After the cold start, subsequent requests to the same container will see lower response latency (~50-200ms), until the container is shut down after a period of inactivity. Modal currently exposes two parameters to control how many cold starts users experience: container_idle_timeout and keep_warm.
Container idle timeout
By default, Modal containers spin down after 60 seconds of inactivity. This can be overridden explicitly by setting the container_idle_timeout value on the @function decorator. It accepts any integer value between 2 and 1200, measured in seconds.
import modal

stub = modal.Stub()

@stub.function(container_idle_timeout=300)
def my_idle_f():
    return {"hello": "world"}
Warm pool
If you want to have some containers running at all times to mitigate the cold-start penalty, you can set the keep_warm value on the @function decorator. This configures a minimum number of containers that will always be up for your function, but Modal will still scale up (and spin down) additional containers if the demand for your function exceeds the keep_warm value, as usual.
from modal import Stub, web_endpoint

stub = Stub()

@stub.function(keep_warm=3)
@web_endpoint()
def my_warm_f():
    return {"hello": "world"}
Functions with slow start-up and keep_warm
The guarantee that keep_warm provides is that at least n containers that have finished starting up are always available. If your function does expensive or slow initialization the first time it receives an input (e.g. a pre-trained model that is loaded into memory on first use), you'd observe that those first function calls are still slow.
To avoid this, you can use a container enter method to perform the expensive initialization. This will ensure that the initialization is performed before the container is deemed ready for the warm pool.
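As a sketch of this pattern, assuming Modal's class-based API in which a class decorated with @stub.cls can define an __enter__ method that runs during container startup (the model-loading code inside __enter__ is a hypothetical placeholder):

```python
import modal

stub = modal.Stub()

@stub.cls(keep_warm=2)  # keep two fully initialized containers warm
class Model:
    def __enter__(self):
        # Expensive one-time setup runs here, during container startup,
        # so the container only joins the warm pool once it has finished.
        from transformers import pipeline  # hypothetical model loading
        self.pipe = pipeline("sentiment-analysis")

    @modal.method()
    def predict(self, text: str):
        # Requests served by a warm container skip the load above entirely.
        return self.pipe(text)
```

Because __enter__ completes before the container counts toward the warm pool, the first request a warm container receives pays no initialization cost.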