Cold start performance

For a deployed function or a web endpoint, Modal spins up as many containers as needed to handle the current number of concurrent requests. Starting a container incurs a cold-start time of ~1s. Next, any logic in global scope (such as imports) and the container enter function are executed. If you load a large model from the image, this step can take a few seconds, depending on the size of the model, because the file is copied over the network to the worker running your job.

After the cold start, subsequent requests to the same container see lower response latency (~50-200ms) until the container is shut down after a period of inactivity. Modal currently exposes two parameters to control how many cold starts users experience: container_idle_timeout and keep_warm.

Container idle timeout

By default, Modal containers spin down after 60 seconds of inactivity. You can override this by setting the container_idle_timeout value on the @stub.function decorator. The value is measured in seconds and can be any integer between 2 and 1200.

import modal

stub = modal.Stub()

# Keep idle containers alive for 300 seconds instead of the default 60.
@stub.function(container_idle_timeout=300)
def my_idle_f():
    return {"hello": "world"}
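To see why a longer idle timeout reduces cold starts, here is a minimal sketch in plain Python. It does not use Modal's API or reflect its scheduler: it assumes a single container, instantaneous requests, and a container that shuts down once the idle timeout elapses with no request. The function name `count_cold_starts` and the arrival times are hypothetical.

```python
def count_cold_starts(request_times, idle_timeout=60):
    """Count cold starts for one container under a simplified model.

    Assumptions (not Modal's actual scheduler): requests are instantaneous,
    and the container shuts down once `idle_timeout` seconds pass without
    a request.
    """
    cold_starts = 0
    last_request = None
    for t in sorted(request_times):
        # A cold start happens on the first request, or when the gap since
        # the previous request has reached the idle timeout.
        if last_request is None or t - last_request >= idle_timeout:
            cold_starts += 1
        last_request = t
    return cold_starts


# Hypothetical request arrival times, in seconds.
arrivals = [0, 30, 120, 150, 400]
print(count_cold_starts(arrivals, idle_timeout=60))   # 3
print(count_cold_starts(arrivals, idle_timeout=300))  # 1
```

With the default 60-second timeout, the gaps of 90 and 250 seconds each trigger a new cold start. With a 300-second timeout, the same traffic is served by one container after the initial start.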

Warm pool

If you want some containers running at all times to mitigate the cold-start penalty, you can set the keep_warm value on the @stub.function decorator. This sets the minimum number of containers that are always up for your function. Modal still scales up (and spins down) additional containers as usual when demand for your function exceeds the keep_warm value.

from modal import Stub, web_endpoint

stub = Stub()

# Keep at least 3 containers running at all times.
@stub.function(keep_warm=3)
@web_endpoint()
def my_warm_f():
    return {"hello": "world"}
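The scaling behaviour described above can be pictured as a floor on the container count. The sketch below is plain Python, not Modal's API. It assumes each container serves one request at a time, and the helper name `target_containers` is hypothetical.

```python
def target_containers(concurrent_requests, keep_warm=0):
    """Return how many containers should be up under a simplified model.

    Assumption: one request per container. `keep_warm` acts as a minimum;
    demand above it still adds containers.
    """
    return max(keep_warm, concurrent_requests)


print(target_containers(0, keep_warm=3))   # 3: the warm pool stays up with no traffic
print(target_containers(10, keep_warm=3))  # 10: demand exceeds the warm pool
```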

Functions with slow start-up and keep_warm

keep_warm guarantees that there are always at least n containers up that have finished starting up, where n is the keep_warm value. However, if your function performs expensive or slow initialization the first time it receives an input (e.g. it loads a pre-trained model into memory on first use), the first call to each container will still be slow.

To avoid this, you can use a container enter method to perform the expensive initialization. This will ensure that the initialization is performed before the container is deemed ready for the warm pool.
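The difference between the two approaches can be illustrated with a minimal plain-Python sketch. It does not call Modal. Python's `__enter__` hook stands in for the container enter method, and `load_model`, `LazyModel`, and `EagerModel` are hypothetical names introduced only for illustration.

```python
def load_model():
    """Hypothetical stand-in for an expensive model load."""
    return {"weights": [1, 2, 3]}


class LazyModel:
    """Loads the model on the first input, so the first caller pays the cost."""

    def __init__(self):
        self.model = None

    def predict(self, x):
        if self.model is None:
            # Slow path: runs inside the first request.
            self.model = load_model()
        return x * len(self.model["weights"])


class EagerModel:
    """Loads the model in an enter hook, before any input is handled."""

    def __enter__(self):
        # Runs once at startup, before the first request arrives.
        self.model = load_model()
        return self

    def __exit__(self, exc_type, exc, tb):
        return False

    def predict(self, x):
        return x * len(self.model["weights"])


lazy = LazyModel()
print(lazy.model is None)  # True: nothing is loaded until the first call

with EagerModel() as eager:
    print(eager.model is not None)  # True: loaded before any call
    print(eager.predict(2))         # 6
```

In the eager variant, the load has already finished by the time the first input arrives, which is the property a warm container needs for its first call to be fast.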

Memory checkpointing

Checkpointing is a developer preview feature that can significantly reduce cold start times. See the Checkpointing page for details.