February 28, 2025 · 5 minute read

Cold start ComfyUI in less than 3 seconds with memory snapshotting

Kenny Ning (@kenny_ning), Growth Engineer
Tolga Oguz, Software Engineer

Update: We’ve directly folded memory snapshotting into our official ComfyUI example. We recommend starting there if you want to use memory snapshotting with ComfyUI.

In this post, we show how Modal’s memory snapshotting feature can drive ComfyUI cold starts down from 10-15 seconds to less than 3 seconds on average.

[Figure: ComfyUI cold start (seconds)]

The problem: high ComfyUI cold starts

Cold start is the time it takes for a container to start up.

In the context of ComfyUI on Modal, cold start is the time it takes for the ComfyUI server to launch (i.e., the time spent running ComfyUI's main.py).

Let’s break down the cold start for our ComfyUI example.

[Figure: comfyui-cold-start]

Note that loading a checkpoint (e.g. flux, stable diffusion) does not happen at cold start, but rather on the first workflow execution.

Custom node prestartup scripts

First, we run some prestartup scripts for ComfyUI manager. These include a security scan (to make sure none of our custom nodes are malicious), importing standard libraries, and checking that the necessary dependencies are installed.

Import torch

Importing torch is by far the heaviest part of starting up the ComfyUI server; it’s executing 26,000 syscalls after all.
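To get a feel for how much a single import costs, here is a minimal sketch that times one using only the standard library. The helper name time_import is ours, and json is only a lightweight stand-in so the snippet runs anywhere; substitute torch in an environment where it is installed. Run it in a fresh interpreter, because a module that is already cached in sys.modules imports almost instantly.

```python
import importlib
import time


def time_import(module_name: str) -> float:
    """Return the wall-clock seconds spent importing `module_name`.

    Only meaningful in a fresh interpreter: a module already present in
    sys.modules is returned from the cache almost immediately.
    """
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


if __name__ == "__main__":
    # "json" is a stand-in; replace it with "torch" where torch is installed.
    name = "json"
    print(f"import {name}: {time_import(name):.3f}s")
```

For a per-module breakdown, CPython's built-in `python -X importtime -c "import torch"` prints the time spent in every nested import.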

Initialize custom nodes

Lastly, we initialize our custom nodes. Specifically, this is the time it takes for init_external_custom_nodes() to run, which loops through each custom node directory and imports the relevant modules. For customers running workflows with many custom nodes, this step is often the biggest cold start driver.
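The loop described above can be sketched roughly as follows. This is a simplified illustration, not ComfyUI's actual init_external_custom_nodes() implementation: the function name load_custom_nodes, the directory layout it expects, and the per-node timing are all assumptions made for the example.

```python
import importlib.util
import sys
import time
from pathlib import Path


def load_custom_nodes(custom_nodes_dir: str) -> dict[str, float]:
    """Import every package or .py file in a directory and time each import."""
    timings: dict[str, float] = {}
    for entry in sorted(Path(custom_nodes_dir).iterdir()):
        if entry.name.startswith((".", "__")):
            continue  # skip hidden entries and __pycache__
        if entry.is_dir():
            module_name, source = entry.name, entry / "__init__.py"
            if not source.exists():
                continue  # a directory without __init__.py is not a package
        elif entry.suffix == ".py":
            module_name, source = entry.stem, entry
        else:
            continue

        start = time.perf_counter()
        spec = importlib.util.spec_from_file_location(module_name, source)
        module = importlib.util.module_from_spec(spec)
        sys.modules[module_name] = module  # register first so relative imports resolve
        spec.loader.exec_module(module)  # runs the node's top-level code
        timings[module_name] = time.perf_counter() - start
    return timings
```

Sorting the resulting timings by value is a quick way to spot which custom node dominates startup.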

Solution: memory snapshotting

Modal offers an advanced feature called memory snapshots that lets you snapshot your container state right after startup. Subsequent containers can then restore from this snapshot instead of starting from scratch, which improves cold start time by 1.5-3x.

One complication of using memory snapshots is that you can only snapshot CPU memory. If you try to snapshot the ComfyUI server startup step with a GPU attached, the snapshot will not work.

Luckily, community member Tolga Oguz came up with a great solution: override core ComfyUI to skip the CUDA device checks on server start. We don’t need the GPU enabled at the beginning, when we’re effectively just running a lot of imports.
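The general idea of skipping a CUDA check can be sketched as a context manager that temporarily makes an is_available() call report False. This is an illustration of the technique under our own assumptions, not the actual patch: the name force_cpu is hypothetical, and a SimpleNamespace stands in for torch.cuda so the snippet runs without torch or a GPU. With torch installed, you would pass torch.cuda instead.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def force_cpu(cuda_module):
    """Temporarily make `cuda_module.is_available()` report False."""
    original = cuda_module.is_available
    cuda_module.is_available = lambda: False
    try:
        yield
    finally:
        # Always restore the real check, even if startup code raised.
        cuda_module.is_available = original


if __name__ == "__main__":
    fake_cuda = SimpleNamespace(is_available=lambda: True)  # stand-in for torch.cuda
    with force_cpu(fake_cuda):
        print(fake_cuda.is_available())  # False: code in here sees no GPU
    print(fake_cuda.is_available())  # True: original check restored
```

Restoring the original function in a finally block matters here, because the GPU has to be visible again once the snapshot has been taken.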

Then we wrap the ComfyUI server initialization steps with @enter(snap=True) to enable snapshotting.
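As a rough sketch of what that wrapping looks like in a Modal class, assuming the modal package is installed: the app name, class name, method names, GPU type, and placeholder image below are all hypothetical, and the bodies only hint at where the ComfyUI startup steps would go.

```python
import modal

app = modal.App("comfyui-snapshot-sketch")  # hypothetical app name
# Placeholder image; a real deployment would install ComfyUI and its custom nodes.
image = modal.Image.debian_slim().pip_install("torch")


@app.cls(image=image, gpu="L40S", enable_memory_snapshot=True)
class ComfyWorker:  # hypothetical class name
    @modal.enter(snap=True)
    def load_on_cpu(self):
        # Runs before the snapshot is taken. Keep this CPU-only:
        # heavy imports and custom node initialization belong here.
        import torch  # noqa: F401

        self.ready = True

    @modal.enter(snap=False)
    def after_restore(self):
        # Runs after each restore from the snapshot, when the GPU is usable again.
        import torch

        self.device = "cuda" if torch.cuda.is_available() else "cpu"

    @modal.method()
    def device_name(self) -> str:
        return self.device
```

Splitting startup into a snap=True phase and a snap=False phase keeps everything that touches CUDA out of the snapshot while still caching the expensive imports.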

Deploy your ComfyUI app and the first few runs will create a memory snapshot. These first few cold starts will be longer, but subsequent cold starts restore from the snapshot. In our experiment, 90% of containers started in less than 3 seconds, a several-fold improvement over the baseline of 10 to sometimes 20 seconds.

Conclusion

For customers running ComfyUI at scale and spinning up containers frequently, memory snapshots are a game changer. Containers that previously took 10 seconds or more to start up now start in under 3 seconds in most cases.

The tradeoff of this approach is that the code is relatively untested and may take some effort to get working, especially for complex workflows. Reach out on Slack or open an issue on modal-comfy-worker if you have any questions.
