

Thanks to community member Tolga Oguz for getting memory snapshots to work with ComfyUI. Check out the repo: modal-comfy-worker.
In this post, we show how Modal’s memory snapshotting feature can drive ComfyUI cold starts down from 10-15 seconds to less than 3 seconds on average.
[Chart: ComfyUI cold start (seconds)]
The problem: high ComfyUI cold starts
Cold start is the time it takes for a container to start up.
In the context of ComfyUI on Modal, cold start is the time it takes for the ComfyUI server to launch (i.e. running ComfyUI's main.py).
Let’s break down the cold start for our ComfyUI example.
Note that loading a checkpoint (e.g. flux, stable diffusion) does not happen at cold start, but rather on the first workflow execution.
Custom node prestartup scripts
Feb 13 13:33:10.476
Launching ComfyUI from: /root/comfy/ComfyUI
Feb 13 13:33:11.355
[START] Security scan
[DONE] Security scan
## ComfyUI-Manager: installing dependencies done.
** ComfyUI startup time: 2025-02-13 18:33:11.350
** Platform: Linux
** Python version: 3.11.10 (main, Dec 3 2024, 02:25:00) [GCC 12.2.0]
** Python executable: /usr/local/bin/python
** ComfyUI Path: /root/comfy/ComfyUI
** User directory: /root/comfy/ComfyUI/user
** ComfyUI-Manager config path: /root/comfy/ComfyUI/user/default/ComfyUI-Manager/config.ini
** Log path: /root/comfy/ComfyUI/user/comfyui.log
Feb 13 13:33:11.975
Prestartup times for custom nodes:
Feb 13 13:33:11.980
1.4 seconds: /root/comfy/ComfyUI/custom_nodes/ComfyUI-Manager
First, we run prestartup scripts for ComfyUI-Manager, which include a security scan (to make sure none of our custom nodes are malicious), importing standard libraries, and ensuring we have the necessary dependencies.
Import torch
Feb 13 13:33:14.292
Total VRAM 45589 MB, total RAM 928032 MB
pytorch version: 2.5.1+cu124
Feb 13 13:33:14.297
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA L40S : cudaMallocAsync
Feb 13 13:33:16.253
Using pytorch attention
Feb 13 13:33:18.170
[Prompt Server] web root: /root/comfy/ComfyUI/web
Importing torch is by far the heaviest part of starting up the ComfyUI server; it executes some 26,000 syscalls, after all.
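You can see this for yourself: Python ships an `-X importtime` flag that prints a per-module import breakdown, or you can time a first import directly. Here's a small sketch of the latter (the comment about torch reflects the log timestamps above, not something this snippet measures):

```python
import importlib
import time


def time_import(module_name: str) -> float:
    """Measure the wall-clock cost of importing a module.

    Note: if the module is already in sys.modules, this measures
    a (near-instant) cache lookup, not a cold import.
    """
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


# A stdlib module imports in a fraction of a second; in the logs
# above, the torch import accounts for several seconds of cold start.
print(f"json import took {time_import('json'):.4f}s")
```

Running `python -X importtime -c "import torch"` in the container gives the full dependency-by-dependency breakdown.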
Initialize custom nodes
Feb 13 13:33:19.068
WAS Node Suite: Created default conf file at `/root/comfy/ComfyUI/custom_nodes/pr-was-node-suite-comfyui-47064894/was_suite_config.json`.
Feb 13 13:33:20.119
WAS Node Suite: OpenCV Python FFMPEG support is enabled
WAS Node Suite Warning: `ffmpeg_bin_path` is not set in `/root/comfy/ComfyUI/custom_nodes/pr-was-node-suite-comfyui-47064894/was_suite_config.json` config file. Will attempt to use system ffmpeg binaries if available.
Feb 13 13:33:20.418
WAS Node Suite: Finished. Loaded 218 nodes successfully.
"Art is the voice of the soul, expressing what words cannot." - Unknown
Feb 13 13:33:20.435
### Loading: ComfyUI-Manager (V3.7.6)
Feb 13 13:33:20.498
### ComfyUI Revision: 2980 [ee9547ba] *DETACHED | Released on '2024-12-26'
Feb 13 13:33:20.512
Import times for custom nodes:
Feb 13 13:33:20.517
0.0 seconds: /root/comfy/ComfyUI/custom_nodes/websocket_image_save.py
0.1 seconds: /root/comfy/ComfyUI/custom_nodes/ComfyUI-Manager
1.7 seconds: /root/comfy/ComfyUI/custom_nodes/pr-was-node-suite-comfyui-47064894
Feb 13 13:33:20.520
Starting server
Lastly, we initialize our custom nodes. Specifically, this is the time it takes for init_external_custom_nodes() to run, which loops through each custom node directory and imports the relevant modules. For customers running workflows with many custom nodes, this step is often the biggest cold start driver.
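As a rough sketch of what that loop does (a simplified, hypothetical version, not ComfyUI's actual implementation), each entry in the custom nodes directory is imported as a module and its import time is recorded:

```python
import importlib.util
import os
import time


def load_custom_nodes(custom_nodes_dir: str) -> dict:
    """Import each custom node (a .py file or a package directory)
    and record how long the import took. Simplified sketch of what
    ComfyUI's init_external_custom_nodes() does."""
    import_times = {}
    for entry in sorted(os.listdir(custom_nodes_dir)):
        path = os.path.join(custom_nodes_dir, entry)
        module_name = entry.removesuffix(".py")
        if entry.endswith(".py"):
            spec = importlib.util.spec_from_file_location(module_name, path)
        elif os.path.isdir(path):
            init_py = os.path.join(path, "__init__.py")
            if not os.path.isfile(init_py):
                continue  # not an importable package
            spec = importlib.util.spec_from_file_location(module_name, init_py)
        else:
            continue
        module = importlib.util.module_from_spec(spec)
        start = time.perf_counter()
        spec.loader.exec_module(module)  # this is where the time goes
        import_times[module_name] = time.perf_counter() - start
    return import_times
```

Because every node is imported sequentially at startup, each additional custom node adds its full import cost to the cold start.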
Solution: memory snapshotting
Modal offers an advanced feature called memory snapshots that allows you to snapshot your container state right after startup. Subsequent containers can then restore from this snapshot, improving cold start time by 1.5-3x.
One complication of using memory snapshots is that you can only snapshot CPU memory. If you try to snapshot the ComfyUI server start step with a GPU attached, memory snapshotting will not work.
Luckily, community member Tolga Oguz came up with a great solution to override core ComfyUI and skip the CUDA device checks on server start. We don’t need the GPU enabled in the beginning when we’re effectively just running a lot of imports:
# https://github.com/tolgaouz/modal-comfy-worker/blob/main/comfy/experimental_server.py
from contextlib import contextmanager
from typing import List


class ExperimentalComfyServer:
    """Experimental ComfyUI server that runs workflows in the main thread.

    Features:
    - Executes workflows without starting separate server process
    - Allows model preloading to CPU for faster cold starts
    - Uses modified ComfyUI components for direct execution
    - Maintains similar interface to regular ComfyServer
    """

    MSG_TYPES_TO_PROCESS = [
        "executing",
        "execution_cached",
        "execution_complete",
        "execution_start",
        "progress",
        "status",
        "completed",
    ]

    @contextmanager
    def force_cpu_during_snapshot(self):
        """Monkeypatch Torch CUDA checks during model loading/snapshotting"""
        import torch

        original_is_available = torch.cuda.is_available
        original_current_device = torch.cuda.current_device
        # Force Torch to report no CUDA devices
        torch.cuda.is_available = lambda: False
        torch.cuda.current_device = lambda: torch.device("cpu")
        try:
            yield
        finally:
            # Restore original implementations
            torch.cuda.is_available = original_is_available
            torch.cuda.current_device = original_current_device

    def __init__(self, config=None, preload_models: List[str] = []):
        """Initialize experimental server.

        Args:
            config: Compatibility with regular server (not used)
            preload_models: List of model paths to preload to CPU
        """
        with self.force_cpu_during_snapshot():
            logger.info("Initializing experimental server")
            self.preload_models = preload_models
            self.initialized = False
            self.model_cache = {}
            self.executor = None
            # Set up ComfyUI environment overrides
            self._override_comfy(preload_models)
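The core trick here is the save-patch-restore pattern: swap out the CUDA checks for the duration of a `with` block, and put them back no matter what happens inside. You can exercise that pattern without torch or a GPU; here is a minimal, self-contained version of the same idea (`FakeCuda` and `hide_gpu` are hypothetical stand-ins, not part of the repo above):

```python
from contextlib import contextmanager


class FakeCuda:
    """Hypothetical stand-in for torch.cuda, so the pattern can run
    anywhere, even without torch or a GPU installed."""

    def is_available(self) -> bool:
        return True


cuda = FakeCuda()


@contextmanager
def hide_gpu(device_api):
    """Temporarily make the device API report no GPU, then restore it."""
    original = device_api.is_available
    device_api.is_available = lambda: False  # snapshot-safe: no CUDA visible
    try:
        yield
    finally:
        device_api.is_available = original  # back to normal after restore


with hide_gpu(cuda):
    assert not cuda.is_available()  # inside the block: GPU hidden
assert cuda.is_available()  # outside the block: GPU visible again
```

The `try`/`finally` is what makes this safe: even if initialization raises, the original functions are restored before the exception propagates.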
Then we wrap the ComfyUI server initialization steps with @enter(snap=True) to enable snapshotting:
class ComfyWorkflow:
    @enter(snap=True)
    def run_this_on_container_startup(self):
        self.web_app = FastAPI()
        self.server = ExperimentalComfyServer()
Deploy your ComfyUI app, and the first few runs will create a memory snapshot. These first few cold starts will be longer, but subsequent cold starts restore from the snapshot. In our experiment, 90% of containers started in less than 3 seconds, an order-of-magnitude improvement over the baseline of 10- to sometimes 20-second cold starts.
Conclusion
For customers running ComfyUI at scale and spinning up containers frequently, memory snapshots are a game changer. Containers that previously took tens of seconds to start now do so in single-digit seconds.
The tradeoff of using this approach is that the code is relatively untested and may require some effort to get working, especially for complex workflows. Reach out on Slack or leave an issue on modal-comfy-worker if you have any questions.