August 16, 2024 · 5 minute read
How Modal speeds up container launches in the cloud
Yiren Lu (@YirenLu), Solutions Engineer

At Modal, one of our goals is to make running code in the cloud as intuitive and easy as running code locally.

To do this, we’ve had to architect a system that spins up cloud-based containers (with your code in them) as fast as possible, ideally under 1 second.

In this article, we'll cover some of the techniques we used to make that happen.

Understanding containers

Let’s start with what a container is.

Think of a container as a lightweight, stand-alone, executable package of software that includes everything needed to run it, isolated from the rest of the system. This includes the code, runtime, system tools, libraries, and settings.

At the heart of a container is a Linux root filesystem that replicates a traditional Linux environment, with directories like /usr, /etc, and /lib. Containers also rely on Linux kernel features such as namespaces and cgroups for isolation and resource management.

Containers ensure resources are isolated, allowing applications to run effectively without affecting one another.

Shortening image pulls

While containers solve many problems related to software consistency and isolation, running them efficiently in the cloud presents new challenges.

One of the most significant is the time it takes to pull a container image from a remote repository.

Container images can be large, often weighing in at several hundred megabytes or even gigabytes. For example, a standard Docker image might be around 1 GB, and more complex applications that rely on frameworks like CUDA or TensorFlow can easily exceed 10 GB. Pulling such large images from a remote repository can take several minutes.

This slow download time can be a significant bottleneck. The infamous Docker progress bar, slowly filling as the image downloads, is a common sight for anyone who has worked with containers in the cloud.

Reducing image bloat

One of the primary reasons for the slow download times is that many container images are unnecessarily bloated. A typical container might contain thousands of files, many of which are not even used by the application.

Some developers attempt to address this issue by optimizing their Dockerfiles, which is a good start, but we wanted to go even deeper.

We noticed that when running a Python application, the system might make thousands of file system calls, but only access a small fraction of the files in the container image.

For example, executing the command:

$ python3 -c 'import sklearn'

results in:

  • 3,043 calls to stat
  • 1,073 calls to openat

but only actually accesses about 1,000 unique files. In other words, the vast majority of the files in the container image are never touched.
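A rough way to reproduce a measurement like this (a sketch, not Modal's actual tooling) is to run the interpreter under strace and count calls and unique paths. Note that the exact syscall names vary by platform; newer kernels use newfstatat rather than stat.

import re
import subprocess

# Trace file-related syscalls made while importing sklearn.
# Assumes strace is installed; adjust the syscall list for your platform.
result = subprocess.run(
    ["strace", "-f", "-e", "trace=openat,stat",
     "python3", "-c", "import sklearn"],
    stderr=subprocess.PIPE, text=True,
)

calls = {"openat": 0, "stat": 0}
paths = set()
for line in result.stderr.splitlines():
    match = re.search(r'(openat|stat)\(.*?"([^"]+)"', line)
    if match:
        calls[match.group(1)] += 1
        paths.add(match.group(2))

print(calls, f"-> {len(paths)} unique paths")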

By identifying and focusing on the essential files needed to run an application, it’s possible to significantly reduce the size of the container image and, consequently, the time it takes to pull and start the container.

Avoiding Docker

While Docker itself is a powerful tool for managing containers, it comes with some overhead that can slow down the process of launching containers, especially in the cloud. To streamline the process, it’s possible to bypass Docker entirely and use a lightweight container runtime like runc or gVisor.

These runtimes don't manage images at all; they simply take a root filesystem and a JSON configuration and execute a container.
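As a sketch of what driving such a runtime directly looks like: the paths and container ID below are hypothetical, and a real config also needs mounts, capabilities, and namespace settings, which runc spec can generate for you.

import json
import subprocess

# A pared-down OCI runtime config. Both runc and gVisor's runsc consume
# a "bundle": a directory holding config.json plus a root filesystem.
config = {
    "ociVersion": "1.0.2",
    "process": {
        "terminal": False,
        "user": {"uid": 0, "gid": 0},
        "args": ["python3", "/app/main.py"],
        "env": ["PATH=/usr/local/bin:/usr/bin:/bin"],
        "cwd": "/",
    },
    "root": {"path": "rootfs", "readonly": True},
}

bundle = "/mnt/bundles/my-app"  # hypothetical bundle directory
with open(f"{bundle}/config.json", "w") as f:
    json.dump(config, f)

# Launch directly: no image pull, no Docker daemon in the path.
subprocess.run(["runsc", "run", "--bundle", bundle, "my-container"])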

This opens up a way for us to start the container without having to pull images at all.

After constructing the image, Modal transfers it to network storage. Then, when it comes time to run the container, Modal starts gVisor, pointing it at that root filesystem on the network.

Rather than wasting time waiting for an image to download, the container can launch almost immediately, since the necessary files are already available on the network share.

(Note: a further reason we use gVisor rather than regular containers is security. Regular containers share the host system's kernel, so if a vulnerability is discovered in the kernel, it could potentially affect every container running on that host. Worse, a malicious container might exploit a kernel vulnerability to break out of its sandbox and access the host system or other containers. gVisor intercepts application system calls and acts as a guest kernel, limiting the surface area for potential attacks.)

Caching frequently accessed files locally

Even with optimizations like avoiding Docker and reducing image size, there's still the issue of file system latency. Heavy imports trigger a large number of file operations, and even with NFS latency of around 2 milliseconds, roughly 4,000 file accesses add up to a wait time of approximately 8 seconds.

The solution we landed on is to cache frequently accessed files locally. By storing these files on a local SSD or in memory, it’s possible to reduce access times dramatically. Local SSDs have latencies in the range of 100 microseconds. Caching files in the Linux page cache can bring latencies down even further.
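The back-of-the-envelope arithmetic makes the difference stark (the page-cache latency here is an illustrative assumption):

# Sequential wait time for ~4,000 file accesses at various latencies.
accesses = 4_000
latencies_s = {
    "NFS (~2 ms)": 2e-3,
    "local SSD (~100 us)": 100e-6,
    "page cache (~1 us, illustrative)": 1e-6,
}
for name, latency in latencies_s.items():
    print(f"{name}: {accesses * latency:.3f} s total")
# NFS: 8.000 s; local SSD: 0.400 s; page cache: 0.004 s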

Caching is especially effective when running the same container image multiple times, as many of the files accessed will be the same. Even when running different images, there’s often a significant overlap in the files they access.

Content-addressed caching

To cache files effectively, we use a technique called content-addressing: hash the contents of each file, then use that hash to determine where the file is stored.

When gVisor tries to access a file, it first checks the cache to see if the file is already available locally. If it is, the file is returned from the cache, bypassing the need to access the network or disk.

If the file isn’t in the cache, it can be fetched from the network and stored locally for future use. This approach ensures that the most frequently accessed files are always available quickly, significantly improving the performance of container workloads.
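A minimal sketch of that read path, assuming a manifest that maps each file's path to the hash of its contents (the directory layout and helper names here are hypothetical, not Modal's internals):

import hashlib
import shutil
from pathlib import Path

CACHE_DIR = Path("/var/cache/files")     # local SSD (hypothetical path)
NETWORK_DIR = Path("/mnt/images/files")  # network storage (hypothetical path)

def fetch(content_hash: str) -> Path:
    """Return a local path for a file, keyed by the hash of its contents."""
    cached = CACHE_DIR / content_hash
    if cached.exists():
        return cached  # cache hit: no network round-trip

    # Cache miss: copy from network storage, then verify the contents
    # actually hash to the expected value before trusting the file.
    src = NETWORK_DIR / content_hash
    tmp = cached.with_suffix(".tmp")
    shutil.copyfile(src, tmp)
    if hashlib.sha256(tmp.read_bytes()).hexdigest() != content_hash:
        tmp.unlink()
        raise IOError(f"hash mismatch for {content_hash}")
    tmp.rename(cached)  # atomic publish into the cache
    return cached

Because files are keyed by the hash of their contents, identical files shared across different images dedupe automatically, which is why the cache stays warm even across image boundaries.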

To do this, we had to set up a simple filesystem in FUSE (Filesystem in Userspace). Contrary to popular belief, building filesystems isn’t prohibitively complex. You can even do this in Python!
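For flavor, here's a toy read-only filesystem using the fusepy library, serving file contents through a content-addressed fetch like the one sketched above. Again, this is a sketch, not Modal's implementation:

import errno
import os

from fuse import FUSE, FuseOSError, Operations  # pip install fusepy

def fetch(content_hash):
    # Stand-in for the content-addressed fetch sketched above.
    return f"/var/cache/files/{content_hash}"

class ContentAddressedFS(Operations):
    """Toy read-only filesystem backed by a content-addressed store."""

    def __init__(self, manifest):
        self.manifest = manifest  # {"/path/in/container": "<sha256>", ...}

    def getattr(self, path, fh=None):
        if path == "/":
            return {"st_mode": 0o040755, "st_nlink": 2}
        if path not in self.manifest:
            raise FuseOSError(errno.ENOENT)
        size = os.path.getsize(fetch(self.manifest[path]))
        return {"st_mode": 0o100444, "st_nlink": 1, "st_size": size}

    def read(self, path, size, offset, fh):
        # fetch() hits the local cache first, the network only on a miss.
        with open(fetch(self.manifest[path]), "rb") as f:
            f.seek(offset)
            return f.read(size)

if __name__ == "__main__":
    manifest = {"/hello.txt": "…"}  # placeholder content hash
    FUSE(ContentAddressedFS(manifest), "/mnt/container", foreground=True)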

Conclusion

By focusing on what’s essential for running an application and avoiding unnecessary overhead, we’ve developed a system that significantly reduces the time it takes to start containers, making it easier for developers to deploy and scale their applications in the cloud. To see this for yourself, you can get started with Modal here.

This article is adapted from Erik Bernhardsson’s 2023 talk at Data Council.
