Volumes

The modal.Volume is a mutable volume built for high-performance file serving. Like the modal.NetworkFileSystem, these volumes can be simultaneously attached to multiple Modal functions, supporting concurrent reading and writing. But unlike the modal.NetworkFileSystem, the modal.Volume has been designed for fast reads and does not automatically synchronize writes between mounted volumes.

The modal.Volume works best with write-once, read-many I/O workloads.

Volumes work best when they contain fewer than 50,000 files and directories. The latency to attach or modify a volume scales linearly with the number of files in the volume, and past a few tens of thousands of files the linear component starts to dominate the fixed overhead.
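
To check a directory tree against this guideline before copying it into a Volume, you can count its entries first. The following is a minimal sketch using only the standard library; the helper names `count_entries` and `within_recommended_size` are hypothetical and not part of the Modal API.

```python
import os
import tempfile

RECOMMENDED_MAX_ENTRIES = 50_000  # guideline quoted in the text above


def count_entries(root: str) -> int:
    """Count the files and directories below `root` (root itself is not counted)."""
    total = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
    return total


def within_recommended_size(root: str, limit: int = RECOMMENDED_MAX_ENTRIES) -> bool:
    """Return True when the tree holds fewer than `limit` entries."""
    return count_entries(root) < limit


# Demo on a throwaway directory: one subdirectory plus three files.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    for name in ("a.txt", "b.txt", os.path.join("sub", "c.txt")):
        with open(os.path.join(tmp, name), "w") as f:
            f.write("x")
    print(count_entries(tmp))  # prints 4
```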

Creating a volume

The easiest way to create a Volume and use it as part of your app is the Volume.persisted constructor. It creates a Volume with the given name unless one already exists. You can then attach it to a function and read from and write to it:

from modal import Stub, Volume

stub = Stub()

vol = Volume.persisted("my-volume")

@stub.function(volumes={"/data": vol})
def run():
    with open("/data/xyz.txt", "w") as f:
        f.write("hello")
    vol.commit()  # Needed to make sure all changes are persisted

This data will be persisted and accessible to any other function that mounts the volume.

Volume commits and reloads

Unlike a networked filesystem, you need to explicitly reload the Volume to see changes made since it was first mounted. This reload is handled by invoking the .reload() method on a Volume object. Similarly, any volume changes made within a container need to be committed for those changes to become visible outside the current container. This is handled by invoking the .commit() method on a Volume object, or by enabling background commits.

At container creation time the latest state of an attached Volume is mounted. If the Volume is then subsequently modified by a commit operation in another running container, that Volume modification won’t become available until the original container does a .reload().

Consider this example which demonstrates the effect of a reload:

import pathlib
import modal

stub = modal.Stub()

volume = modal.Volume.persisted("my-volume")

p = pathlib.Path("/root/foo/bar.txt")


@stub.function(volumes={"/root/foo": volume})
def f():
    p.write_text("hello")
    print(f"Created {p=}")
    volume.commit()  # Persist changes
    print(f"Committed {p=}")


@stub.function(volumes={"/root/foo": volume})
def g(reload: bool = False):
    if reload:
        volume.reload()  # Fetch latest changes
    if p.exists():
        print(f"{p=} contains '{p.read_text()}'")
    else:
        print(f"{p=} does not exist!")


@stub.local_entrypoint()
def main():
    g.remote()  # 1. container for `g` starts
    f.remote()  # 2. container for `f` starts, commits file
    g.remote(reload=False)  # 3. reuses container for `g`, no reload
    g.remote(reload=True)   # 4. reuses container, but reloads to see file.

The output for this example is this:

p=PosixPath('/root/foo/bar.txt') does not exist!
Created p=PosixPath('/root/foo/bar.txt')
Committed p=PosixPath('/root/foo/bar.txt')
p=PosixPath('/root/foo/bar.txt') does not exist!
p=PosixPath('/root/foo/bar.txt') contains 'hello'

This code runs two containers, one for f and one for g. Only the last function invocation reads the file created and committed by f because it was configured to reload.

Background commits

Volumes support background commits, a feature that is currently in beta. It periodically commits the state of your Volume so that your application code does not need to invoke .commit().

This functionality is enabled using the _allow_background_volume_commits flag on @stub.function.

@stub.function(volumes={"/vol/models": volume}, _allow_background_volume_commits=True)
def train():
    p = pathlib.Path("/vol/models/dummy.txt")
    p.write_text("I will be persisted without volume.commit()!")
    ...

During the execution of the train function shown above, every few seconds the attached Volume will be snapshotted and its new changes committed. A final snapshot and commit is also automatically performed on container shutdown.

Being able to persist changes to Volumes without changing your application code is especially useful when training or fine-tuning models.

Model serving

A single ML model can be served by simply baking it into a modal.Image at build time using run_function. But if you have dozens of models to serve, or otherwise need to decouple image builds from model storage and serving, use a modal.Volume.

Volumes can be used to save a large number of ML models and later serve any one of them at runtime with much better performance than can be achieved with a modal.NetworkFileSystem.

The snippet below shows the basic structure of the solution.

import modal

stub = modal.Stub()
volume = modal.Volume.persisted("model-store")
model_store_path = "/vol/models"


@stub.function(volumes={model_store_path: volume}, gpu="any")
def run_training():
    model = train(...)
    save(model_store_path, model)
    volume.commit()  # Persist changes


@stub.function(volumes={model_store_path: volume})
def inference(model_id: str, request):
    try:
        model = load_model(model_store_path, model_id)
    except NotFound:
        volume.reload()  # Fetch latest changes
        model = load_model(model_store_path, model_id)
    return model.run(request)
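
The reload-on-miss pattern used in `inference` can be made concrete with plain files. This is a minimal sketch, assuming a missing model surfaces as `FileNotFoundError`; `read_with_reload` and `fake_reload` are hypothetical names, and the `reload` argument stands in for something like `volume.reload`.

```python
import tempfile
from pathlib import Path
from typing import Callable


def read_with_reload(path: Path, reload: Callable[[], None]) -> bytes:
    """Read `path`; on a miss, call `reload` once and try again."""
    try:
        return path.read_bytes()
    except FileNotFoundError:
        reload()  # hypothetical hook; in a Modal function this could be volume.reload
        return path.read_bytes()


# Demo: a fake reload that makes the file appear, as if another
# container had committed it in the meantime.
with tempfile.TemporaryDirectory() as tmp:
    model_file = Path(tmp) / "model.bin"
    calls = []

    def fake_reload() -> None:
        calls.append("reload")
        model_file.write_bytes(b"weights")

    first = read_with_reload(model_file, fake_reload)   # miss, reload, retry
    second = read_with_reload(model_file, fake_reload)  # hit, no reload needed
```

Reloading only on a miss keeps the common path, where the model is already visible, free of reload calls.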

Model checkpointing

Checkpoints are snapshots of an ML model and can be configured by the callback functions of ML frameworks. You can use saved checkpoints to restart a training job from the last saved checkpoint. This is particularly helpful in managing preemption.

Hugging Face transformers

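To periodically checkpoint into a modal.Volume, set the output_dir of the training arguments to a path inside the Volume's mount location, and make sure the checkpoints are committed, either by calling .commit() or by enabling background commits: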

import pathlib

VOL_MOUNT_PATH = pathlib.Path("/vol")


@stub.function(
    gpu="A10g",
    timeout=7_200,
    volumes={VOL_MOUNT_PATH: volume},
)
def finetune():
    from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments
    ...

    training_args = Seq2SeqTrainingArguments(
        output_dir=str(VOL_MOUNT_PATH / "model"),
        ...
    )

    trainer = Seq2SeqTrainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_xsum_train,
        eval_dataset=tokenized_xsum_test,
    )
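
To restart from the last saved checkpoint after a preemption, the job first has to find it. The following is a minimal sketch, assuming checkpoints are written as `checkpoint-<step>` directories under `output_dir` (the default layout of the Hugging Face Trainer); `latest_checkpoint` is a hypothetical helper, not part of Modal or transformers.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir: Path) -> Optional[Path]:
    """Return the checkpoint directory with the highest step number, if any."""
    if not output_dir.is_dir():
        return None
    best_step, best_path = -1, None
    for child in output_dir.iterdir():
        match = _CHECKPOINT_RE.match(child.name)
        # Compare steps numerically: as strings, "500" would sort after "1000".
        if match and child.is_dir() and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), child
    return best_path


# Demo on a throwaway directory standing in for the Volume mount.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    for name in ("checkpoint-500", "checkpoint-1000", "runs"):
        (out / name).mkdir()
    found = latest_checkpoint(out)
    print(found.name)  # prints checkpoint-1000
```

The result, converted with `str()` when it is not `None`, can then be passed to `trainer.train(resume_from_checkpoint=...)`.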

Filesystem consistency

Concurrent modification

Concurrent modification from multiple containers is supported, but concurrent modifications of the same files should be avoided. Last write wins in case of concurrent modification of the same file: any data the last writer didn’t have when committing changes will be lost!

The number of commits you can run concurrently is limited. If you run too many concurrent commits, each commit will take longer due to contention. If you are committing small changes, avoid doing more than 5 concurrent commits (the number of concurrent commits you can make is proportional to the size of the changes being committed).

As a result, volumes are typically not a good fit for use cases where you need to make concurrent modifications to the same file (nor is distributed file locking supported).
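
The last-write-wins behavior can be illustrated with a toy in-memory model. This is a deliberately simplified sketch and not how Modal implements Volumes: `ToyVolume` and `ToyMount` are hypothetical classes in which each mount snapshots the committed files when it is created and publishes whole files on commit.

```python
class ToyVolume:
    """Committed state shared by all mounts: path -> file contents."""

    def __init__(self):
        self.committed = {}


class ToyMount:
    """One container's private view of a ToyVolume."""

    def __init__(self, volume):
        self.volume = volume
        self.files = dict(volume.committed)  # snapshot taken at mount time
        self.dirty = set()

    def append(self, path, text):
        self.files[path] = self.files.get(path, "") + text
        self.dirty.add(path)

    def commit(self):
        # Whole files are published; a later commit of the same path
        # replaces whatever an earlier commit stored there.
        for path in self.dirty:
            self.volume.committed[path] = self.files[path]
        self.dirty.clear()


volume = ToyVolume()
volume.committed["log.txt"] = "base\n"

a = ToyMount(volume)
b = ToyMount(volume)  # mounted before `a` commits, so it never sees a's line
a.append("log.txt", "from a\n")
b.append("log.txt", "from b\n")
a.commit()
b.commit()  # last writer wins

print(repr(volume.committed["log.txt"]))  # prints 'base\nfrom b\n'
```

The line written by `a` is gone because `b` committed a version of the file that never contained it.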

While a commit or reload is in progress, the volume will appear empty to the container that initiated it. That means you cannot read from or write to a volume in a container where a commit or reload is ongoing. This only applies to the container where the commit or reload was issued; other containers remain unaffected.

For example, this is not going to work:

import asyncio

import modal

volume = modal.Volume.persisted("my-volume")


@stub.function(image=modal.Image.debian_slim().pip_install("aiofiles"), volumes={"/vol": volume})
async def concurrent_write_and_commit():
    import aiofiles  # Imported here: it is installed in the image and may not be available locally

    async with aiofiles.open("/vol/big.file", "w") as f:
        await f.write("hello" * 1024 * 1024 * 500)

    async def write_other_file():
        await asyncio.sleep(0.1)  # Wait for the commit to start
        # This is going to fail with:
        # PermissionError: [Errno 1] Operation not permitted: '/vol/other.file'
        # since the commit is in progress when we attempt the write.
        async with aiofiles.open("/vol/other.file", "w") as other:
            await other.write("hello")

    await asyncio.gather(volume.commit.aio(), write_other_file())
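
One way to keep writes from overlapping with a commit inside a single container is to route both through a shared lock. The following is a minimal sketch with stand-ins: `guarded`, `fake_commit`, and `do_write` are hypothetical, with `fake_commit` playing the role of something like `volume.commit.aio` and `do_write` the role of a file write under the mount path.

```python
import asyncio


async def guarded(lock, operation):
    """Run `operation` (an async callable) while holding `lock`."""
    async with lock:
        return await operation()


async def demo():
    lock = asyncio.Lock()  # shared by everything touching the volume in this container
    events = []

    async def fake_commit():  # stand-in for a volume commit
        events.append("commit start")
        await asyncio.sleep(0.05)
        events.append("commit end")

    async def do_write():  # stand-in for a write under the mount path
        events.append("write")

    async def late_write():
        await asyncio.sleep(0.01)  # the commit has started by now
        await guarded(lock, do_write)  # waits until the commit releases the lock

    await asyncio.gather(guarded(lock, fake_commit), late_write())
    return events


print(asyncio.run(demo()))  # prints ['commit start', 'commit end', 'write']
```

The lock only coordinates code running in the same container, which matches the scope of the restriction described above.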

Busy volume errors

On the legacy modal.Volume backend (_allow_background_volume_commits=False), commits cannot be performed while volume files are still open for writing. The commit operation will fail with “volume busy”. The following is a simple example of how a “volume busy” error can occur:

volume = modal.Volume.persisted("my-volume")


@stub.function(volumes={"/vol": volume}, _allow_background_volume_commits=False)
def seed_volume():
    f = open("/vol/data.txt", "w")
    f.write("hello world")  # file not closed after writing
    volume.commit()
    f.close()  # closed file too late
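
A straightforward way to avoid this is to close every file before committing, for example with a `with` block. This is a minimal sketch in which the `commit` argument is a stand-in for `volume.commit`; `write_then_commit` is a hypothetical helper.

```python
import tempfile
from pathlib import Path
from typing import Callable


def write_then_commit(path: Path, text: str, commit: Callable[[], None]) -> None:
    """Write `text` to `path`, close the file, and only then commit."""
    with open(path, "w") as f:
        f.write(text)
    # Leaving the `with` block has flushed and closed the file.
    commit()


# Demo: the fake commit reads the file back to show the data is
# already fully written when the commit runs.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data.txt"
    seen = []
    write_then_commit(target, "hello world", lambda: seen.append(target.read_text()))
```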

‘Value too large for defined data type’

On the legacy modal.Volume backend (_allow_background_volume_commits=False), a problem manifests when you try to modify or overwrite large files that have previously been committed to the volume. Upon writing to a previously committed file, you will get an exception like:

OSError: [Errno 75] Value too large for defined data type: '/vol/my/path/foo.pt'

You can work around the issue by enabling background commits on your volume. Alternatively, if you’re overwriting the old file rather than modifying it, you can work around the issue by first removing the file in question. This alternative does not require enabling background commits, but does not work for file modifications.
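
The remove-then-write alternative can be wrapped in a small helper. This is a minimal sketch using only the standard library (`Path.unlink(missing_ok=True)` requires Python 3.8 or later); `replace_file` is a hypothetical name.

```python
import tempfile
from pathlib import Path


def replace_file(path: Path, data: bytes) -> None:
    """Overwrite `path` by deleting the old file first, then writing a new one."""
    path.unlink(missing_ok=True)  # no error if the file does not exist yet
    path.write_bytes(data)


# Demo on a throwaway directory standing in for the Volume mount.
with tempfile.TemporaryDirectory() as tmp:
    weights = Path(tmp) / "foo.pt"
    weights.write_bytes(b"old weights")
    replace_file(weights, b"new weights")
    result = weights.read_bytes()
```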

Persisting volumes

By default, a modal.Volume lives as long as the app it’s defined in, just like any other Modal object. However, in many situations you might want to persist file data between runs of the app. To do this, you can use the persisted method on the Volume object.

Further examples