Document OCR job queue

This tutorial shows you how to use Modal as an infinitely scalable job queue that can service async tasks from a web app. For the purpose of this tutorial, we’ve also built a React + FastAPI web app on Modal that works together with it, but note that you don’t need a web app running on Modal to use this pattern. You can submit async tasks to Modal from any Python application (for example, a regular Django app running on Kubernetes).

Our job queue will handle a single task: running OCR transcription for images. We’ll make use of a pre-trained Document Understanding model using the donut package to accomplish this. Try it out for yourself here.

Define an App

Let’s first import modal and define an App. Later, we’ll use the name provided for our App to find it from our web app and submit tasks to it.

import urllib.request

import modal

app = modal.App("example-doc-ocr-jobs")

Model cache

donut downloads the weights for pre-trained models to a local directory, if those weights don’t already exist. To decrease start-up time, we want this download to happen just once, even across separate function invocations. To accomplish this, we use the Image.run_function method, which allows us to run some code at image build time to save the model weights into the image.

CACHE_PATH = "/root/model_cache"
MODEL_NAME = "naver-clova-ix/donut-base-finetuned-cord-v2"

def download_model_weights() -> None:
    from huggingface_hub import snapshot_download

    snapshot_download(repo_id=MODEL_NAME, cache_dir=CACHE_PATH)

image = (

Handler function

Now let’s define our handler function. Using the @app.function() decorator, we set up a Modal Function that uses GPUs, runs on a custom container image, and automatically retries failures up to 3 times.

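@app.function(
    gpu="any",  # request any available GPU type
    image=image,  # the custom container image defined above
    retries=3,  # retry failed invocations up to 3 times
)
def parse_receipt(image: bytes):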
    import io

    import torch
    from donut import DonutModel
    from PIL import Image

    # Use donut fine-tuned on an OCR dataset.
    task_prompt = "<s_cord-v2>"
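    pretrained_model = DonutModel.from_pretrained(
        MODEL_NAME,
        cache_dir=CACHE_PATH,  # reuse the weights saved at image build time
    )

    # Initialize model.
    device = torch.device("cuda")
    pretrained_model.to(device)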

    # Run inference.
    input_img = Image.open(io.BytesIO(image))
    output = pretrained_model.inference(image=input_img, prompt=task_prompt)[
        "predictions"
    ][0]
    print("Result: ", output)

    return output


Now that we have a function, we can publish it by deploying the app:

modal deploy

Once it’s published, we can look up this function from another Python process and submit tasks to it:

fn = modal.Function.lookup("example-doc-ocr-jobs", "parse_receipt")

Modal will auto-scale to handle all the tasks queued, and then scale back down to 0 when there’s no work left. To see how you could use this from a Python web app, take a look at the receipt parser frontend tutorial.
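As a rough sketch of what submitting a task and checking on it could look like from the caller’s side: spawn enqueues the call and returns a handle immediately, and get(timeout=0) checks for a result without blocking. The helper names submit_receipt and fetch_result are assumptions for illustration and not part of this tutorial’s code, and the optional fn parameter exists only so the helpers can be exercised with a stand-in object.

```python
from typing import Any, Optional


def submit_receipt(image_bytes: bytes, fn: Any = None) -> Any:
    """Enqueue one OCR task and return a handle to the pending call."""
    if fn is None:
        # Imported lazily so this module loads even where modal is not installed.
        import modal

        fn = modal.Function.lookup("example-doc-ocr-jobs", "parse_receipt")
    # spawn() returns immediately; the work runs on Modal in the background.
    return fn.spawn(image_bytes)


def fetch_result(call: Any) -> Optional[Any]:
    """Return the task's result if it is ready, or None if it is still running."""
    try:
        # timeout=0 polls once instead of blocking until completion.
        return call.get(timeout=0)
    except TimeoutError:
        return None
```

A web app would typically store the handle’s object_id and later rebuild the handle with modal.functions.FunctionCall.from_id(call_id) before polling.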

Run manually

We can also trigger parse_receipt manually for easier debugging:

modal run doc_ocr_jobs::app.main

To try it out, you can find some example receipts here.

@app.local_entrypoint()
def main():
    from pathlib import Path

    receipt_filename = Path(__file__).parent / "receipt.png"
    if receipt_filename.exists():
        with open(receipt_filename, "rb") as f:
            image = f.read()
        image = urllib.request.urlopen(
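The file-or-download fallback that main performs can be sketched as a standalone helper using only the standard library. The name load_receipt_bytes and its fallback_url parameter are hypothetical; the sample URL is not reproduced here, so the caller has to supply one.

```python
import urllib.request
from pathlib import Path


def load_receipt_bytes(local_path: Path, fallback_url: str) -> bytes:
    """Read the receipt from disk if present, otherwise download it."""
    if local_path.exists():
        return local_path.read_bytes()
    # urlopen returns a file-like response; read() yields the raw image bytes.
    with urllib.request.urlopen(fallback_url) as response:
        return response.read()
```

The returned bytes can then be passed to parse_receipt.remote(...) from the local entrypoint.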