Document OCR job queue

This tutorial shows you how to use Modal as an infinitely scalable job queue that can service async tasks from a web app. For the purpose of this tutorial, we’ve also built a React + FastAPI web app on Modal that works together with it, but note that you don’t need a web app running on Modal to use this pattern. You can submit async tasks to Modal from any Python application (for example, a regular Django app running on Kubernetes).
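For example, once the app in this tutorial is deployed, any Python process with Modal credentials can submit jobs to it. Here is a minimal sketch of what that might look like from a Django view; the view and its names are hypothetical, but the lookup and spawn calls are the same ones we use in the Deploy section below:

# Hypothetical Django view that submits an OCR job to the deployed Modal app.
import modal

from django.http import JsonResponse

def upload_receipt(request):
    parse_receipt = modal.lookup("example-doc-ocr-jobs", "parse_receipt")
    # spawn() enqueues the job and returns immediately.
    call = parse_receipt.spawn(request.FILES["receipt"].read())
    return JsonResponse({"call_id": call.object_id})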

Our job queue will handle a single task: running OCR transcription on images. We'll use a pre-trained document understanding model from the donut package to accomplish this. Try it out for yourself here.

[Screenshot: the receipt parser frontend]

Define a Stub

Let’s first import modal and define a Stub. Later, we’ll use the name provided for our Stub to find it from our web app, and submit tasks to it.

import urllib.request

import modal

stub = modal.Stub("example-doc-ocr-jobs")

Model cache

donut downloads the weights for pre-trained models to a local directory if those weights don't already exist. To decrease start-up time, we want this download to happen just once, even across separate function invocations. To accomplish this, we use a SharedVolume, a writable volume that can be attached to Modal functions and persisted across function runs.

volume = modal.SharedVolume().persist("doc_ocr_model_vol")
CACHE_PATH = "/root/model_cache"

Handler function

Now let’s define our handler function. Using the @stub.function decorator, we set up a Modal Function that uses GPUs, has a SharedVolume mount, runs on a custom container image, and automatically retries failures up to 3 times.

@stub.function(
    gpu="any",
    image=modal.Image.debian_slim().pip_install(
        "donut-python==1.0.7", "transformers==4.21.3"
    ),
    shared_volumes={CACHE_PATH: volume},
    retries=3,
)
def parse_receipt(image: bytes):
    import io

    import torch
    from donut import DonutModel
    from PIL import Image

    # Use donut fine-tuned on an OCR dataset.
    task_prompt = "<s_cord-v2>"
    pretrained_model = DonutModel.from_pretrained(
        "naver-clova-ix/donut-base-finetuned-cord-v2", cache_dir=CACHE_PATH
    )

    # Move the model to half precision and onto the GPU.
    pretrained_model.half()
    device = torch.device("cuda")
    pretrained_model.to(device)

    # Run inference.
    input_img = Image.open(io.BytesIO(image))
    output = pretrained_model.inference(image=input_img, prompt=task_prompt)[
        "predictions"
    ][0]
    print("Result: ", output)

    return output

Deploy

Now that we have a function, we can publish it by deploying the app:

modal deploy doc_ocr_jobs.py

Once it’s published, we can look up this function from another Python process and submit tasks to it:

fn = modal.lookup("example-doc-ocr-jobs", "parse_receipt")
fn.spawn(my_image)

Modal will auto-scale to handle all the tasks queued, and then scale back down to 0 when there’s no work left. To see how you could use this from a Python web app, take a look at the receipt parser frontend tutorial.
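
If you need the result of a spawned call later, you can hang on to its ID and poll for completion, which is how the companion web app serves results. A minimal sketch, assuming the app is already deployed and image_bytes is a placeholder for your own input:

import modal

from modal.functions import FunctionCall

parse_receipt = modal.lookup("example-doc-ocr-jobs", "parse_receipt")
call = parse_receipt.spawn(image_bytes)

# Later, possibly from a different process, poll for the result by ID.
function_call = FunctionCall.from_id(call.object_id)
try:
    result = function_call.get(timeout=0)  # raises TimeoutError if the job isn't done
except TimeoutError:
    result = None  # job still running; try again later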

Run manually

We can also trigger parse_receipt manually for easier debugging:

modal run doc_ocr_jobs.py::stub.main

To try it out, you can find some example receipts here.

@stub.local_entrypoint
def main():
    from pathlib import Path

    receipt_filename = Path(__file__).parent / "receipt.png"
    if receipt_filename.exists():
        with open(receipt_filename, "rb") as f:
            image = f.read()
    else:
        image = urllib.request.urlopen(
            "https://nwlc.org/wp-content/uploads/2022/01/Brandys-walmart-receipt-8.webp"
        ).read()
    print(parse_receipt.call(image))

Try this on Modal!

You can run this on Modal with 60 seconds of work!
Creating an account is free and no credit card is required. After creating an account, install the Modal Python package and create an API token:
pip install modal-client
modal token new
git clone https://github.com/modal-labs/modal-examples
cd modal-examples
modal run 09_job_queues/doc_ocr_jobs.py