Screenshot with Chromium

In this example, we use Modal functions and the playwright package to take screenshots of websites with headless Chromium. Because the screenshot logic lives in a Modal function, it can also be mapped over a list of URLs to take screenshots in parallel.

You can run this example on the command line with

modal run 02_building_containers/screenshot.py --url 'https://www.youtube.com/watch?v=dQw4w9WgXcQ'

This should take a few seconds, then create a /tmp/screenshots/screenshot.png file, shown below.

[example screenshot.png output]

Setup

First we import pathlib from the standard library and the Modal client library, then define the App.

import pathlib

import modal

app = modal.App(
    "example-screenshot"
)  # Note: prior to April 2024, "app" was called "stub"

Define a custom image

We need an image with the playwright Python package as well as its Chromium browser pre-installed. This requires installing a few Debian packages, as well as enabling the contrib and non-free Debian repositories. Modal lets you run arbitrary commands, just like in Docker:

image = modal.Image.debian_slim().run_commands(
    "apt-get update",
    "apt-get install -y software-properties-common",
    "apt-add-repository non-free",
    "apt-add-repository contrib",
    "pip install playwright==1.42.0",
    "playwright install-deps chromium",
    "playwright install chromium",
)
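
As an aside, the pip command in the middle of run_commands can also be expressed with Modal's pip_install method. A sketch of an equivalent image definition, assuming the same playwright version and the same apt setup, might look like this:

image = (
    modal.Image.debian_slim()
    .run_commands(
        "apt-get update",
        "apt-get install -y software-properties-common",
        "apt-add-repository non-free",
        "apt-add-repository contrib",
    )
    .pip_install("playwright==1.42.0")  # same pinned version as above
    .run_commands(
        "playwright install-deps chromium",
        "playwright install chromium",
    )
)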

The screenshot function

Next is the screenshot function, which launches headless Chromium, navigates to the given URL, and takes a screenshot. It is a Modal function that runs inside a remote container.

@app.function(image=image)
async def screenshot(url):
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        # Wait until network activity settles so the page is fully rendered.
        await page.goto(url, wait_until="networkidle")
        await page.screenshot(path="screenshot.png")
        await browser.close()
        with open("screenshot.png", "rb") as f:
            data = f.read()
        print(f"Screenshot of size {len(data)} bytes")
        return data
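
Playwright also exposes options for controlling the capture, such as a fixed viewport size and full-page screenshots. A hypothetical variant of the function (screenshot_full_page and the viewport dimensions are illustrative, not part of the original example) could look like this:

@app.function(image=image)
async def screenshot_full_page(url):
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        # Fix the viewport size and capture the full scrollable page.
        page = await browser.new_page(viewport={"width": 1280, "height": 800})
        await page.goto(url, wait_until="networkidle")
        await page.screenshot(path="screenshot.png", full_page=True)
        await browser.close()
        with open("screenshot.png", "rb") as f:
            return f.read()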

Entrypoint code

Finally, the local entrypoint takes a single URL (passed with --url on the command line, defaulting to https://modal.com), calls the screenshot function remotely, and writes the image to /tmp/screenshots/screenshot.png.

@app.local_entrypoint()
def main(url: str = "https://modal.com"):
    filename = pathlib.Path("/tmp/screenshots/screenshot.png")
    data = screenshot.remote(url)
    filename.parent.mkdir(exist_ok=True)
    with open(filename, "wb") as f:
        f.write(data)
    print(f"wrote {len(data)} bytes to {filename}")

And we’re done! Please also see our introductory guide for another example of a web scraper, with more in-depth logic.