May 20, 2024 · 3 minute read
Why Substack moved their AI and ML pipelines to Modal


Substack is a popular online platform where more than 17,000 writers publish newsletters, representing $300M in paid subscription volume.

Substack employs ML for a range of purposes, including spam detection, newsletter recommendations, audio transcription, sentiment analysis, and image generation. For nearly all of these models, Substack has moved both training and deployment from AWS SageMaker to Modal.

The challenges deploying AI and ML before Modal

Previously, Substack’s training and deployment pipelines were built on AWS SageMaker and orchestrated with Airflow. Adding or updating models was a slow and painful process for a few reasons.

One, the developer experience on SageMaker was convoluted. Engineers had to navigate to the SageMaker product, create a notebook, specify machine requirements, and wait for that machine to turn on, all before a single line of code could be written. Not to mention the difficulty of juggling multiple remote environments—from the Jupyter notebooks to the SageMaker training machines to the final production infra.

Two, collaborating was difficult. Mike Cohen, Head of AI & ML Engineering at Substack, found that his team often had to duplicate code across notebooks because of how SageMaker packaged code before shipping it off to training machines, which made it hard to share components across similar projects.

Three, containers took forever to start up. Engineers had to wait 5+ minutes for training machines to spin up, impeding their ability to iterate quickly and incurring unnecessary costs. They tried AWS’s Serverless Inference product thinking it’d be faster, but it lacked GPU support and still proved to be too slow.

Using SageMaker just felt like a very convoluted process. It felt very removed from a normal engineering workflow. With Modal, it’s a lot faster. You don’t feel like you have to wait for notebooks to spin up or remember to turn them off or any of that type of stuff.
— Mike Cohen, Head of AI & ML Engineering

Making everything faster on Modal

Mike first tried Modal to run a transcription model and quickly realized how much more natural it felt to iterate there. All development and testing happened in the most obvious places: his code editor and the command line. Getting an inference endpoint up and running took just an hour, and Modal's container autoscaling let them parallelize transcription workloads out of the box.
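For a sense of what this looks like, here is a minimal sketch of a Modal transcription function. The Whisper model, GPU type, and names are illustrative assumptions, not Substack's actual code.

```python
import modal

app = modal.App("transcription-demo")

# Container image with ffmpeg and the open-source Whisper model installed.
image = modal.Image.debian_slim().apt_install("ffmpeg").pip_install("openai-whisper")

@app.function(gpu="A10G", image=image)
def transcribe(audio_url: str) -> str:
    import tempfile
    import urllib.request

    import whisper

    # Fetch the audio file, then run Whisper on it.
    with tempfile.NamedTemporaryFile(suffix=".mp3") as f:
        urllib.request.urlretrieve(audio_url, f.name)
        model = whisper.load_model("base")
        return model.transcribe(f.name)["text"]

@app.local_entrypoint()
def main():
    # Modal autoscales containers, so .map() fans a batch of audio
    # files out across many GPU workers in parallel.
    urls = ["https://example.com/episode.mp3"]  # hypothetical inputs
    for text in transcribe.map(urls):
        print(text[:80])
```

Running `modal run` on this file executes `main` locally while `transcribe` runs in autoscaled GPU containers; `modal deploy` keeps the app live for production use.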

[Figure: Substack's training and deployment workflow on Modal]

Substack then decided to migrate training and deployment for its existing models as well. They implemented a full fine-tune → validation → deployment workflow by leveraging Modal's native storage primitives. In this workflow (sketched in code after the list), they:

  1. Fetched training data from Snowflake, wrote it to S3, then mounted the S3 bucket to Modal.
  2. Ran the model fitting with Modal functions.
  3. Saved the weights to a Modal network volume.
  4. Ran model validation with another Modal function using those weights.
  5. Deployed the new model on Modal and updated a Modal key-value store to track which model version was in production.
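Here is a rough sketch of how those primitives might fit together; the bucket, secret, volume, and function names are hypothetical, not Substack's actual pipeline.

```python
import modal

app = modal.App("training-pipeline")

# Weights persist on a Modal network volume; the production pointer
# lives in a Modal key-value store (modal.Dict).
weights = modal.Volume.from_name("model-weights", create_if_missing=True)
registry = modal.Dict.from_name("model-registry", create_if_missing=True)

# Training data exported from Snowflake lands in S3; the bucket is
# mounted directly into the training container.
s3_data = modal.CloudBucketMount(
    "training-data", secret=modal.Secret.from_name("aws-creds")
)

@app.function(gpu="A10G", volumes={"/data": s3_data, "/weights": weights})
def train(run_id: str) -> None:
    # ... fit the model on the files under /data, then write
    # checkpoints to f"/weights/{run_id}.pt" ...
    weights.commit()  # persist new weights for downstream functions

@app.function(volumes={"/weights": weights})
def validate(run_id: str) -> bool:
    # ... load f"/weights/{run_id}.pt" and score it on a holdout set ...
    return True

@app.local_entrypoint()
def main(run_id: str = "example-run"):
    train.remote(run_id)
    if validate.remote(run_id):
        # Record which model version is now in production.
        registry["production_model"] = run_id
```

Each step is an ordinary Python function, which is what makes the whole workflow easy to develop and test from an editor and the command line.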

We want to invest more in our recommendations system and models in general. My teammate was able to spin up a new recommendation model with Modal very quickly in a matter of days, whereas that probably would have taken quite a bit of time in SageMaker land.
— Mike Cohen

By moving to Modal, Substack has been able to develop and deploy ML workflows with greater speed and flexibility than ever before.
