Seamless Computational Bio at Chai Discovery
Chai Discovery is a frontier drug discovery company using machine learning to design new medicines. Their mission is to develop a flexible, ML-driven platform that can adapt to new biological targets and experimental data, accelerating discovery across diseases and modalities—the “computer-aided design suite for molecules”.
Too often, infrastructure is the bottleneck for research and discovery. By building on Modal, Chai can scale experiments seamlessly, keep data consistent, and run the same workflows from research through production.
The challenge: complex, bursty bio workloads
Chai’s machine learning pipelines combine diverse models, large biological datasets, and GPU-heavy computation. Experiments differ dramatically in scale, from small protein structure tests to full antibody design campaigns, and must grow from one run to thousands overnight. The workloads are heterogeneous and bursty, with frequent precomputation steps, each with its own shifting hardware demands.
Running all this on traditional cloud infrastructure would have meant maintenance overhead that slowed their research:
- Repetitive data setup: Huge datasets, often hundreds of gigabytes, would need to be downloaded and indexed repeatedly on every machine.
- Hardware drift: Inconsistent GPU types and driver versions could introduce subtle reproducibility bugs.
- Operational overhead and idle time: Scaling inference would mean manual orchestration, days of setup, and paying for idle clusters.
Fast, scalable, consistent compute with Modal
Chai adopted Modal from day one to eliminate the infrastructure overhead that could have slowed experimentation. With Modal, compute is elastic, consistent, and instantly accessible, so researchers can focus on science, not infrastructure.
Consistent and reproducible execution environments for heterogeneous models
Without consistent environments, reproducibility is fragile: small mismatches in GPU or driver versions can derail results and force hours of debugging. On Modal, every job runs in an identical, reproducible environment. That consistency is essential for Chai’s pipelines, which chain together many heterogeneous models and steps, such as protein embeddings, multiple sequence alignments (MSAs), and antibody design models. The ease of managing Modal environments also lets Chai deploy outputs exactly as they were developed in research, improving efficiency and scientific rigor.
Shared, high-throughput data access
Multiple sequence alignment (MSA) workloads are foundational to Chai’s discovery stack, requiring datasets that can reach hundreds of gigabytes. In a traditional computing environment, each machine would have to download and index that data from scratch, a process that could take hours. With Modal Volumes, Modal’s high-performance distributed file system, the database is downloaded and indexed once, then shared across every machine with near-instant cold-start attachment and consistent performance across many GPUs. That persistent state eliminates repeated downloads and indexing, so researchers can start new runs immediately without repeating setup or worrying about storage overhead.
Dynamic GPU scaling and workload elasticity
Chai’s workloads are highly variable—relatively quiet one day, bursting to thousands of inference jobs the next. Modal’s elastic scaling matches that pattern automatically. GPUs spin up in minutes, handle the peak load, and spin down again as demand drops. The team never has to manage clusters, plan capacity, or worry about underutilization.
Together, Modal’s consistency, shared data, and dynamic scaling enable fast feedback loops. Chai’s scientists iterate quickly, moving from exploration to production without rewriting code or waiting for infrastructure. The same Modal setup powers every stage of their research — from generating embeddings to running full antibody design pipelines.

From research to production on one platform
Today, Modal is a key component in Chai’s compute platform, powering everything from large-scale model training experiments to molecular design inference pipelines.
With Modal, Chai can move research ideas into production with almost no friction. Retries, scaling, and hardware orchestration happen automatically, giving researchers the same reliability whether they’re quickly prototyping a new model or deploying a battle-tested server for a production pipeline.
Chai can now spin up hundreds of GPUs in minutes, process terabyte-scale biological datasets instantly, and ship new production pipelines directly from Python, without rewriting infrastructure. What once took days of setup now happens automatically, giving researchers faster feedback and freeing them to focus on discovery.