“We’ve previously managed to break services like GitHub because of our load, so when Modal was able to handle the massive scale of our AI weekend event so smoothly, that meant a lot.”
“Modal's infrastructure gave us the performance and reliability we need to ship this in every global region, at production scale.”
“Modal makes it easy to write code that runs on hundreds of GPUs in parallel, transcribing podcasts in a fraction of the time.”
“Modal lets us move fast while keeping full control over our models and serving stack. The flexibility meant we could train high-accuracy models and hit the real-time performance our product demands.”
“Tasks that would have taken days to complete take minutes instead. We’ve also saved thousands of dollars deploying open-source LLMs on Modal.”
“The beauty of Modal is that all you need to know is that you can scale your function calls in the cloud with a few lines of Python.”
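For context on what those “few lines of Python” look like, here is a minimal sketch using Modal’s public API (the app name, function body, and URLs are illustrative placeholders, not taken from any customer’s code):

```python
import modal

app = modal.App("transcription-demo")  # illustrative app name

@app.function()
def transcribe(url: str) -> str:
    # Placeholder for real work; each call runs in its own cloud container.
    return f"transcript of {url}"

@app.local_entrypoint()
def main():
    urls = [f"https://example.com/episode-{i}" for i in range(100)]
    # .map() fans the calls out across containers in parallel.
    for transcript in transcribe.map(urls):
        print(transcript)
```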
“Running this compute on Modal simplified operations and enabled rapid experimentation with larger models, while adding only 10–15 ms of network overhead.”
“Combined with Modal's ability to parallelize workloads, this lets us evaluate far more ideas in the same amount of time.”
“Modal made it really easy to get started. The first version only took me a few days to get off the ground, and it had all the right APIs for us to build something at the scale we needed.”
“Our ML team can just say, ‘I’m going to run this.’ I don’t have to think about whether it needs 10,000 queries or what my data needs. It all just happens behind the scenes.”
“Modal has been great for iterating quickly on our data pipelines. It enables us to process a large batch of logs in minutes! The infrastructure is amazing for experimentation.”
“The ability to just modal deploy was really nice. Modal gives us a lot of flexibility to do pretty complex stuff that we wouldn’t get with an LLM inference service.”
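As a rough illustration of the modal deploy workflow mentioned above, a minimal sketch of a deployable app follows (the app name, GPU choice, and stub body are assumptions for illustration):

```python
import modal

app = modal.App("llm-inference")  # illustrative app name

# gpu="A10G" is an illustrative choice; real model loading is elided.
@app.function(gpu="A10G")
def generate(prompt: str) -> str:
    # Load weights once per container, then run inference here.
    return f"completion for: {prompt}"
```

Saved as app.py, running modal deploy app.py publishes this as a persistent deployment that scales up and down with traffic.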
“Decagon was able to achieve a p90 latency of 342 ms, well under the sub-second threshold required for natural customer conversations, delivering speed, efficiency, and enterprise-scale reliability.”
“Modal is the only platform that supports our custom infrastructure needs and has a simple developer experience. We deploy hundreds of predictive models for our core model routing functionality, and Modal helps us scale our product cleanly.”
“Switched to Modal for our LLM inference instead of Azure. 1/4 the price for GPUs and so much simpler to set up/scale. Big fan.”
“We are constantly shipping the most cutting-edge creative AI machine learning techniques so our customers have access to the best creative models. Modal has helped us streamline the process from idea to deployed pipeline, allowing us to both deploy quickly & scale rapidly.”
“Modal makes it unbelievably quick to deploy our models onto scalable infrastructure. We’ve been able to move faster on our last few model launches, including OLMo and Tülu, thanks to the platform.”
“Our platform leverages Modal's infrastructure for the heavy lifting—handling ridiculous scale and concurrency behind the scenes. This lets us focus on what we do best: gathering and analyzing unstructured textual data with precision.”
“Adopting Modal allows my team to focus on our product, not infrastructure. We save thousands in CPU/GPU costs, but more importantly, engineering hours. Everything we run on Modal just works: we scale from 0.5 GB of RAM to 95 GB without a second thought. Almost everything we do runs on Modal.”
“At Phonic, we train our own proprietary models for audio generation. We moved all our large-scale audio processing batch jobs to Modal. Our engineers are ecstatic with the result – we can run at a much larger scale than before, no longer have to babysit our batch jobs, and we can ship much faster.”
“Using Modal for inference is like having an extra infra team: it’s reliable, scalable, and fast, meaning I can get back to training models.”
“We use Modal to securely run LLM-augmented code on a large scale. Modal’s powerful primitives like sandboxes and file systems have allowed us to focus on our core competencies without having to waste time on our own infra.”
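To ground the sandbox primitive mentioned here, a minimal sketch using Modal’s Sandbox API (the app name and the command being run are illustrative):

```python
import modal

# Look up an app for the sandbox to attach to, creating it if needed.
app = modal.App.lookup("code-runner", create_if_missing=True)

# Each sandbox is an isolated container, suited to running untrusted code.
sb = modal.Sandbox.create(app=app)
proc = sb.exec("python", "-c", "print(2 + 2)")
print(proc.stdout.read())  # prints "4"
sb.terminate()
```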
