Modal makes it simple and fast to run embarrassingly parallel jobs across hundreds of containers. In this tutorial, we’ll show you how to scale out a basic grid search in one dimension using Modal’s Function.map().
Setting up a traditional grid search
Let’s say we have a Modal Function fit_knn that uses one of sklearn’s built-in datasets and trains a k-nearest neighbors classifier on it. Currently, we run fit_knn with each value of k (the number of nearest neighbors) sequentially.
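For reference, here is a minimal sketch of what such a setup might look like. The exact App, image, dataset, and return value are not shown in this tutorial, so the details below (the knn-grid-search App name, the digits dataset, and returning a (score, k) pair) are assumptions chosen to match the rest of the example.

import modal

app = modal.App("knn-grid-search")  # hypothetical App name
image = modal.Image.debian_slim().pip_install("scikit-learn")

@app.function(image=image)
def fit_knn(k: int):
    # Train a k-nearest neighbors classifier on one of sklearn's built-in
    # datasets (the digits dataset is assumed here) and report its
    # held-out accuracy.
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    clf = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    score = clf.score(X_test, y_test)
    return score, k  # returning (score, k) lets max() pick the best run later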
We want to make this more efficient by calling fit_knn with each value of k in parallel.
Your turn: Parallelize this function
.map is a Function method that runs function calls in parallel and returns the results as an iterable, ordered in the same way as the inputs.
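As a toy illustration of those semantics (square is a hypothetical function, not part of this tutorial’s code), a sketch like the following fans five calls out across containers and prints [0, 1, 4, 9, 16]:

@app.function()
def square(x: int) -> int:
    return x * x

@app.local_entrypoint()
def demo():
    # Results come back in input order, regardless of which
    # container happens to finish first.
    print(list(square.map(range(5))))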
Use .map to run a distributed hyperparameter search with fit_knn for different values of k:
@app.local_entrypoint()
def main():
    # Before: sequential calls, one value of k at a time
    # results = []
    # for k in range(1, 100):
    #     results.append(fit_knn.remote(k))

    # After: fan all 99 calls out in parallel with .map
    results = fit_knn.map(range(1, 100))

    best_score, best_k = max(results)
    print("Best k = %3d, score = %.4f" % (best_k, best_score))
Once your program starts running, you should see 99 containers running fit_knn launch roughly in parallel. And all of this happens lightning fast!
Hyperparameter tuning is just one example of a workload you can parallelize effortlessly with Modal. The same pattern makes it easy to scale anything from web scraping to batch processing, and you can even combine it with GPU acceleration to run efficient distributed training experiments and evaluation sweeps!