What is a Thread Block Grid?
When a CUDA kernel is launched, it creates a collection of threads known as a thread block grid. Grids can be one-, two-, or three-dimensional. They are made up of thread blocks.
The matching level of the memory hierarchy is the global memory.
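To make the shape of a grid concrete, here is a minimal host-side sketch in plain C++ (not CUDA, so it runs without a GPU). The `Dim3` struct is a hypothetical stand-in for CUDA's `dim3`, and the helpers `blocks_to_cover`, `grid_for_image`, `block_count`, and `linear_block_id` are illustrative names, not part of any CUDA API. It shows how a two-dimensional grid might be sized so that its thread blocks cover a problem, and how a block's coordinates map to a single index.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for CUDA's dim3: unspecified extents default to 1,
// which is how a 1D or 2D grid is expressed with a 3D type.
struct Dim3 {
    unsigned x, y, z;
    Dim3(unsigned x_ = 1, unsigned y_ = 1, unsigned z_ = 1) : x(x_), y(y_), z(z_) {}
};

// Number of blocks needed to cover n elements when each block handles
// block_extent of them (ceiling division, written to avoid overflow).
unsigned blocks_to_cover(unsigned n, unsigned block_extent) {
    assert(block_extent > 0);
    return n / block_extent + (n % block_extent != 0 ? 1u : 0u);
}

// Size a 2D grid so that blocks of shape `block` cover a width x height problem.
Dim3 grid_for_image(unsigned width, unsigned height, Dim3 block) {
    return Dim3(blocks_to_cover(width, block.x), blocks_to_cover(height, block.y), 1);
}

// Total number of thread blocks in the grid.
std::size_t block_count(Dim3 grid) {
    return static_cast<std::size_t>(grid.x) * grid.y * grid.z;
}

// Flatten zero-based block coordinates into one index, with x varying fastest.
std::size_t linear_block_id(Dim3 grid, unsigned bx, unsigned by, unsigned bz) {
    return bx + static_cast<std::size_t>(grid.x) * (by + static_cast<std::size_t>(grid.y) * bz);
}
```

For example, under these assumptions `grid_for_image(1920, 1080, Dim3(16, 16))` yields a grid of 120 by 68 by 1 blocks. In real CUDA code the equivalent values are passed as the first launch parameter and read inside the kernel through `gridDim` and `blockIdx`.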
Thread blocks are effectively independent units of computation. They execute concurrently, that is, with indeterminate order, ranging from fully sequentially in the case of a GPU with a single Streaming Multiprocessor to fully in parallel when run on a GPU with sufficient resources to run them all simultaneously.
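One practical consequence of this independence is that a correct kernel cannot rely on the order in which its blocks run. The sketch below models that in plain C++ rather than CUDA: each hypothetical "block" updates only its own slice of an array, and the grid is executed once in ascending order and once in a shuffled order. The function names (`run_block`, `run_grid`, `order_does_not_matter`) and the toy computation are assumptions made for illustration; a real GPU schedules blocks in hardware rather than in a loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Work done by one modeled thread block: update only its own slice of data.
void run_block(std::size_t block_id, std::size_t block_size, std::vector<int>& data) {
    std::size_t begin = block_id * block_size;
    std::size_t end = std::min(begin + block_size, data.size());
    for (std::size_t i = begin; i < end; ++i) {
        data[i] = 2 * data[i] + 1;
    }
}

// Run every block listed in `order` once, sequentially, on a copy of the input.
std::vector<int> run_grid(const std::vector<std::size_t>& order,
                          std::size_t block_size,
                          std::vector<int> data) {
    for (std::size_t block_id : order) {
        run_block(block_id, block_size, data);
    }
    return data;
}

// Execute the same grid in ascending and in shuffled block order and
// report whether both schedules produce identical output.
bool order_does_not_matter(std::size_t n, std::size_t block_size, unsigned seed) {
    assert(block_size > 0);
    std::vector<int> input(n);
    std::iota(input.begin(), input.end(), 0);

    std::size_t num_blocks = (n + block_size - 1) / block_size;
    std::vector<std::size_t> order(num_blocks);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::vector<int> ascending = run_grid(order, block_size, input);

    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);
    std::vector<int> shuffled = run_grid(order, block_size, input);

    return ascending == shuffled;
}
```

The two schedules agree because the blocks write disjoint slices and never read each other's output. If one block consumed a value produced by another, the result would depend on scheduling, which is exactly what the concurrency described above does not guarantee.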