What is a CUDA Thread Block?
A thread block is a level of the CUDA programming model's thread hierarchy below a grid but above a thread. It is the CUDA programming model's abstract equivalent of the concrete cooperative thread arrays in PTX/SASS.
Blocks are the smallest unit of thread coordination exposed to programmers in the CUDA programming model. Blocks must execute independently of one another, so that any execution order for blocks is valid, from fully serial in any order to arbitrary interleavings.
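A minimal sketch of what block-level coordination looks like in practice (the kernel name and block size here are illustrative, not from the text): threads within a block can share data through shared memory and synchronize with `__syncthreads()`, but there is no equivalent barrier across blocks in a plain kernel launch.

```cuda
// Hypothetical kernel: each block computes a partial sum of its slice of `in`.
// Coordination happens only within a block: shared memory is per-block, and
// __syncthreads() is a barrier over this block's threads only.
__global__ void blockSum(const float* in, float* blockSums) {
    __shared__ float tile[256];           // visible only to this block
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = in[i];
    __syncthreads();                      // wait for the whole block to load

    // Tree reduction over this block's tile
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride) {
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        }
        __syncthreads();
    }
    if (threadIdx.x == 0) {
        blockSums[blockIdx.x] = tile[0];  // one result per independent block
    }
}
```

Because blocks cannot coordinate with each other here, each block's partial sum is written out separately; combining them requires a second kernel launch or an atomic, which is exactly the independence property described above.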
A single CUDA kernel launch produces one or more thread blocks (in the form of a thread block grid), each of which contains one or more warps. Block sizes can be chosen fairly freely (up to a limit of 1024 threads per block on current CUDA GPUs), but they are typically multiples of the warp size (32 on all current CUDA GPUs).
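The sizing relationship above can be sketched as a typical launch configuration (the kernel and buffer names `myKernel`, `d_in`, `d_out` are assumptions for illustration):

```cuda
// Pick a block size that is a multiple of the warp size (32), then derive
// the grid size so that every element is covered by some thread.
int n = 1 << 20;                               // one million elements
int threadsPerBlock = 256;                     // 8 warps of 32 threads each
int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;  // round up
myKernel<<<blocksPerGrid, threadsPerBlock>>>(d_in, d_out, n);
```

The rounded-up grid size means the last block may have threads past the end of the data, which is why CUDA kernels conventionally guard with a bounds check like `if (i < n)`.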