What is a CUDA Thread Block?
A thread block is a level of the CUDA programming model's thread hierarchy below a grid but above a thread. It is the CUDA programming model's abstract equivalent of the concrete cooperative thread arrays in PTX/SASS.
Blocks are the smallest unit of thread coordination exposed to programmers in the CUDA programming model. Blocks must execute independently: any execution order for blocks is valid, from fully serial in any order to arbitrary interleavings.
A single CUDA kernel launch produces one or more thread blocks (in the form of a thread block grid), each of which contains one or more warps. Blocks can be arbitrarily sized, up to a limit of 1024 threads on current devices, but they are typically multiples of the warp size (32 on current devices).
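As a minimal sketch of the above, a kernel launch's execution configuration specifies the grid and block dimensions. The kernel name, data, and sizes here are illustrative, not from the original text:

```cuda
#include <cstdio>

// Each thread derives a unique global index from its block and thread IDs.
__global__ void scale(float *x, float alpha, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= alpha;
}

int main() {
    const int n = 1 << 20;
    float *x;
    cudaMallocManaged(&x, n * sizeof(float));
    for (int i = 0; i < n; ++i) x[i] = 1.0f;

    // Block size: a multiple of the warp size (32), within the 1024-thread limit.
    const int threadsPerBlock = 256;
    // Grid size: enough blocks to cover all n elements.
    const int numBlocks = (n + threadsPerBlock - 1) / threadsPerBlock;

    // The <<<grid, block>>> execution configuration launches the thread block grid.
    scale<<<numBlocks, threadsPerBlock>>>(x, 2.0f, n);
    cudaDeviceSynchronize();

    printf("x[0] = %f\n", x[0]);
    cudaFree(x);
    return 0;
}
```

Because blocks must be independent, the runtime is free to schedule these blocks across streaming multiprocessors in any order.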
Building on GPUs? We know a thing or two about it.
Modal is an ergonomic Python SDK wrapped around a global GPU fleet. Deploy serverless AI workloads instantly without worrying about quota requests, driver compatibility issues, or managing bulky ML dependencies.