What is a CUDA Thread Block?
A thread block is a level of the CUDA programming model's thread hierarchy below a grid but above a thread. It is the CUDA programming model's abstract equivalent of the concrete cooperative thread arrays in PTX/SASS.
Blocks are the smallest unit of thread coordination exposed to programmers in the CUDA programming model. Blocks must execute independently of one another, so that any execution order for blocks is valid, from fully serial in any order to arbitrary interleavings.
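A minimal sketch of what block-level coordination looks like in practice (the kernel name and block size here are illustrative, not from the text): threads within a block can share data through shared memory and synchronize with `__syncthreads()`, but there is no equivalent barrier across blocks in a plain kernel launch.

```cuda
// Hypothetical kernel: each block computes a partial sum of its slice of `in`.
// Coordination happens only within a block: shared memory is per-block, and
// __syncthreads() is a barrier over this block's threads only.
__global__ void blockSum(const float* in, float* blockSums) {
    __shared__ float tile[256];           // visible only to this block
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = in[i];
    __syncthreads();                      // wait for the whole block to load

    // Tree reduction over this block's tile
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride) {
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        }
        __syncthreads();
    }
    if (threadIdx.x == 0) {
        blockSums[blockIdx.x] = tile[0];  // one result per independent block
    }
}
```

Because blocks cannot coordinate with each other here, each block's partial sum is written out separately; combining them requires a second kernel launch or an atomic, which is exactly the independence property described above.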
A single CUDA kernel launch produces one or more thread blocks (in the form of a thread block grid), each of which contains one or more warps. Block sizes can be chosen fairly freely (up to a limit of 1024 threads per block on current CUDA GPUs), but they are typically multiples of the warp size (32 on all current CUDA GPUs).
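The sizing relationship above can be sketched as a typical launch configuration (the kernel and buffer names `myKernel`, `d_in`, `d_out` are assumptions for illustration):

```cuda
// Pick a block size that is a multiple of the warp size (32), then derive
// the grid size so that every element is covered by some thread.
int n = 1 << 20;                               // one million elements
int threadsPerBlock = 256;                     // 8 warps of 32 threads each
int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;  // round up
myKernel<<<blocksPerGrid, threadsPerBlock>>>(d_in, d_out, n);
```

The rounded-up grid size means the last block may have threads past the end of the data, which is why CUDA kernels conventionally guard with a bounds check like `if (i < n)`.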