Thread Block Grid
When a CUDA kernel is launched, it creates a collection of threads known as a thread block grid. Grids can be one-, two-, or three-dimensional. They are made up of thread blocks.
The matching level of the memory hierarchy is the global memory.
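As a rough illustration of the two points above, the following sketch launches a two-dimensional grid in which every thread writes one element of an array in global memory. The kernel name `write_global_index`, the helper `run_grid_demo`, the 16x16 block shape, and the 100x60 problem size are assumptions chosen for the example, not part of any library.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread computes its position in the 2D grid
// and writes its flattened index into an array in global memory.
__global__ void write_global_index(int *out, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // column across the whole grid
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // row across the whole grid
    if (x < width && y < height) {                  // the grid may overhang the data
        out[y * width + x] = y * width + x;
    }
}

// Hypothetical helper: launches the grid and returns the number of elements
// that do not hold their own index, or -1 if a CUDA call fails.
int run_grid_demo(int width, int height) {
    const int n = width * height;
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return -1;

    dim3 block(16, 16);  // threads per block (assumed shape)
    dim3 grid((width + block.x - 1) / block.x,   // enough blocks to cover the width
              (height + block.y - 1) / block.y); // enough blocks to cover the height
    write_global_index<<<grid, block>>>(d_out, width, height);
    if (cudaGetLastError() != cudaSuccess) { cudaFree(d_out); return -1; }

    // cudaMemcpy on the default stream waits for the kernel to finish.
    std::vector<int> h_out(n, -1);
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (h_out[i] != i) ++mismatches;
    }
    return mismatches;
}

int main() {
    int mismatches = run_grid_demo(100, 60);
    std::printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

The array `out` lives in global memory, so threads from any block in the grid can write to it. The bounds check is needed because the grid is rounded up to a whole number of blocks in each dimension.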
Thread blocks are effectively independent units of computation. They execute concurrently, that is, in an indeterminate order: anywhere from fully sequentially, on a GPU with a single Streaming Multiprocessor, to fully in parallel, on a GPU with enough resources to run them all simultaneously.
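One way to observe this indeterminate ordering is to have each block take a "ticket" from a counter in global memory. The sketch below does that with `atomicAdd`. The names `take_ticket` and `run_ticket_demo`, and the choice of 32 threads per block, are assumptions made for illustration. The ticket only records when thread 0 of each block reached the atomic operation, so it is a proxy for scheduling order rather than an exact record of it.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: thread 0 of each block atomically takes the next
// ticket from a counter in global memory and records it for its block.
__global__ void take_ticket(unsigned int *counter, unsigned int *ticket_of_block) {
    if (threadIdx.x == 0) {
        ticket_of_block[blockIdx.x] = atomicAdd(counter, 1u);
    }
}

// Hypothetical helper: returns true if the tickets form a permutation of
// 0..num_blocks-1, i.e. every block ran exactly once, in some order.
bool run_ticket_demo(unsigned int num_blocks, bool print) {
    unsigned int *d_counter = nullptr;
    unsigned int *d_tickets = nullptr;
    if (cudaMalloc(&d_counter, sizeof(unsigned int)) != cudaSuccess) return false;
    if (cudaMalloc(&d_tickets, num_blocks * sizeof(unsigned int)) != cudaSuccess) {
        cudaFree(d_counter);
        return false;
    }
    cudaMemset(d_counter, 0, sizeof(unsigned int));

    take_ticket<<<num_blocks, 32>>>(d_counter, d_tickets);  // 1D grid of blocks

    std::vector<unsigned int> tickets(num_blocks);
    cudaError_t err = cudaMemcpy(tickets.data(), d_tickets,
                                 num_blocks * sizeof(unsigned int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_counter);
    cudaFree(d_tickets);
    if (err != cudaSuccess) return false;

    // The tickets must be distinct and in range, but their assignment to
    // block indices is not guaranteed and may differ between runs or GPUs.
    std::vector<bool> seen(num_blocks, false);
    for (unsigned int b = 0; b < num_blocks; ++b) {
        if (print) std::printf("block %u took ticket %u\n", b, tickets[b]);
        if (tickets[b] >= num_blocks || seen[tickets[b]]) return false;
        seen[tickets[b]] = true;
    }
    return true;
}

int main() {
    return run_ticket_demo(8, true) ? 0 : 1;
}
```

A correct kernel should produce the same result whichever block receives which ticket. Code that relies on block 0 running before block 1, for example, depends on an ordering the grid does not guarantee.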