As part of the CUDA programming model, each level of the thread group hierarchy has access to matching memory from the memory hierarchy. This memory can be used for coordination and communication and is managed by the programmer (not the hardware or a runtime).
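To make the correspondence between thread groups and memory levels concrete, here is a minimal sketch written against the CUDA Runtime API. The names `block_sums`, `run_block_sums`, and `kThreads` are hypothetical. Each thread keeps a value in a local variable, which is private to that thread and usually lives in a register. Each thread block reduces its values in a `__shared__` array that only that block can see. One thread per block then writes the result to global memory, which the whole grid (and the host) can reach.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical block size; the kernel assumes blockDim.x == kThreads (a power of two).
constexpr int kThreads = 256;

__global__ void block_sums(const int* in, int* out, int n) {
    __shared__ int partial[kThreads];              // shared memory: one copy per thread block
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int v = (i < n) ? in[i] : 0;                   // local variable: private to this thread
    partial[threadIdx.x] = v;
    __syncthreads();                               // every thread in the block has stored its value
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride) partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0) out[blockIdx.x] = partial[0];  // global memory: visible to the whole grid
}

// Host helper: returns one partial sum per thread block.
std::vector<int> run_block_sums(const std::vector<int>& host_in) {
    int n = static_cast<int>(host_in.size());
    int blocks = (n + kThreads - 1) / kThreads;
    int* d_in = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, blocks * sizeof(int));
    cudaMemcpy(d_in, host_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    block_sums<<<blocks, kThreads>>>(d_in, d_out, n);
    std::vector<int> host_out(blocks);
    cudaMemcpy(host_out.data(), d_out, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return host_out;
}
```

With 512 ones as input, the two blocks each report 256. The host adds those per-block results itself, because shared memory from one block is invisible to every other block.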

The highest level of that memory hierarchy is global memory. Global memory is global in both its scope and its lifetime: it is accessible by every thread in a thread block grid, and it lives as long as the program executes.
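The sketch below illustrates both properties, assuming the CUDA Runtime API and hypothetical names (`write_ids`, `read_reversed`, `reversed_id_at`). In the first kernel launch, each thread writes its index to global memory. In a separate, later launch, each thread reads a value that a different thread wrote, and that thread may have belonged to a different block. The buffer keeps its contents between launches until it is freed.

```cuda
#include <cuda_runtime.h>

// First launch: each thread writes its global index into global memory.
__global__ void write_ids(int* buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = i;
}

// Second launch: each thread reads a value written by another thread,
// possibly one from a different block, during the first launch.
__global__ void read_reversed(const int* buf, int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = buf[n - 1 - i];
}

// Runs both kernels and returns out[index].
int reversed_id_at(int n, int index) {
    int* buf = nullptr;
    int* out = nullptr;
    cudaMalloc(&buf, n * sizeof(int));
    cudaMalloc(&out, n * sizeof(int));
    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    write_ids<<<blocks, threads>>>(buf, n);
    // Both launches go to the default stream, so they run in order and the
    // second kernel sees everything the first one wrote to `buf`.
    read_reversed<<<blocks, threads>>>(buf, out, n);
    int result = 0;
    cudaMemcpy(&result, out + index, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(buf);  // the allocation's contents persist until this call
    cudaFree(out);
    return result;
}
```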

Access to data structures in global memory can be synchronized across all accessors using atomic instructions, as with CPU memory. Within a cooperative thread array, access can be more tightly synchronized, e.g. with barriers.
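As a rough illustration, this sketch contrasts the two kinds of synchronization. It uses the CUDA Runtime API, and the kernel and helper names are hypothetical. In `count_threads`, every thread in the grid calls `atomicAdd` on a single global counter. In `count_per_block_then_global`, threads within a block first synchronize with `__syncthreads()` barriers around a shared counter, and then only one atomic per block touches global memory.

```cuda
#include <cuda_runtime.h>

// Grid-wide: every thread atomically increments the same global address.
// atomicAdd makes each read-modify-write indivisible, so no increment is lost.
__global__ void count_threads(unsigned int* counter) {
    atomicAdd(counter, 1u);
}

// Block-level barriers first, then a single global atomic per block.
__global__ void count_per_block_then_global(unsigned int* counter) {
    __shared__ unsigned int block_count;
    if (threadIdx.x == 0) block_count = 0;
    __syncthreads();                        // barrier: initialization visible to the whole block
    atomicAdd(&block_count, 1u);            // atomic on shared memory
    __syncthreads();                        // barrier: all of this block's increments are done
    if (threadIdx.x == 0) atomicAdd(counter, block_count);
}

// Returns the counter value after launching the chosen kernel.
unsigned int count_grid_threads(int blocks, int threads_per_block, bool aggregate_per_block) {
    unsigned int* d_counter = nullptr;
    cudaMalloc(&d_counter, sizeof(unsigned int));
    cudaMemset(d_counter, 0, sizeof(unsigned int));
    if (aggregate_per_block)
        count_per_block_then_global<<<blocks, threads_per_block>>>(d_counter);
    else
        count_threads<<<blocks, threads_per_block>>>(d_counter);
    unsigned int result = 0;
    cudaMemcpy(&result, d_counter, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(d_counter);
    return result;
}
```

Both variants count `blocks * threads_per_block` threads. The second one reduces contention on the global address, which is a common design choice when many threads update the same global value.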

This level of the memory hierarchy is typically implemented in the GPU's RAM and allocated from the host using a memory allocator provided by the CUDA Driver API or the CUDA Runtime API.
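Here is a minimal host-side sketch of that allocation lifecycle using the CUDA Runtime API (`cudaMalloc`, `cudaMemcpy`, `cudaFree`). The helpers `check` and `round_trip` are hypothetical. The Driver API provides an equivalent allocator, `cuMemAlloc`, but it requires explicit initialization and context management, so it is not shown here.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Hypothetical helper: prints a CUDA Runtime API error and reports success or failure.
static bool check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Allocates global memory from the host, copies data in and back out, then frees it.
bool round_trip(const std::vector<float>& host_in, std::vector<float>& host_out) {
    size_t bytes = host_in.size() * sizeof(float);
    float* d_buf = nullptr;
    if (!check(cudaMalloc(&d_buf, bytes), "cudaMalloc")) return false;
    bool ok = check(cudaMemcpy(d_buf, host_in.data(), bytes, cudaMemcpyHostToDevice),
                    "cudaMemcpy host-to-device");
    host_out.resize(host_in.size());
    if (ok) {
        ok = check(cudaMemcpy(host_out.data(), d_buf, bytes, cudaMemcpyDeviceToHost),
                   "cudaMemcpy device-to-host");
    }
    ok = check(cudaFree(d_buf), "cudaFree") && ok;  // always release the allocation
    return ok;
}
```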

