What is Global Memory?
As part of the CUDA programming model, each level of the thread hierarchy has access to matching memory from the memory hierarchy. This memory can be used for coordination and communication and is managed by the programmer (not by the hardware or a runtime).
The highest level of that memory hierarchy is global memory. Global memory is global in both scope and lifetime: it is accessible by every thread in a thread block grid, and it persists for the duration of the program's execution.
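As an illustrative sketch (kernel and variable names are ours, not from the source), any thread in the grid can read and write the same global memory buffer, and the buffer's contents outlive any single kernel launch:

```cpp
// Minimal sketch: every thread in the grid computes a unique index into
// a buffer that resides in global memory. The data written here remains
// valid after the kernel returns, for the lifetime of the program.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;  // read-modify-write on global memory
    }
}
```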
Access to data structures in global memory can be synchronized across all accessors using atomic instructions, as with CPU memory. Within a cooperative thread array, access can be more tightly synchronized, e.g. with barriers.
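A minimal sketch of both synchronization styles (the kernel names and the bin-counting example are ours, for illustration): `atomicAdd` serializes conflicting updates to a global memory location across the whole grid, while `__syncthreads()` is a barrier scoped to one cooperative thread array:

```cpp
// Grid-wide synchronization: atomicAdd makes each increment of a global
// memory counter an indivisible read-modify-write, so concurrent threads
// from any block cannot lose updates.
__global__ void countValues(const int *values, int *bins, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        atomicAdd(&bins[values[i]], 1);
    }
}

// CTA-scoped synchronization: __syncthreads() is a barrier that every
// thread in the block must reach before any thread proceeds past it.
__global__ void blockSum(const float *in, float *out) {
    __shared__ float partial[256];  // assumes blockDim.x <= 256
    partial[threadIdx.x] = in[blockIdx.x * blockDim.x + threadIdx.x];
    __syncthreads();  // all stores above are visible to the whole block
    if (threadIdx.x == 0) {
        float s = 0.0f;
        for (int j = 0; j < blockDim.x; ++j) s += partial[j];
        out[blockIdx.x] = s;  // result lands back in global memory
    }
}
```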
This level of the memory hierarchy is typically implemented in the GPU's RAM and allocated from the host using a memory allocator provided by the CUDA Driver API or the CUDA Runtime API.
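A sketch of host-side allocation via the CUDA Runtime API (the buffer size is arbitrary; error handling shown is a common minimal pattern, not prescribed by the source):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    float *d_buf = nullptr;
    size_t bytes = 1024 * sizeof(float);

    // cudaMalloc allocates a region of device global memory and returns
    // a device pointer to it via the first argument.
    cudaError_t err = cudaMalloc(&d_buf, bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // ... launch kernels that read and write d_buf ...

    // The lifetime is managed by the programmer: the allocation persists
    // until explicitly freed (or the program exits).
    cudaFree(d_buf);
    return 0;
}
```

The Driver API offers the analogous `cuMemAlloc`/`cuMemFree` pair at a lower level of abstraction.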
The terminology "global" unfortunately collides with the __global__ keyword in CUDA C/C++, which annotates functions that are launched from the host but run on the device (kernels), whereas global memory is only on the device. Early CUDA architect Nicholas Wilt wryly notes that this choice was made "for maximum developer confusion" in his CUDA Handbook.