GPU Glossary

Memory Hierarchy

Shared memory and global memory are two levels of the memory hierarchy in the CUDA programming model (left), mapping onto the L1 data cache and GPU RAM, respectively. Modified from diagrams in NVIDIA's CUDA Refresher: The CUDA Programming Model and the NVIDIA CUDA C++ Programming Guide.

As part of the CUDA programming model , each level of the thread group hierarchy has access to a distinct block of memory shared by all threads in a group at that level: a "memory hierarchy" to match the thread group hierarchy. This memory can be used for coordination and communication and is managed by the programmer (not the hardware or a runtime).

For a thread block grid, that shared memory is in the GPU's RAM and is known as the global memory. Access to this memory can be coordinated with atomic operations and barriers, but execution order across thread blocks is indeterminate.
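For instance, because blocks may run in any order, a grid-wide accumulation into global memory must use atomics rather than assume ordering. A minimal sketch (the kernel name `sumAll` and its parameters are illustrative, not from the source):

```cpp
#include <cuda_runtime.h>

// Every thread in the grid adds its element into one global-memory
// accumulator. atomicAdd serializes the conflicting updates, so the
// result is correct no matter which order the thread blocks execute in.
__global__ void sumAll(const float* data, float* total, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        atomicAdd(total, data[i]);  // coordinated write to global memory
    }
}
```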

For a single thread, the memory is a chunk of the Streaming Multiprocessor's (SM's) register file. In keeping with the memory semantics of the CUDA programming model, this memory is private.
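Concretely, ordinary local variables in a kernel live at this level: each thread gets its own register-resident copy that no other thread can observe. A small sketch, with hypothetical names (`scale`, `acc`):

```cpp
#include <cuda_runtime.h>

// `acc` is a thread-private local variable, typically held in the SM's
// register file. Each of the n threads has its own independent copy;
// making its value visible to other threads would require writing it
// to shared or global memory.
__global__ void scale(const float* in, float* out, float alpha, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float acc = alpha * in[i];  // private, register-resident
        out[i] = acc;
    }
}
```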

In between, the shared memory for the thread block level of the thread hierarchy is stored in the L1 data cache of each SM. Careful management of this cache — e.g. loading data into it once and reusing it for as many arithmetic operations as possible before it is replaced — is key to the art of designing high-performance CUDA kernels.
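The staging-and-reuse pattern described above can be sketched as follows (a simplified 1D example; the kernel name `blur1d` and tile size are illustrative assumptions, and boundary handling is kept deliberately crude):

```cpp
#include <cuda_runtime.h>

#define TILE 256  // assumed block size; one shared-memory element per thread

// Each block stages a tile of global memory into __shared__ memory
// (carved out of the SM's L1 data cache), synchronizes at a barrier,
// and then every thread reads its neighbors from the fast on-chip tile
// instead of issuing extra global-memory loads.
__global__ void blur1d(const float* in, float* out, int n) {
    __shared__ float tile[TILE];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tile[threadIdx.x] = in[i];
    __syncthreads();  // block-level barrier: the tile is fully loaded

    if (i < n) {
        float mid   = tile[threadIdx.x];
        float left  = (threadIdx.x > 0)        ? tile[threadIdx.x - 1] : mid;
        float right = (threadIdx.x < TILE - 1) ? tile[threadIdx.x + 1] : mid;
        out[i] = (left + mid + right) / 3.0f;  // three reads served on-chip
    }
}
```

Each element loaded from global memory here is read by up to three threads, so the shared-memory tile trades one barrier for a roughly threefold reduction in global-memory traffic.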

Something seems wrong?
Or want to contribute?
Email: glossary@modal.com