GPU Glossary

What is the Memory Hierarchy?

Shared memory and global memory are two levels of the memory hierarchy in the CUDA programming model (left), mapping onto the L1 data cache and GPU RAM, respectively. Modified from diagrams in NVIDIA's CUDA Refresher: The CUDA Programming Model and the NVIDIA CUDA C++ Programming Guide.

As part of the CUDA programming model , each level of the thread group hierarchy has access to a distinct block of memory shared by all threads in a group at that level: a "memory hierarchy" to match the thread group hierarchy. This memory can be used for coordination and communication and is managed by the programmer (not the hardware or a runtime).

For a thread block grid, the memory shared by all threads is the GPU's RAM, known as global memory. Access to this memory can be coordinated with atomic operations and memory barriers, but because thread blocks may execute in any order, no ordering of accesses across blocks can be assumed.
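One minimal way to coordinate across thread blocks is an atomic update to a single location in global memory. The sketch below is illustrative (the kernel name and sizes are not from the glossary): every thread in the grid, regardless of its block, contributes to one shared counter.

```cuda
#include <cstdio>

// Count even values in an array, accumulating into a single global counter.
// atomicAdd serializes concurrent updates to the same global-memory address,
// no matter which thread block the updating thread belongs to — which is why
// the result is correct even though block execution order is indeterminate.
__global__ void countEvens(const int *data, int n, int *result) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && data[i] % 2 == 0) {
        atomicAdd(result, 1);
    }
}
```

Without the atomic, two blocks reading and writing `*result` concurrently could lose updates; the atomic makes the read-modify-write indivisible.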

For a single thread, the memory is a chunk of the Streaming Multiprocessor's (SM's) register file. In keeping with the memory semantics of the CUDA programming model, this memory is private to that thread.
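Concretely, scalar local variables in a kernel are typically placed in the register file by the compiler. In this hypothetical kernel (the name and parameters are illustrative), the index and the loaded scalar are each thread's private state:

```cuda
// out[i] = a * x[i] + y[i] for each element — a per-thread computation.
__global__ void axpy(float a, const float *x, const float *y,
                     float *out, int n) {
    // i and xi are thread-local: the compiler will usually keep them
    // in registers, invisible to every other thread.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float xi = x[i];
        out[i] = a * xi + y[i];
    }
}
```

No synchronization is needed here because no two threads ever touch the same private values or the same output element.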

In between, the shared memory for the thread block level of the thread hierarchy is stored in the L1 data cache of each SM . Careful management of this cache — e.g. loading data into it to support the maximum number of arithmetic operations before new data is loaded — is key to the art of designing high-performance CUDA kernels .
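That pattern — stage data in shared memory once, then reuse it for many operations before touching global memory again — can be sketched with a per-block reduction. The kernel name and tile size below are assumptions for illustration:

```cuda
#define TILE 256  // threads per block; illustrative choice

// Sum each block's slice of the input, one partial sum per thread block.
__global__ void blockSum(const float *in, float *out, int n) {
    __shared__ float tile[TILE];   // lives in the SM's shared memory / L1 data cache
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // One global-memory load per element...
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();               // all loads finish before any thread reads the tile

    // ...then log2(blockDim.x) rounds of arithmetic entirely in shared memory.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0) out[blockIdx.x] = tile[0];
}
```

Each element is fetched from global memory once but read repeatedly from the fast on-SM tile, which is exactly the ratio the surrounding text says high-performance kernels try to maximize.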
