GPU RAM
The global memory of the GPU is a large (many megabytes to gigabytes) memory store that is addressable by all of the GPU's Streaming Multiprocessors (SMs).
It is also known as GPU RAM (random access memory) or video RAM (VRAM). It uses Dynamic RAM (DRAM) cells, which are slower but smaller than the Static RAM (SRAM) used in registers and shared memory. For details on DRAM and SRAM, we recommend Ulrich Drepper's 2007 article "What Every Programmer Should Know About Memory".
It is generally not on the same die as the SMs, though in the latest data center-grade GPUs like the H100, it is located on a shared interposer for decreased latency and increased bandwidth (aka "high-bandwidth memory").
RAM is used to implement the global memory of the CUDA programming model and to store register data that spills from the register file.
An H100 can store 80 GiB (85,899,345,920 bytes, or 687,194,767,360 bits) in its RAM.
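As a quick sanity check on that figure, the GiB-to-bits conversion is just powers of two; a minimal sketch (the helper name `gib_to_bits` is ours, nothing H100-specific):

```python
def gib_to_bits(gib: int) -> int:
    """Convert a capacity in GiB (binary gigabytes) to bits."""
    bytes_total = gib * 2**30  # 1 GiB = 2^30 bytes
    return bytes_total * 8     # 8 bits per byte

print(gib_to_bits(80))  # → 687194767360
```

Note that this uses the binary prefix (GiB, 2^30 bytes), not the decimal gigabyte (GB, 10^9 bytes) sometimes quoted in marketing materials.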