GPU Glossary
/device-hardware/l1-data-cache

L1 Data Cache

The L1 data cache is the private memory of each Streaming Multiprocessor (SM).

The internal architecture of an H100 SM. The L1 data cache is depicted in light blue. Modified from NVIDIA's H100 white paper.

Each SM partitions that memory among groups of threads scheduled onto it.

The L1 data cache is co-located with and nearly as fast as the components that effect computations (e.g. the CUDA Cores).

It is implemented with SRAM, the same basic semiconductor cell used in CPU caches and registers and in the memory subsystem of Groq LPUs. The L1 data cache is accessed by the Load/Store Units of the SM.

CPUs also maintain an L1 cache. In CPUs, that cache is fully hardware-managed; in GPUs, it is partially programmer-managed, even in relatively high-level languages like CUDA C.
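In CUDA C, the programmer-managed portion of this SRAM is exposed as shared memory via the `__shared__` qualifier. The sketch below (illustrative; the kernel name and sizes are our own, not from this glossary) stages a tile of global memory in shared memory, synchronizes the block, then reads it back in reversed order — the second read is served from the SM's on-chip SRAM rather than from global memory:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each block stages a 256-element tile in __shared__ memory -- the
// programmer-managed slice of the SM's L1 data cache -- then each thread
// reads back a *different* thread's element, which only works because the
// tile is shared across the block. Assumes the input length is a multiple
// of the block size.
__global__ void reverse_tile(const float *in, float *out) {
    __shared__ float tile[256];                 // lives in on-SM SRAM

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = in[i];                  // global memory -> shared
    __syncthreads();                            // whole block sees the tile

    out[i] = tile[blockDim.x - 1 - threadIdx.x]; // fast on-chip read
}

int main() {
    // Requires a CUDA-capable GPU; error checking omitted for brevity.
    const int n = 1024;
    float h[n], r[n];
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h, n * sizeof(float), cudaMemcpyHostToDevice);

    reverse_tile<<<n / 256, 256>>>(d_in, d_out);

    cudaMemcpy(r, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("%f %f\n", r[0], r[255]);  // first tile reversed: 255 then 0
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

The hardware-managed remainder of the same SRAM still acts as a transparent cache for ordinary global-memory loads; only the `__shared__` allocation is under explicit programmer control.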

Each of an H100's SMs has a 256 KiB (2,097,152 bit) L1 data cache. Across the 132 SMs in an H100 SXM5, that's 33 MiB (276,824,064 bits) of cache space.
