GPU Glossary

What is register pressure?

Register pressure is a colorful term for the situation in which the register file is a bottleneck.

Registers in the Parallel Thread eXecution (PTX) language are virtual and unlimited, but the register files of the Streaming Multiprocessor (SM) are physical and so limited.

The amount of space in the register file consumed by a thread is determined by the Streaming ASSembler (SASS) code for the kernel. Since all threads in a thread block are scheduled onto the same SM, the total space required by a thread block is also determined by the kernel launch configuration. As the space allocated per thread block increases, fewer thread blocks can be scheduled onto the same SM, reducing occupancy and making it more difficult to hide latency.
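The arithmetic behind that relationship can be sketched in a few lines. This is a rough illustration, not a replacement for a real occupancy calculator: the helper name `blocks_per_sm_by_registers` is hypothetical, the register file size of 65,536 32-bit registers per SM is an assumption that holds for many recent NVIDIA architectures but should be checked for your device, and the sketch ignores register allocation granularity, shared memory, and the hardware caps on resident warps and blocks.

```python
# Assumed register file size: 64K 32-bit registers per SM.
# Check the value for your own device before relying on it.
REGISTERS_PER_SM = 65_536


def blocks_per_sm_by_registers(registers_per_thread, threads_per_block,
                               registers_per_sm=REGISTERS_PER_SM):
    """Estimate how many thread blocks fit on one SM if registers are the only limit.

    Hypothetical helper for illustration. It ignores allocation granularity,
    shared memory, and per-SM limits on resident warps and blocks.
    """
    if registers_per_thread <= 0 or threads_per_block <= 0:
        raise ValueError("registers_per_thread and threads_per_block must be positive")
    # Every thread in the block needs its own registers on the same SM.
    registers_per_block = registers_per_thread * threads_per_block
    return registers_per_sm // registers_per_block


if __name__ == "__main__":
    # Fixed launch configuration of 256 threads per block, increasing register use.
    for regs in (32, 64, 128, 255):
        blocks = blocks_per_sm_by_registers(regs, threads_per_block=256)
        print(f"{regs:>3} registers/thread -> {blocks} block(s) per SM")
```

Under these assumptions, each doubling of per-thread register use halves the number of blocks that can be resident on the SM. For measured rather than estimated numbers, `nvcc` can report per-thread register use when passed `-Xptxas -v`, and the CUDA runtime provides `cudaOccupancyMaxActiveBlocksPerMultiprocessor`.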

See this excellent article by SemiAnalysis for an account of the relationship between register pressure and key features added in recent Streaming Multiprocessor architectures, like asynchronous copies (added in Ampere), the Tensor Memory Accelerator (TMA, added in Hopper), and tensor memory (added in Blackwell).

Register pressure also occurs in CPUs, where similar register bottlenecks limit the degree to which loops can be strip-mined during auto-vectorization.
