GPU Glossary

What is a performance bottleneck?

The literal neck of a bottle limits the rate at which liquid can be poured; a metaphorical performance bottleneck in a system limits the rate at which tasks can be completed.

Bottlenecks are the target of performance optimization. The textbook approach to optimization is to

  • determine the bottleneck,
  • elevate the bottleneck until it is no longer such, and
  • repeat on the new bottleneck.

This approach is formalized in, for instance, the "Theory of Constraints" by Eliyahu Goldratt, which helped transmit the Toyota approach to manufacturing to manufacturers worldwide and, from there, to software engineering and operations.
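The determine/elevate/repeat loop above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the system is modeled as a serial pipeline whose throughput equals the rate of its slowest stage, and "elevating" a bottleneck is modeled as multiplying that stage's rate by a fixed factor. The stage names, rates, and the `optimize` helper are all hypothetical.

```python
def find_bottleneck(stage_rates):
    """Return (name, rate) of the slowest stage, which caps pipeline throughput."""
    return min(stage_rates.items(), key=lambda item: item[1])


def optimize(stage_rates, target_rate, speedup=2.0, max_rounds=10):
    """Repeatedly elevate the current bottleneck until the target rate is met."""
    rates = dict(stage_rates)  # copy so the caller's measurements are untouched
    history = []
    for _ in range(max_rounds):
        name, rate = find_bottleneck(rates)  # step 1: determine the bottleneck
        if rate >= target_rate:
            break
        history.append(name)
        rates[name] = rate * speedup  # step 2: elevate it (hypothetical fix)
        # step 3: loop again, since a different stage may now be the bottleneck
    return rates, history


# Hypothetical stage throughputs, in tasks per second.
stages = {"load": 100.0, "compute": 40.0, "store": 60.0}
final_rates, history = optimize(stages, target_rate=90.0)
print(history)                       # stages that were elevated, in order
print(find_bottleneck(final_rates))  # the stage that now limits throughput
```

Note that the bottleneck moves as each one is elevated: in this toy run, speeding up `compute` exposes `store`, and speeding up `store` exposes `compute` again. That is why the procedure is iterative rather than a single fix.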

In this talk for Jane Street, Horace He broke down the work done by the kernels of programs run on GPUs into three categories:

And so for GPU kernels, performance bottlenecks fall into three main* categories:

Roofline model analysis helps quickly identify whether a program's performance is bottlenecked by compute/arithmetic bandwidth or memory bandwidth.
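As a rough illustration of the roofline check, the sketch below compares a kernel's arithmetic intensity (FLOPs per byte moved) with the hardware's ridge point (peak arithmetic bandwidth divided by peak memory bandwidth). The peak numbers are round, hypothetical values rather than the specification of any real GPU, the function names are made up for this example, and the byte counts assume each operand crosses the memory bus exactly once, which real kernels only approximate.

```python
def attainable_flops(intensity, peak_flops, peak_bandwidth):
    """Roofline ceiling: performance is capped by compute or by memory traffic."""
    return min(peak_flops, intensity * peak_bandwidth)


def classify(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Label a kernel by which roof it sits under."""
    intensity = flops / bytes_moved      # FLOPs per byte
    ridge = peak_flops / peak_bandwidth  # intensity at which the two roofs meet
    return "compute-bound" if intensity >= ridge else "memory-bound"


# Hypothetical hardware: 100 TFLOP/s arithmetic, 2 TB/s memory bandwidth.
PEAK_FLOPS = 100e12
PEAK_BW = 2e12  # ridge point is 50 FLOPs per byte

# float32 vector add: 1 FLOP per element, 2 loads + 1 store of 4 bytes each.
n = 1 << 20
vector_add = classify(n, 12 * n, PEAK_FLOPS, PEAK_BW)

# float32 square matmul, idealized: 2*N^3 FLOPs, three N x N matrices moved once.
N = 4096
matmul = classify(2 * N**3, 12 * N**2, PEAK_FLOPS, PEAK_BW)

print(vector_add, matmul)
```

Under these assumptions the vector add lands far to the left of the ridge point and the large matrix multiplication far to the right, which matches the usual intuition about elementwise versus matmul kernels. Neither label says anything about overhead, which the roofline model does not capture.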

Of course, any resource can become a bottleneck. For instance, power ingress and heat egress can and do bottleneck some GPUs below their theoretical maximum performance. See this article from NVIDIA explaining a 4% end-to-end performance improvement from redirecting power from the L2 cache to the Streaming Multiprocessors, or this article from Horace He indicating that matrix multiplication performance varies with the input data, because the data determines how much power transistor switching demands. But compute and memory are the most important resources and the most common bottlenecks.
