What is SM utilization?
SM utilization measures the percentage of time that Streaming Multiprocessors (SMs) are executing instructions.
SM utilization is akin to the more familiar GPU utilization reported by nvidia-smi, but more fine-grained. Instead of reporting the fraction of time that a kernel is executing anywhere on the GPU, it reports the fraction of time all SMs spend executing kernels. If a kernel uses only one SM, e.g. because it only has one thread block, then it will achieve 100% GPU utilization while it is active, but the SM utilization will be at most one over the number of SMs, which is under 1% on an H100 GPU.
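The single-thread-block case can be worked through with a toy model. The sketch below is illustrative only: the function names are hypothetical, the SM count of 132 assumes the SXM variant of the H100 (other variants have fewer), and the coarse GPU-level number assumes every SM's busy time starts at the beginning of the measurement window.

```python
NUM_SMS = 132  # assumption: H100 SXM; check your own device's SM count


def sm_utilization(busy_seconds_per_sm, window_seconds):
    """Fraction of the window spent executing kernels, averaged over all SMs."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    total_busy = sum(busy_seconds_per_sm)
    return total_busy / (len(busy_seconds_per_sm) * window_seconds)


def gpu_utilization(busy_seconds_per_sm, window_seconds):
    """Fraction of the window during which at least one SM was busy.

    Toy simplification: all busy periods start at time zero, so the
    longest-running SM determines how long the GPU counts as busy.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return max(busy_seconds_per_sm) / window_seconds


# One thread block keeps a single SM busy for the whole one-second window.
busy = [1.0] + [0.0] * (NUM_SMS - 1)

print(f"GPU utilization: {gpu_utilization(busy, 1.0):.2%}")
print(f"SM utilization:  {sm_utilization(busy, 1.0):.2%}")
```

Under these assumptions the coarse metric reads as fully busy while the per-SM average is 1/132, which is the gap described above.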
As with GPU utilization but unlike CPU utilization, SM utilization should be high, even up to 100%.
But even though SM utilization is finer-grained than GPU utilization, it still isn't fine-grained enough to capture how well the GPU's compute resources are being used. If SM utilization is high, but performance is still inadequate, programmers should check pipe utilization, which measures how effectively each SM uses its internal functional units. High SM utilization with low pipe utilization indicates that your kernel is running on many SMs but not fully utilizing the computational resources within each one.
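That order of checks can be written down as a small triage helper. This is a sketch rather than an established rule: the function name, the returned labels, and both thresholds are hypothetical, and the two inputs are assumed to be fractions between 0 and 1 already collected from a profiler or monitoring tool.

```python
def triage(sm_util, pipe_util, sm_high=0.8, pipe_low=0.3):
    """Suggest where to look next, given two utilization fractions.

    The thresholds are illustrative assumptions, not recommended values.
    """
    for value in (sm_util, pipe_util):
        if not 0.0 <= value <= 1.0:
            raise ValueError("utilizations must be fractions between 0 and 1")
    if sm_util < sm_high:
        # Many SMs sit idle: look at grid size and gaps between kernels first.
        return "raise_sm_utilization"
    if pipe_util < pipe_low:
        # SMs are occupied, but their functional units are mostly idle.
        return "check_pipe_utilization"
    return "no_obvious_utilization_gap"


print(triage(0.95, 0.10))  # kernel spans many SMs but underuses each one
print(triage(0.01, 0.10))  # e.g. a single thread block on a large GPU
```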