GPU Glossary

What is branch efficiency?

Branch efficiency measures how often all threads in a warp take the same execution path when encountering conditional statements.

Branch efficiency is calculated as the ratio of uniform control flow decisions to total branch instructions executed. Control flow uniformity is measured at the level of warps, and so branch efficiency indicates the absence of warp divergence.

Not all conditionals reduce branch efficiency. The common "bounds-check" fragment that appears in most kernels

```cpp
int idx = blockIdx.x * blockDim.x + threadIdx.x;
if (idx < n)
```

will generally have very high branch efficiency, since most warps will be composed of threads that all have the same value for the conditional, save for at most a single warp whose threads' indices fall both above and below n.
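To make that estimate concrete, here is a minimal host-side sketch in plain C++ that models the bounds check warp by warp. It is not how a profiler measures branch efficiency, which relies on hardware counters. It assumes a one-dimensional launch whose block size is a multiple of the 32-thread warp size, so that consecutive global indices fall into the same warp. The names `model_branch_efficiency` and `bounds_check_efficiency` are hypothetical, chosen only for this illustration.

```cpp
#include <cassert>
#include <functional>

// Host-side model only: fraction of warps whose threads all agree on a
// per-thread predicate. Each warp is treated as executing the branch once.
double model_branch_efficiency(int total_threads, int warp_size,
                               const std::function<bool(int)>& predicate) {
    assert(total_threads > 0 && warp_size > 0);
    int total_branches = 0;    // one branch instruction per warp
    int uniform_branches = 0;  // warps in which every thread agrees
    for (int base = 0; base < total_threads; base += warp_size) {
        const bool first = predicate(base);
        bool uniform = true;
        for (int lane = 1; lane < warp_size && base + lane < total_threads; ++lane) {
            if (predicate(base + lane) != first) {
                uniform = false;  // this warp diverges
                break;
            }
        }
        ++total_branches;
        if (uniform) ++uniform_branches;
    }
    return static_cast<double>(uniform_branches) / total_branches;
}

// Models `if (idx < n)` for a 1D launch of grid_dim blocks of block_dim threads.
// Assumes block_dim is a multiple of 32 so warps do not straddle blocks.
double bounds_check_efficiency(int n, int block_dim, int grid_dim) {
    return model_branch_efficiency(block_dim * grid_dim, 32,
                                   [n](int idx) { return idx < n; });
}
```

Under this model, `bounds_check_efficiency(1000, 256, 4)` covers 32 warps, of which only the last (indices 992 through 1023) straddles `n`, giving 31/32. When `n` is a multiple of 32, as in `bounds_check_efficiency(1024, 256, 4)`, no warp straddles the boundary and the modeled efficiency is 1.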

While CPUs also care about the uniformity of branching behavior, they tend to care primarily about uniformity of branch behavior over time, as part of hardware-controlled branch prediction and speculative execution. That is, as circuits within the CPU accumulate data about a branch as it is encountered multiple times during program execution, the performance should improve.

GPUs instead care about uniformity in space. That is, uniformity is measured within warps, whose threads execute concurrently in time but are mapped onto distinct data, and performance improves if those threads branch uniformly.
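The sketch below illustrates what uniformity in space means, again as a host-side model in plain C++ rather than a GPU measurement. It compares two hypothetical conditions that are each true for exactly half of all threads but differ in how those threads are arranged across warps. The names `lane_parity`, `warp_parity`, and `count_divergent_warps` are assumptions made for this example, and the warp size of 32 matches current NVIDIA GPUs.

```cpp
#include <cassert>

constexpr int kWarpSize = 32;  // threads per warp on current NVIDIA GPUs

// True for every other thread, so the outcome alternates within each warp.
bool lane_parity(int idx) { return idx % 2 == 0; }

// True for every other warp, so the outcome is constant within each warp.
bool warp_parity(int idx) { return (idx / kWarpSize) % 2 == 0; }

// Host-side model: count the warps whose threads disagree on `cond`.
int count_divergent_warps(int num_warps, bool (*cond)(int)) {
    assert(num_warps >= 0);
    int divergent = 0;
    for (int w = 0; w < num_warps; ++w) {
        const int base = w * kWarpSize;
        const bool first = cond(base);
        for (int lane = 1; lane < kWarpSize; ++lane) {
            if (cond(base + lane) != first) {
                ++divergent;  // at least two threads took different paths
                break;
            }
        }
    }
    return divergent;
}
```

In this model, `count_divergent_warps(8, lane_parity)` reports all 8 warps as divergent, while `count_divergent_warps(8, warp_parity)` reports none, even though both conditions hold for the same fraction of threads. What matters is how the outcomes are laid out across the threads of each warp, not how often the branch is taken overall.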
