Instructions in the Parallel Thread Execution (PTX) instruction set are compatible with only certain physical GPUs. The versioning system that abstracts the details of physical GPUs away from the instruction set and the compiler is called "Compute Capability".

Most compute capability version numbers have two components, a major version and a minor version: 8.6, for example, has major version 8 and minor version 6. Following the onion layer model, NVIDIA promises forward compatibility across both major and minor versions: PTX code compiled for an older compute capability runs on GPUs with a newer one.

With the Hopper architecture, NVIDIA introduced an additional version suffix, the "a" in 9.0a, for features that deviate from the onion model: their support on future GPUs is not guaranteed.

Target compute capabilities for PTX compilation can be specified when invoking nvcc, the NVIDIA CUDA Compiler Driver. By default, the compiler will also generate optimized SASS for the matching Streaming Multiprocessor (SM) architecture. The documentation for nvcc refers to compute capability as a "virtual GPU architecture", in contrast to the "physical GPU architecture" expressed by the SM version.

The technical specifications for each compute capability version can be found in the Compute Capability section of the NVIDIA CUDA C Programming Guide.

