GPU Glossary

Tensor Core

Tensor Cores are GPU cores that operate on entire matrices with each instruction.

Figure: The internal architecture of an H100 SM. Note the larger size and lower number of Tensor Cores. Modified from NVIDIA's H100 white paper.

For example, the mma PTX instructions (documented in NVIDIA's PTX ISA reference) calculate D = AB + C for matrices A, B, C, and D. Operating on more data per instruction fetch dramatically reduces power requirements (see the talk on this topic by Bill Dally, Chief Scientist at NVIDIA).
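To make the D = AB + C operation concrete, here is a minimal pure-Python sketch of what one such matrix instruction computes on a single tile. The tile shape (16 x 8 x 16) and the helper name mma_tile are assumptions chosen for illustration; a real Tensor Core performs this in hardware across the threads of a warp, and this reference loop only models the arithmetic, not the execution.

```python
# Reference semantics of one matrix-multiply-accumulate: D = A @ B + C.
# ASSUMPTION: the 16 x 8 x 16 tile shape is illustrative only.
M, N, K = 16, 8, 16


def mma_tile(a, b, c):
    """Return (D, fma_count) where D = A @ B + C for an M x K, K x N, M x N tile."""
    d = [[0.0] * N for _ in range(M)]
    fma_count = 0  # number of scalar multiply-adds the tile operation stands in for
    for i in range(M):
        for j in range(N):
            acc = c[i][j]  # start from the accumulator input C
            for k in range(K):
                acc += a[i][k] * b[k][j]
                fma_count += 1
            d[i][j] = acc
    return d, fma_count


a = [[1.0] * K for _ in range(M)]  # A is M x K, all ones
b = [[2.0] * N for _ in range(K)]  # B is K x N, all twos
c = [[0.5] * N for _ in range(M)]  # C is M x N, the accumulator input

d, fma_count = mma_tile(a, b, c)
print(d[0][0])    # 16 * (1.0 * 2.0) + 0.5 = 32.5
print(fma_count)  # 16 * 8 * 16 = 2048
```

Under these assumptions, a single matrix instruction covers 2048 scalar multiply-adds, each of which would otherwise have to be issued as a separate scalar instruction. That ratio of arithmetic to instruction fetches is where the power savings come from.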

Tensor Cores are much larger and less numerous than CUDA Cores. An H100 SXM5 has only four Tensor Cores per Streaming Multiprocessor, compared to hundreds of CUDA Cores.

Tensor Cores were introduced in the V100 GPU, which represented a major improvement in the suitability of NVIDIA GPUs for large neural network workloads. For more, see the NVIDIA white paper introducing the V100.
