
Kernel

A single kernel launch corresponds to a thread block grid in the CUDA programming model. Modified from diagrams in NVIDIA's CUDA Refresher: The CUDA Programming Model and the NVIDIA CUDA C++ Programming Guide.

A kernel is the unit of CUDA code that programmers typically write and compose, akin to a procedure or function in typical languages targeting CPUs.

Unlike procedures, a kernel is called ("launched") once and returns once, but is executed many times, once each by a number of threads. These executions are generally concurrent (their execution order is non-deterministic) and parallel (they occur simultaneously on different execution units).
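
As a minimal sketch (the kernel name and element-wise operation here are illustrative, not from the glossary), every launched thread runs the same body exactly once, each on a different element:

```cpp
// Each thread derives a unique global index from its position in the grid
// and operates on one array element. Threads run concurrently, with no
// guaranteed execution order.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {            // guard: the grid may be larger than the array
        data[i] *= factor;
    }
}
```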

The collection of all threads executing a kernel is organized as a kernel grid, aka a thread block grid, the highest level of the CUDA programming model's thread hierarchy. A kernel grid executes across multiple Streaming Multiprocessors (SMs) and so operates at the scale of the entire GPU. The matching level of the memory hierarchy is the global memory.
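
For example, a launch of the `scale` kernel sketched above could size its grid so that the blocks collectively cover an array of `n` elements; the block size of 256 is an illustrative choice, not something fixed by the programming model:

```cpp
// One thread per array element: pick a block size, then compute how many
// blocks the grid needs to cover all n elements. The hardware schedules
// these blocks across the GPU's Streaming Multiprocessors.
int threadsPerBlock = 256;
int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;
scale<<<blocksPerGrid, threadsPerBlock>>>(d_data, 2.0f, n);
```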

In CUDA C++, kernels are passed pointers to global memory on the device when they are invoked by the host and return nothing; they just mutate memory.
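
Putting the sketches above together, a minimal host program (hypothetical, for illustration, and assuming `scale` is defined in the same file) allocates global memory, passes only a device pointer to the kernel, and reads the results back with a copy, since the kernel itself returns nothing:

```cpp
#include <cuda_runtime.h>
#include <stdlib.h>

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *h_data = (float *)malloc(bytes);          // host-side buffer
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    float *d_data;                                   // pointer into global memory
    cudaMalloc(&d_data, bytes);
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);

    // The kernel receives only the device pointer and returns nothing;
    // its results are visible only through the memory it mutates.
    int threadsPerBlock = 256;
    int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;
    scale<<<blocksPerGrid, threadsPerBlock>>>(d_data, 2.0f, n);

    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);  // read results back

    cudaFree(d_data);
    free(h_data);
    return 0;
}
```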
