Kernel
A kernel is the unit of CUDA code that programmers typically write and compose, akin to a procedure or function in typical languages targeting CPUs.
Unlike procedures, a kernel is called ("launched") once and returns once, but is executed many times, once each by a number of threads . These executions are generally concurrent (their execution order is non-deterministic) and parallel (they occur simultaneously on different execution units).
The collection of all threads executing a kernel is organized as a kernel grid — aka a thread block grid , the highest level of the CUDA programming model 's thread hierarchy. A kernel grid executes across multiple Streaming Multiprocessors (SMs) and so operates at the scale of the entire GPU. The matching level of the memory hierarchy is the global memory .
In CUDA C++ , kernels are passed pointers to global memory on the device when they are invoked by the host and return nothing — they just mutate memory.