GPU Glossary

Thread

Threads are the lowest level of the thread group hierarchy (top left) and are mapped onto the cores of a Streaming Multiprocessor. Modified from diagrams in NVIDIA's CUDA Refresher: The CUDA Programming Model and the NVIDIA CUDA C++ Programming Guide.

A thread of execution (or "thread" for short) is the lowest unit of programming for GPUs, the atom of the CUDA programming model's thread group hierarchy. A thread has its own registers, but little else.

Both SASS and PTX programs target threads. Compare this to a typical C program in a POSIX environment, which targets a process, itself a collection of one or more threads.

Like a thread on a CPU, a GPU thread can have a private instruction pointer/program counter. However, for performance reasons, GPU programs are generally written so that all the threads in a warp share the same instruction pointer, executing instructions in lock-step (see also Warp Scheduler).

Also like threads on CPUs, GPU threads have stacks in global memory that hold spilled registers and function call frames, but high-performance kernels generally avoid using either.

A single CUDA Core executes instructions from a single thread.
