A thread of execution (or "thread" for short) is the lowest unit of programming for GPUs, the atom of the CUDA programming model 's thread group hierarchy. A thread has its own registers , but little else.

Both SASS and PTX programs target threads. Compare this to a typical C program in a POSIX environment, which targets a process, itself a collection of one or more threads.

Like a thread on a CPU, a GPU thread can have a private instruction pointer/program counter. However, for performance reasons, GPU programs are generally written so that all the threads in a warp share the same instruction pointer, executing instructions in lock-step (see also Warp Scheduler ).

Also like threads on CPUs, GPU threads have stacks in global memory for storing spilled registers and a function call stack, but high-performance kernels generally avoid using either.

A single CUDA Core executes instructions from a single thread.

Compute Capability

Something seem wrong?
Or want to contribute?

Click this button to
let us know on GitHub.

Warp ?