Thread
A thread of execution (or "thread" for short) is the lowest unit of programming for GPUs, the atom of the CUDA programming model 's thread group hierarchy. A thread has its own registers , but little else.
Both SASS and PTX programs target threads. Compare this to a typical C program in a POSIX environment, which targets a process, itself a collection of one or more threads.
Like a thread on a CPU, a GPU thread can have a private instruction pointer/program counter. However, for performance reasons, GPU programs are generally written so that all the threads in a warp share the same instruction pointer, executing instructions in lock-step (see also Warp Scheduler ).
Also like threads on CPUs, GPU threads have stacks in global memory for storing spilled registers and a function call stack, but high-performance kernels generally avoid using either.
A single CUDA Core executes instructions from a single thread.