GPU Glossary
What is a CUDA Thread?

A thread of execution (or "thread" for short) is the lowest unit of programming for GPUs, the base and atom of the CUDA programming model's thread hierarchy. A thread has its own registers, but little else.

Both SASS and PTX programs target threads. Compare this to a typical C program in a POSIX environment, which targets a process, itself a collection of one or more threads. Unlike POSIX threads, CUDA threads are not used to make syscalls.
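To make this concrete, here is a minimal sketch of a CUDA kernel. Every launched thread runs the same function body, but with its own registers holding its own index and intermediate values (the kernel and variable names are illustrative, not from the glossary):

```cuda
#include <cstdio>

// Each of the 8 threads launched below executes this function
// independently, with its own registers holding `i` and the loaded value.
__global__ void scale(float *data, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's global index
    data[i] = data[i] * factor;                     // per-thread work
}

int main() {
    float host[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    float *dev;
    cudaMalloc(&dev, sizeof(host));
    cudaMemcpy(dev, host, sizeof(host), cudaMemcpyHostToDevice);

    scale<<<1, 8>>>(dev, 2.0f);  // launch one block of 8 threads

    cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    for (int i = 0; i < 8; ++i) printf("%g ", host[i]);
    cudaFree(dev);
}
```

The `<<<1, 8>>>` launch configuration creates the threads; the kernel body is what each individual thread executes.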

Like a thread on a CPU, a GPU thread can have a private instruction pointer/program counter. However, for performance reasons, GPU programs are generally written so that all the threads in a warp share the same instruction pointer, executing instructions in lock-step (see also Warp Scheduler).
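The cost of breaking lock-step can be sketched with a divergent branch. When threads in one warp take different paths, the hardware executes each path in turn with the non-participating threads masked off (kernel name and details are illustrative):

```cuda
// A sketch of warp divergence: threads in a warp share an instruction
// stream, so a data-dependent branch makes the warp run both paths
// serially, masking off the threads not on the current path.
__global__ void divergent(int *out) {
    int lane = threadIdx.x % 32;  // position within the warp
    if (lane < 16) {
        out[threadIdx.x] = 1;     // first half of the warp active here
    } else {
        out[threadIdx.x] = 2;     // second half active here
    }
    // Both sides execute for the whole warp, roughly doubling the
    // cost of this branch compared to a warp-uniform one.
}
```

Branches that are uniform across a warp (e.g. on `blockIdx.x`) avoid this penalty, which is one reason kernels are written to keep threads in a warp on the same path.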

Also like threads on CPUs, GPU threads have stacks in global memory for storing spilled registers and a function call stack, but high-performance kernels generally limit use of both.
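One common way a kernel ends up using its stack is a dynamically indexed local array, which generally cannot be kept in registers. A hedged illustration (the kernel is hypothetical; the compiler's actual decision depends on the target architecture and optimization level):

```cuda
// A dynamically indexed local array usually cannot live in registers,
// so the compiler places it in "local" memory -- the thread's stack,
// physically backed by global memory.
__global__ void spills(int *out, int n) {
    int scratch[64];                     // candidate for local-memory placement
    for (int i = 0; i < 64; ++i)
        scratch[i] = i * threadIdx.x;
    out[threadIdx.x] = scratch[n % 64];  // dynamic index forces memory storage
}
```

Compiling with `nvcc -Xptxas -v` reports per-kernel register counts and spill loads/stores, a common first check when tuning a high-performance kernel.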

A single CUDA Core executes instructions from a single thread.
