Or want to contribute?
Click this button to
let us know on GitHub.
What is a CUDA Thread?
A thread of execution (or "thread" for short) is the lowest unit of programming for GPUs, the base and atom of the CUDA programming model 's thread hierarchy . A thread has its own registers , but little else.
Both SASS and PTX programs target threads. Compare this to a typical C program in a POSIX environment, which targets a process, itself a collection of one or more threads. Unlike POSIX threads, CUDA threads are not used to make syscalls.
Like a thread on a CPU, a GPU thread can have a private instruction pointer/program counter. However, for performance reasons, GPU programs are generally written so that all the threads in a warp share the same instruction pointer, executing instructions in lock-step (see also Warp Scheduler ).
Also like threads on CPUs, GPU threads have stacks in global memory for storing spilled registers and a function call stack, but high-performance kernels generally limit use of both.
A single CUDA Core executes instructions from a single thread.
Building on GPUs? We know a thing or two about it.
Modal is an ergonomic Python SDK wrapped around a global GPU fleet.Deploy serverless AI workloads instantly without worrying about quota requests, driver compatibility issues, or managing bulky ML dependencies.
Deploy serverless AI workloads instantly without worrying about quota requests, driver compatibility issues, or managing bulky ML dependencies.