What is a Cooperative Thread Array?
A cooperative thread array (CTA) is a collection of threads scheduled onto the same Streaming Multiprocessor (SM). CTAs are the PTX/SASS implementation of the CUDA programming model's thread blocks. CTAs are composed of one or more warps.
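To make the relationship between CTAs and warps concrete, here is a minimal sketch in Python. It assumes a warp size of 32 threads, as on current NVIDIA GPUs; the helper name `warps_per_cta` is hypothetical and exists only for illustration.

```python
WARP_SIZE = 32  # threads per warp (assumed; true of current NVIDIA GPUs)


def warps_per_cta(threads_per_cta: int) -> int:
    """Number of warps needed to hold all of a CTA's threads."""
    if threads_per_cta < 1:
        raise ValueError("a CTA must contain at least one thread")
    # Round up: a partially filled warp still occupies a whole warp.
    return (threads_per_cta + WARP_SIZE - 1) // WARP_SIZE


print(warps_per_cta(256))  # 8
print(warps_per_cta(100))  # 4, with the last warp only partially filled
```

Because warps are the unit of scheduling, a CTA whose thread count is not a multiple of the warp size still pays for a full final warp.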
Programmers can direct threads within a CTA to coordinate with each other. The programmer-managed shared memory, in the L1 data cache of the SMs, makes this coordination fast. Unlike threads within a CTA, threads in different CTAs cannot coordinate with each other via barriers and must instead coordinate via global memory, e.g. via atomic update instructions. Because the driver controls the scheduling of CTAs at runtime, CTA execution order is indeterminate, and blocking one CTA on another can easily lead to deadlock.
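The barrier pattern described above can be sketched with a CPU-side analogy. This is not GPU code: it uses Python's standard `threading` module, with a plain list standing in for shared memory and `threading.Barrier` standing in for a CTA-wide barrier. The function name `run_cta` is hypothetical.

```python
import threading


def run_cta(num_threads: int) -> list[int]:
    """Simulate one CTA: each thread writes a value, waits, then reads a neighbor's."""
    shared = [0] * num_threads                # stands in for shared memory
    result = [0] * num_threads
    barrier = threading.Barrier(num_threads)  # stands in for a CTA-wide barrier

    def body(tid: int) -> None:
        shared[tid] = tid * tid               # phase 1: every thread writes its slot
        barrier.wait()                        # no thread proceeds until all have written
        result[tid] = shared[(tid + 1) % num_threads]  # phase 2: read the neighbor's slot

    threads = [threading.Thread(target=body, args=(tid,)) for tid in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


print(run_cta(4))  # [1, 4, 9, 0]
```

The barrier only completes because every participating thread is running at the same time. Threads in one CTA are scheduled together on one SM, so that holds for them. Nothing guarantees it for two different CTAs, which is why waiting on another CTA risks deadlock.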
The number of CTAs that can be scheduled onto a single SM sets the achievable occupancy and depends on several factors. Fundamentally, the SM has a limited set of resources (lines in the register file, "slots" for warps, and bytes of shared memory in the L1 data cache), and each CTA uses a certain amount of those resources (as calculated at compile time) when scheduled onto an SM.
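The resource arithmetic can be sketched as follows. This is a simplified model, not a substitute for the occupancy calculators in NVIDIA's tooling: the SM limits are illustrative assumptions rather than the specification of any particular GPU, the names `SMLimits`, `CTAUsage`, and `ctas_per_sm` are hypothetical, and real hardware applies allocation-granularity rounding that is ignored here.

```python
from dataclasses import dataclass

WARP_SIZE = 32  # threads per warp (assumed)


@dataclass(frozen=True)
class SMLimits:
    # Illustrative per-SM limits; check your GPU's documentation for real values.
    registers: int = 65536
    warp_slots: int = 64
    shared_mem_bytes: int = 100 * 1024
    max_ctas: int = 32


@dataclass(frozen=True)
class CTAUsage:
    threads: int
    registers_per_thread: int
    shared_mem_bytes: int


def ctas_per_sm(sm: SMLimits, cta: CTAUsage) -> tuple[int, str]:
    """Return how many CTAs fit on one SM and which resource runs out first."""
    warps = (cta.threads + WARP_SIZE - 1) // WARP_SIZE
    limits = {
        "warp slots": sm.warp_slots // warps,
        "registers": sm.registers // (cta.registers_per_thread * warps * WARP_SIZE),
        "CTA slots": sm.max_ctas,
    }
    if cta.shared_mem_bytes > 0:
        limits["shared memory"] = sm.shared_mem_bytes // cta.shared_mem_bytes
    limiter = min(limits, key=limits.get)  # the scarcest resource sets the count
    return limits[limiter], limiter


sm = SMLimits()
cta = CTAUsage(threads=256, registers_per_thread=64, shared_mem_bytes=48 * 1024)
count, limiter = ctas_per_sm(sm, cta)
occupancy = count * (cta.threads // WARP_SIZE) / sm.warp_slots
print(count, limiter, occupancy)  # 2 shared memory 0.25
```

In this example the CTA would fit eight times by warp slots and four times by registers, but only twice by shared memory, so shared memory caps occupancy at 25% of the SM's warp slots.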