Streaming ASSembler
Streaming ASSembler (SASS) is the assembly format for programs running on NVIDIA GPUs. This is the lowest-level format in which human-readable code can be written. It is one of the formats output by nvcc, the NVIDIA CUDA Compiler Driver, alongside PTX. It is converted to device-specific binary microcodes during execution. Presumably, the "Streaming" in "Streaming Assembler" refers to the Streaming Multiprocessors which the assembly language programs.
SASS is versioned and tied to a specific NVIDIA GPU SM architecture. See also Compute Capability.
Some exemplary instructions in SASS for the SM90a architecture of Hopper GPUs:
FFMA R0, R7, R0, 1.5 ;
- perform a Fused Floating point Multiply Add that multiplies the contents of Register 7 and Register 0, adds 1.5, and stores the result in Register 0.
S2UR UR4, SR_CTAID.X ;
- copy the X value of the Cooperative Thread Array's InDex from its Special Register to Uniform Register 4.
As with CPUs, writing this "GPU assembler" by hand is very uncommon. Viewing compiler-generated SASS while profiling and editing high-level CUDA C/C++ code or inline PTX is more common, especially in the production of the highest-performance kernels. Viewing CUDA C/C++, SASS, and PTX together is supported on Godbolt. For more detail on SASS with a focus on performance debugging workflows, see this talk from Arun Demeure.
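As a rough illustration of that workflow, the sketch below is a small CUDA C++ kernel whose compiled SASS can be inspected locally. The file name fma_demo.cu and the names fma_bias and fma_kernel are hypothetical, chosen only for this example. The kernel reads blockIdx.x and performs a fused multiply-add with the constant 1.5, so on an SM90a target the disassembly may contain instructions resembling the S2UR and FFMA examples above. The exact instructions and register numbers depend on the compiler version and flags, so treat the listing as something to explore rather than a guaranteed output.

```cuda
// fma_demo.cu (hypothetical file name, used only for this illustration)
//
// Compile to a cubin for Hopper and dump its SASS with the CUDA binary utilities:
//   nvcc -arch=sm_90a -cubin -o fma_demo.cubin fma_demo.cu
//   cuobjdump -sass fma_demo.cubin
#include <cmath>
#include <cuda_runtime.h>

// Computes a * b + 1.5f. fmaf requests a fused multiply-add, which the
// compiler is expected (but not guaranteed) to lower to an FFMA instruction.
__host__ __device__ inline float fma_bias(float a, float b) {
    return fmaf(a, b, 1.5f);
}

// One element per thread block. blockIdx.x is the X component of the
// Cooperative Thread Array index, which SASS reads from SR_CTAID.X.
__global__ void fma_kernel(const float* a, const float* b, float* out) {
    unsigned int i = blockIdx.x;
    out[i] = fma_bias(a[i], b[i]);
}
```

Searching the cuobjdump output for FFMA and SR_CTAID is a quick way to connect lines of the high-level source to the instructions the compiler chose for them.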
SASS is very lightly documented: the instructions are listed in the documentation for NVIDIA's CUDA binary utilities, but their semantics are not defined. The mapping from ASCII assembler to binary opcodes and operands is entirely undocumented, but it has been reverse-engineered in certain cases (Maxwell, Lovelace).