What is Streaming Assembler?
Streaming ASSembler (SASS) is the assembly format for programs running on NVIDIA GPUs. This is the lowest-level format in which human-readable code can be written. It is one of the formats output by nvcc, the NVIDIA CUDA Compiler Driver, alongside PTX. It is converted to device-specific binary microcode during execution. Presumably, the "Streaming" in "Streaming Assembler" refers to the Streaming Multiprocessors, which are what the assembly language programs.
SASS is versioned and tied to a specific NVIDIA GPU SM architecture. See also Compute Capability.
Some exemplary instructions in SASS for the SM90a architecture of Hopper GPUs:
FFMA R0, R7, R0, 1.5 ; - perform a Fused Floating point Multiply Add that multiplies the contents of Register 7 and Register 0, adds 1.5, and stores the result in Register 0.
S2UR UR4, SR_CTAID.X ; - copy the X value of the Cooperative Thread Array's InDex from its Special Register to Uniform Register 4.
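To make the "fused" part of FFMA concrete, here is a minimal host-side C++ sketch of the arithmetic. It is not GPU code, and the function names `ffma` and `mul_then_add` are hypothetical. It assumes FFMA behaves like an IEEE 754 fused multiply-add in its default rounding mode, meaning the product is not rounded before the addition, and uses `std::fma` as a stand-in.

```cpp
#include <cmath>

// Hypothetical host-side model of `FFMA Rd, Ra, Rb, c` (an assumption, not
// NVIDIA's definition): computes a * b + c with one rounding at the end.
float ffma(float a, float b, float c) {
    return std::fma(a, b, c);
}

// Unfused reference: round the product to float first, then add.
// `volatile` keeps the compiler from contracting the two steps into an FMA.
float mul_then_add(float a, float b, float c) {
    volatile float product = a * b;
    return product + c;
}
```

In this model, the example instruction above corresponds to `r0 = ffma(r7, r0, 1.5f)`, with Register 0 serving as both a source and the destination. The two functions can disagree: for `a = b = 1 + 2^-12` and `c = -(1 + 2^-11)`, the fused version returns `2^-24`, while the unfused version rounds the product to `1 + 2^-11` and returns `0`.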
Even more so than for CPUs, writing this "GPU assembler" by hand is very uncommon. It is more common to view compiler-generated SASS while profiling and editing high-level CUDA C/C++ code or in-line PTX, especially in the production of the highest-performance kernels. Viewing CUDA C/C++, SASS, and PTX together is supported on Godbolt. For more detail on SASS with a focus on performance debugging workflows, see this talk from Arun Demeure.
SASS is very lightly documented: the instructions are listed in the documentation for NVIDIA's CUDA binary utilities, but their semantics are not defined. The mapping from ASCII assembler to binary opcodes and operands is entirely undocumented, but it has been reverse-engineered in certain cases (Maxwell, Lovelace).