GPU Glossary

What is Streaming Assembler?

Streaming ASSembler (SASS) is the assembly format for programs running on NVIDIA GPUs. It is the lowest-level format in which human-readable code can be written. It is one of the formats output by nvcc, the NVIDIA CUDA Compiler Driver, alongside PTX. It is converted to device-specific binary microcode during execution. Presumably, the "Streaming" in "Streaming Assembler" refers to the Streaming Multiprocessors that the assembly language programs.

SASS is versioned and tied to a specific NVIDIA GPU SM architecture. See also Compute Capability.

Some example instructions in SASS for the SM90a architecture of Hopper GPUs:

  • FFMA R0, R7, R0, 1.5 ; - perform a Fused Floating point Multiply Add that multiplies the contents of Register 7 and Register 0, adds 1.5, and stores the result in Register 0.
  • S2UR UR4, SR_CTAID.X ; - copy the X value of the Cooperative Thread Array's InDex from its Special Register to Uniform Register 4.
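
To make the two instructions above concrete, here is a minimal host-side sketch in C++ that mimics what each one does to register state. It is not NVIDIA's definition of the instructions. The `ThreadState` and `WarpState` structs, the register counts, the helper names, and the input values are all hypothetical and chosen only for illustration. The sketch assumes that FFMA rounds once, as a fused multiply-add does, and that uniform registers are shared across the threads of a warp.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical, simplified model of the state the two instructions touch.
// Real register files are much larger; names and sizes are illustrative only.
struct ThreadState {
    float r[8] = {};               // per-thread registers R0..R7
};

struct WarpState {
    std::uint32_t ur[8] = {};      // uniform registers UR0..UR7
    std::uint32_t sr_ctaid_x = 0;  // special register: CTA index along x
};

// FFMA Rd, Ra, Rb, imm ;  ->  Rd = Ra * Rb + imm, rounded once (fused).
inline void ffma(ThreadState& t, int d, int a, int b, float imm) {
    t.r[d] = std::fma(t.r[a], t.r[b], imm);
}

// S2UR URd, SR_CTAID.X ;  ->  URd = CTA index along x.
inline void s2ur_ctaid_x(WarpState& w, int d) {
    w.ur[d] = w.sr_ctaid_x;
}

// Replays the two example instructions on made-up input values.
inline float run_example() {
    ThreadState t;
    t.r[7] = 2.0f;
    t.r[0] = 3.0f;
    ffma(t, 0, 7, 0, 1.5f);        // FFMA R0, R7, R0, 1.5 ;
    assert(t.r[0] == 7.5f);        // 2 * 3 + 1.5

    WarpState w;
    w.sr_ctaid_x = 5;              // pretend this thread block has index 5
    s2ur_ctaid_x(w, 4);            // S2UR UR4, SR_CTAID.X ;
    assert(w.ur[4] == 5u);

    return t.r[0];
}
```

In CUDA C/C++ source, the value read by the second instruction corresponds to `blockIdx.x`, and an expression of the form `a * b + 1.5f` on `float` operands is the kind of code a compiler may lower to FFMA.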

Even more so than for CPUs, writing this "GPU assembler" by hand is very uncommon. Viewing compiler-generated SASS while profiling and editing high-level CUDA C/C++ code or inline PTX is more common, especially in the production of the highest-performance kernels. Viewing CUDA C/C++, SASS, and PTX together is supported on Godbolt. For more detail on SASS with a focus on performance debugging workflows, see this talk from Arun Demeure.

SASS is very lightly documented: the instructions are listed in the documentation for NVIDIA's CUDA binary utilities, but their semantics are not defined. The mapping from ASCII assembler to binary opcodes and operands is entirely undocumented, but it has been reverse-engineered in certain cases (Maxwell, Lovelace).
