What is Streaming Assembler?
Streaming ASSembler (SASS) is the assembly format for programs running on NVIDIA GPUs. This is the lowest-level format in which human-readable code can be written. It is one of the formats output by nvcc, the NVIDIA CUDA Compiler Driver, alongside PTX. It is converted to device-specific binary microcode during execution. Presumably, the "Streaming" in "Streaming Assembler" refers to the Streaming Multiprocessors that the assembly language is used to program.
SASS is versioned and tied to a specific NVIDIA GPU SM architecture. See also Compute Capability.
Some exemplary instructions in SASS for the SM90a architecture of Hopper GPUs:
FFMA R0, R7, R0, 1.5 ; - perform a Fused Floating point Multiply Add that multiplies the contents of Register 7 and Register 0, adds 1.5, and stores the result in Register 0.
S2UR UR4, SR_CTAID.X ; - copy the X value of the Cooperative Thread Array's InDex from its Special Register to Uniform Register 4.
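
To make the reading of these two instructions concrete, here is a minimal Python sketch of a toy interpreter for them. It is not an NVIDIA tool and does not define SASS semantics: the function names `parse_sass` and `step`, the dictionary-based register file, and the sample register values are all hypothetical. It models only the behavior described in the text above, and it uses ordinary Python floats (64-bit), so it does not reproduce the 32-bit, single-rounding arithmetic of a real fused multiply-add.

```python
def parse_sass(line):
    """Split one line of SASS text into (mnemonic, [operands])."""
    text = line.split(";", 1)[0].strip()   # drop the terminating ';'
    mnemonic, _, rest = text.partition(" ")
    operands = [op.strip() for op in rest.split(",") if op.strip()]
    return mnemonic, operands


def step(line, regs, special):
    """Apply one instruction to toy state (dicts mapping name -> value)."""
    mnemonic, ops = parse_sass(line)

    def read(op):
        # An operand is either a known register name or a numeric literal.
        return regs[op] if op in regs else float(op)

    if mnemonic == "FFMA":
        dst, a, b, c = ops
        regs[dst] = read(a) * read(b) + read(c)   # dst = a * b + c
    elif mnemonic == "S2UR":
        dst, src = ops
        regs[dst] = special[src]                  # special -> uniform register
    else:
        raise ValueError(f"instruction not modelled in this sketch: {mnemonic}")
    return regs


# Hypothetical starting state, chosen only for illustration.
regs = {"R0": 3.0, "R7": 2.0}
special = {"SR_CTAID.X": 5}

step("FFMA R0, R7, R0, 1.5 ;", regs, special)   # R0 = 2.0 * 3.0 + 1.5
step("S2UR UR4, SR_CTAID.X ;", regs, special)   # UR4 = X index of the CTA
```

After both steps, `regs["R0"]` holds 7.5 and `regs["UR4"]` holds the value that was placed in `SR_CTAID.X`.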
Even more so than for CPUs, writing this "GPU assembler" by hand is very uncommon. Viewing compiler-generated SASS while profiling and editing high-level CUDA C/C++ code or inline PTX is more common, especially in the production of the highest-performance kernels. Viewing CUDA C/C++, SASS, and PTX together is supported on Godbolt. For more detail on SASS with a focus on performance debugging workflows, see this talk from Arun Demeure.
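
For viewing compiler-generated SASS locally rather than on Godbolt, one possible workflow is to compile a kernel to a cubin with nvcc and disassemble it with cuobjdump from the CUDA binary utilities. The Python sketch below only wraps those two command lines. The file name `kernel.cu` and the helper names `sass_dump_commands` and `dump_sass` are assumptions made for illustration, and the sketch assumes the CUDA toolkit is installed and on the PATH.

```python
import shutil
import subprocess


def sass_dump_commands(source="kernel.cu", arch="sm_90a"):
    """Build the argv lists for compiling to a cubin and dumping its SASS."""
    cubin = source.rsplit(".", 1)[0] + ".cubin"
    # SASS is tied to an SM architecture, so the target is stated explicitly.
    compile_cmd = ["nvcc", f"-arch={arch}", "-cubin", "-o", cubin, source]
    dump_cmd = ["cuobjdump", "-sass", cubin]
    return compile_cmd, dump_cmd


def dump_sass(source="kernel.cu", arch="sm_90a"):
    """Return the SASS listing as text, or None if the tools are missing."""
    if not (shutil.which("nvcc") and shutil.which("cuobjdump")):
        return None
    compile_cmd, dump_cmd = sass_dump_commands(source, arch)
    subprocess.run(compile_cmd, check=True)
    result = subprocess.run(dump_cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

Changing `arch` (for example to another `sm_` target) should produce a different listing for the same source, which is one way to see that SASS is versioned per architecture.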
SASS is very lightly documented: the instructions are listed in the documentation for NVIDIA's CUDA binary utilities, but their semantics are not defined. The mapping from ASCII assembler to binary opcodes and operands is entirely undocumented, but it has been reverse-engineered in certain cases (Maxwell, Lovelace).