Install FlashAttention on Modal
FlashAttention is an optimized CUDA implementation of scaled dot-product attention for Transformers. Dao AI Lab now publishes pre-compiled wheels, which makes installation quick. This script shows how to:
- Pin an exact wheel that matches CUDA 12 / PyTorch 2.6 / Python 3.13.
- Build a Modal image that installs torch, numpy, and FlashAttention.
- Launch a Modal GPU function to confirm the FlashAttention kernel actually runs.
You need to pin an exact release wheel that matches your CUDA, PyTorch, and Python versions; the full list of builds is on the flash-attention GitHub releases page.
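As a rough sketch of what the pinned install can look like (the wheel filename, release tag, torch pin, and GPU type below are illustrative assumptions rather than the exact values from this script; copy the real filename from the releases page):

```python
import modal

# Illustrative wheel URL: copy the exact filename (version, cxx11abi flag,
# cp313 tag) from the Dao-AILab/flash-attention GitHub releases page.
FLASH_ATTN_WHEEL = (
    "https://github.com/Dao-AILab/flash-attention/releases/download/"
    "v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE"
    "-cp313-cp313-linux_x86_64.whl"
)

# Install torch and numpy first, then the pre-compiled wheel, so nothing is
# compiled from source during the image build.
image = (
    modal.Image.debian_slim(python_version="3.13")
    .pip_install("torch==2.6.0", "numpy")
    .pip_install(FLASH_ATTN_WHEEL)
)

app = modal.App("flash-attn-install", image=image)


@app.function(gpu="L40S")  # any CUDA 12-capable GPU type works here
def flash_attn_version() -> str:
    # Import inside the function so it only runs in the remote GPU container.
    import flash_attn

    return flash_attn.__version__


@app.local_entrypoint()
def main():
    print("flash-attn version:", flash_attn_version.remote())
```

A successful import already proves the wheel matched the interpreter and CUDA runtime inside the image; actually exercising the kernel is what the demo below does.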
And here is the demo verifying that the kernel works on a GPU: