Install Flash Attention on Modal

FlashAttention is an optimized CUDA kernel library for the scaled dot-product attention used in Transformers. Dao AI Lab now publishes pre-compiled wheels, which makes installation quick. This script shows how to:

  1. Pin an exact wheel that matches CUDA 12 / PyTorch 2.6 / Python 3.13.
  2. Build a Modal image that installs torch, numpy, and FlashAttention.
  3. Launch a GPU function to confirm the kernel actually imports and runs.

You need to pin an exact release wheel whose build tags match your CUDA version, PyTorch version, and Python version. The full list of wheels is on the Dao-AILab/flash-attention GitHub releases page.
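
As a concrete illustration, here is a minimal sketch of steps 1 and 2 using Modal's image API. The wheel URL below is an assumption for the CUDA 12 / PyTorch 2.6 / Python 3.13 combination; substitute the exact URL for your stack from the releases page, and keep the pinned torch version in sync with the wheel's build tag.

```python
import modal

app = modal.App("flash-attn-demo")

# NOTE: illustrative wheel filename -- copy the exact URL for your
# CUDA / PyTorch / Python combination from the Dao-AILab/flash-attention
# GitHub releases page.
FLASH_ATTN_WHEEL = (
    "https://github.com/Dao-AILab/flash-attention/releases/download/"
    "v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE"
    "-cp313-cp313-linux_x86_64.whl"
)

image = (
    modal.Image.debian_slim(python_version="3.13")  # match the cp313 tag in the wheel
    .pip_install("torch==2.6.0", "numpy")           # match the torch2.6 tag in the wheel
    .pip_install(FLASH_ATTN_WHEEL)
)
```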

And here is a demo verifying that it works:
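
Continuing the sketch above, a GPU function along these lines would confirm that the kernel imports and executes; the GPU type, tensor shapes, and dtype are arbitrary choices for the check.

```python
@app.function(image=image, gpu="A10G")
def check_flash_attn() -> None:
    import torch
    from flash_attn import flash_attn_func

    # Q, K, V with shape (batch, seqlen, nheads, headdim) in fp16 on the GPU.
    q, k, v = (
        torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
        for _ in range(3)
    )
    out = flash_attn_func(q, k, v, causal=True)
    print("flash_attn output:", tuple(out.shape), out.dtype)


@app.local_entrypoint()
def main():
    check_flash_attn.remote()
```

Running `modal run` on this file should print the output shape `(2, 1024, 8, 64)` from the remote GPU, confirming the wheel loaded and the kernel executed.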