Fine-tuning Large Language Models (LLMs) can be computationally intensive, with GPU memory (VRAM) often being the primary bottleneck. This is a guide to calculating the VRAM requirements for fine-tuning.
The rule of thumb
For full fine-tuning of LLMs loaded in “half-precision” (16 bits), a quick rule of thumb is:
16GB of GPU memory per 1B parameters in the model
This is significantly higher than the 2GB per 1B parameters needed for inference, due to the additional memory required for optimizer states, gradients, and other training-related data.
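As a quick sanity check, here is a minimal Python sketch of the rule of thumb above (the 16GB and 2GB per billion parameters are rough heuristics, not exact figures):

```python
# Rough rule-of-thumb VRAM estimate; 16 GB/1B params for full fine-tuning
# in half precision, 2 GB/1B params for inference.
def rule_of_thumb_vram_gb(params_billion: float, training: bool = True) -> float:
    """Estimate VRAM in GB for a model with `params_billion` parameters."""
    gb_per_billion = 16 if training else 2
    return params_billion * gb_per_billion

print(rule_of_thumb_vram_gb(7))         # ~112 GB for full fine-tuning
print(rule_of_thumb_vram_gb(7, False))  # ~14 GB for inference
```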
VRAM Requirements for fine-tuning a 7B model
Let’s walk through a VRAM estimation for a 7B parameter model. The total VRAM requirements are the sum of the following individual components:
1. Model parameters
- Full precision (FP32): ~28GB (7B x 4 bytes)
- Half precision (FP16): ~14GB (7B x 2 bytes)
- Mixed precision: ~21GB (FP16 + partial FP32 copy)
2. Optimizer States
Using AdamW (the most common optimizer):
- ~84GB (3 FP32 values per parameter, at 4 bytes each: the first and second moments plus a master copy of the weights)
Using 8-bit optimizers (e.g., bitsandbytes):
- ~42GB (1 FP32 master copy + 2 8-bit optimizer states)
3. Gradients
- FP32: ~28GB
- FP16: ~14GB (often matches model weight precision)
4. Activations
Depends on batch size, sequence length, and model architecture, and can be reduced with activation checkpointing. For most contemporary model types (transformers, diffusion models), activations don’t add nearly as much to the memory requirements as the other components above.
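If activations do become a bottleneck, activation (gradient) checkpointing trades compute for memory by recomputing activations during the backward pass. A minimal sketch, assuming a Hugging Face transformers causal LM (the model name below is a placeholder):

```python
# Minimal sketch of enabling activation checkpointing with Hugging Face
# transformers; "your-7b-model" is a placeholder, not a specific checkpoint.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("your-7b-model")
model.gradient_checkpointing_enable()  # recompute activations in the backward pass
model.config.use_cache = False         # KV cache is not needed during training
```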
Total VRAM Estimate
For full fine-tuning with half precision and an 8-bit optimizer, a 7B model might require 14GB (weights) + 42GB (optimizer states) + 14GB (gradients) = ~70GB of VRAM.
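The same arithmetic can be wrapped in a small estimator. This is a rough sketch that assumes FP16 weights and gradients and an 8-bit optimizer (one FP32 master copy plus two 8-bit states), and ignores activations since they depend heavily on batch size and sequence length:

```python
# Component-wise VRAM estimate mirroring the breakdown above (GB per billion
# parameters equals bytes per parameter). Activations are not included.
def estimate_finetune_vram_gb(params_billion: float) -> dict:
    weights = params_billion * 2              # FP16 weights: 2 bytes/param
    gradients = params_billion * 2            # FP16 gradients: 2 bytes/param
    optimizer = params_billion * (4 + 1 + 1)  # FP32 master copy + two 8-bit states
    return {
        "weights_gb": weights,
        "gradients_gb": gradients,
        "optimizer_gb": optimizer,
        "total_gb": weights + gradients + optimizer,
    }

# For a 7B model: ~14GB weights, ~14GB gradients, ~42GB optimizer -> ~70GB total
print(estimate_finetune_vram_gb(7))
```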
Efficient fine-tuning: LoRA and QLoRA
Techniques like LoRA (Low-Rank Adaptation) and QLoRA (Quantized LoRA) significantly reduce VRAM requirements. You can read more about how they work in our LoRA vs. QLoRA article.
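To give a rough sense of why this helps: with LoRA, only small low-rank adapter matrices are trained, so the gradients and optimizer states cover a tiny fraction of the model’s parameters. A minimal sketch (the layer dimensions and rank below are illustrative assumptions, not a specific architecture):

```python
# Trainable parameters added by one LoRA adapter: A (d_in x r) and B (r x d_out).
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    return rank * (d_in + d_out)

# Example: a 4096x4096 projection with rank 16 adds ~131K trainable params,
# versus ~16.8M parameters in the frozen base weight matrix.
print(lora_trainable_params(4096, 4096, 16))  # 131072
```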
VRAM requirements table
Here’s a comparison of VRAM requirements for different model sizes and fine-tuning techniques:
| Method | Precision (bits) | 7B | 13B | 30B | 70B | 110B |
|---|---|---|---|---|---|---|
| Full | 16 | 67GB | 125GB | 288GB | 672GB | 1056GB |
| LoRA | 16 | 15GB | 28GB | 63GB | 146GB | 229GB |
| QLoRA | 8 | 9GB | 17GB | 38GB | 88GB | 138GB |
| QLoRA | 4 | 5GB | 9GB | 20GB | 46GB | 72GB |
Note: These are approximate values and actual usage may vary based on implementation details and additional memory needs during training.