September 1, 2024 · 3 minute read
How much VRAM do I need for LLM model fine-tuning?
Yiren Lu (@YirenLu)
Solutions Engineer

Fine-tuning Large Language Models (LLMs) can be computationally intensive, with GPU memory (VRAM) often being the primary bottleneck. This is a guide to calculating the VRAM requirements for fine-tuning.

The rule of thumb

For full fine-tuning of LLMs loaded in “half-precision” (16 bits), a quick rule of thumb is:

16GB of GPU memory per 1B parameters in the model

This is significantly higher than the 2GB per 1B parameters needed for inference, due to the additional memory required for optimizer states, gradients, and other training-related data.
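To see where these figures come from, here is a minimal Python sketch of the arithmetic, assuming FP16 weights and gradients plus standard FP32 AdamW optimizer states (the names and byte counts are illustrative, not a definitive accounting):

```python
# Rough per-parameter memory budgets, in bytes.
INFERENCE_BYTES_PER_PARAM = 2            # FP16 weights only
TRAINING_BYTES_PER_PARAM = 2 + 2 + 12    # FP16 weights + FP16 gradients + FP32 AdamW states

def vram_gb(params_billions: float, bytes_per_param: int) -> float:
    """1B parameters x 1 byte is roughly 1GB, so the estimate is a simple product."""
    return params_billions * bytes_per_param

print(vram_gb(7, INFERENCE_BYTES_PER_PARAM))  # ~14 GB to serve a 7B model in FP16
print(vram_gb(7, TRAINING_BYTES_PER_PARAM))   # ~112 GB to fully fine-tune it with AdamW
```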

VRAM Requirements for fine-tuning a 7B model

Let’s walk through a VRAM estimation for a 7B parameter model. The total VRAM requirements are the sum of the following individual components:

1. Model parameters

  • Full precision (FP32): ~28GB (7B x 4 bytes)
  • Half precision (FP16): ~14GB (7B x 2 bytes)
  • Mixed precision: ~21GB (FP16 + partial FP32 copy)

2. Optimizer States

Using AdamW (the most common optimizer):

  • ~84GB (3 FP32 copies at 4 bytes/parameter each: master weights, first moment, and second moment)

Using 8-bit optimizers (e.g., bitsandbytes; see the sketch after this breakdown):

  • ~42GB (1 FP32 master copy at ~28GB plus two 8-bit moment copies totaling ~14GB)

3. Gradients

  • FP32: ~28GB
  • FP16: ~14GB (often matches model weight precision)

4. Activations

Activation memory depends on batch size, sequence length, and model architecture, and can be reduced with activation (gradient) checkpointing. For most contemporary architectures (transformers, diffusion models), activations add far less memory than the components above (see the sketch below).
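The 8-bit optimizer from step 2 and the activation checkpointing from step 4 each take only a line or two in practice. A minimal sketch, assuming a Hugging Face transformers causal LM and the bitsandbytes library (the model name is a placeholder, not a recommendation):

```python
import bitsandbytes as bnb
from transformers import AutoModelForCausalLM

# Placeholder model id - substitute your own base model.
model = AutoModelForCausalLM.from_pretrained("your-7b-base-model")

# Activation (gradient) checkpointing: recompute activations during the
# backward pass instead of storing them, trading compute for memory.
model.gradient_checkpointing_enable()

# 8-bit AdamW: keeps the optimizer moments in 8 bits instead of FP32,
# cutting optimizer-state memory roughly in half.
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=2e-4)
```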

Total VRAM Estimate

For full fine-tuning in half precision with an 8-bit optimizer, a 7B model therefore needs roughly 14GB (weights) + 42GB (optimizer states) + 14GB (gradients) = ~70GB of VRAM, before accounting for activations.
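The same sum can be written as a small helper that makes it easy to swap in different precisions or optimizers. A rough sketch mirroring the component breakdown above (activations are ignored, as in the estimate):

```python
def estimate_finetune_vram_gb(
    params_billions: float,
    weight_bytes: int = 2,      # FP16/BF16 weights
    grad_bytes: int = 2,        # gradients in the same precision as the weights
    optimizer_bytes: int = 6,   # 8-bit AdamW: 4-byte FP32 master copy + two 1-byte moments
) -> float:
    """Sum the per-parameter costs of weights, gradients, and optimizer states."""
    bytes_per_param = weight_bytes + grad_bytes + optimizer_bytes
    return params_billions * bytes_per_param  # GB, since 1B params x 1 byte is ~1GB

print(estimate_finetune_vram_gb(7))                      # ~70 GB with an 8-bit optimizer
print(estimate_finetune_vram_gb(7, optimizer_bytes=12))  # ~112 GB with standard FP32 AdamW
```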

Efficient fine-tuning: LoRA and QLoRA

Techniques like LoRA (Low-Rank Adaptation) and QLoRA (Quantized LoRA) significantly reduce VRAM requirements by training only small adapter matrices while keeping the base model frozen. You can read more about how they work in our LoRA vs. QLoRA article.
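For illustration, here is a minimal sketch of a QLoRA-style setup using the transformers, peft, and bitsandbytes libraries. The base model id and the LoRA hyperparameters (rank, alpha, target modules) are placeholders, not recommendations:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the frozen base model in 4-bit NF4 precision (the QLoRA recipe).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "your-7b-base-model",          # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach small trainable low-rank adapters; only these receive gradients
# and optimizer states, which is where the VRAM savings come from.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # illustrative attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base model
```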

VRAM requirements table

Here’s a comparison of VRAM requirements for different model sizes and fine-tuning techniques:

Method   Precision   7B      13B     30B     70B     110B
Full     16-bit      67GB    125GB   288GB   672GB   1056GB
LoRA     16-bit      15GB    28GB    63GB    146GB   229GB
QLoRA    8-bit       9GB     17GB    38GB    88GB    138GB
QLoRA    4-bit       5GB     9GB     20GB    46GB    72GB

Note: These are approximate values and actual usage may vary based on implementation details and additional memory needs during training.
