September 1, 2024 · 3 minute read
How much VRAM do I need for LLM model fine-tuning?
Yiren Lu (@YirenLu)
Solutions Engineer

Fine-tuning Large Language Models (LLMs) can be computationally intensive, with GPU memory (VRAM) often being the primary bottleneck. This is a guide to calculating the VRAM requirements for fine-tuning.

The rule of thumb

For full fine-tuning of LLMs loaded in “half-precision” (16 bits), a quick rule of thumb is:

16GB of GPU memory per 1B parameters in the model

This is significantly higher than the 2GB per 1B parameters needed for inference, due to the additional memory required for optimizer states, gradients, and other training-related data.
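To see where these figures come from, here is a minimal Python sketch of the arithmetic, assuming FP16 weights and gradients plus standard FP32 AdamW optimizer states (the names and byte counts are illustrative, not a definitive accounting):

```python
# Rough per-parameter memory budgets, in bytes.
INFERENCE_BYTES_PER_PARAM = 2            # FP16 weights only
TRAINING_BYTES_PER_PARAM = 2 + 2 + 12    # FP16 weights + FP16 gradients + FP32 AdamW states

def vram_gb(params_billions: float, bytes_per_param: int) -> float:
    """1B parameters x 1 byte is roughly 1GB, so the estimate is a simple product."""
    return params_billions * bytes_per_param

print(vram_gb(7, INFERENCE_BYTES_PER_PARAM))  # ~14 GB to serve a 7B model in FP16
print(vram_gb(7, TRAINING_BYTES_PER_PARAM))   # ~112 GB to fully fine-tune it with AdamW
```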

VRAM Requirements for fine-tuning a 7B model

Let’s walk through a VRAM estimation for a 7B parameter model. The total VRAM requirements are the sum of the following individual components:

1. Model parameters

  • Full precision (FP32): ~28GB (7B x 4 bytes)
  • Half precision (FP16): ~14GB (7B x 2 bytes)
  • Mixed precision: ~21GB (FP16 + partial FP32 copy)

2. Optimizer States

Using AdamW (the most common optimizer):

  • ~84GB (3 FP32 copies at 4 bytes/parameter each: master weights, first moment, and second moment)

Using 8-bit optimizers (e.g., bitsandbytes; see the sketch after this breakdown):

  • ~42GB (1 FP32 master copy at ~28GB plus two 8-bit moment copies totaling ~14GB)

3. Gradients

  • FP32: ~28GB
  • FP16: ~14GB (often matches model weight precision)

4. Activations

Activation memory depends on batch size, sequence length, and model architecture, and can be reduced with activation (gradient) checkpointing. For most contemporary architectures (transformers, diffusion models), activations add far less memory than the components above (see the sketch below).
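The 8-bit optimizer from step 2 and the activation checkpointing from step 4 each take only a line or two in practice. A minimal sketch, assuming a Hugging Face transformers causal LM and the bitsandbytes library (the model name is a placeholder, not a recommendation):

```python
import bitsandbytes as bnb
from transformers import AutoModelForCausalLM

# Placeholder model id - substitute your own base model.
model = AutoModelForCausalLM.from_pretrained("your-7b-base-model")

# Activation (gradient) checkpointing: recompute activations during the
# backward pass instead of storing them, trading compute for memory.
model.gradient_checkpointing_enable()

# 8-bit AdamW: keeps the optimizer moments in 8 bits instead of FP32,
# cutting optimizer-state memory roughly in half.
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=2e-4)
```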

Total VRAM Estimate

For full fine-tuning in half precision with an 8-bit optimizer, a 7B model therefore needs roughly 14GB (weights) + 42GB (optimizer states) + 14GB (gradients) = ~70GB of VRAM, before accounting for activations.
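The same sum can be written as a small helper that makes it easy to swap in different precisions or optimizers. A rough sketch mirroring the component breakdown above (activations are ignored, as in the estimate):

```python
def estimate_finetune_vram_gb(
    params_billions: float,
    weight_bytes: int = 2,      # FP16/BF16 weights
    grad_bytes: int = 2,        # gradients in the same precision as the weights
    optimizer_bytes: int = 6,   # 8-bit AdamW: 4-byte FP32 master copy + two 1-byte moments
) -> float:
    """Sum the per-parameter costs of weights, gradients, and optimizer states."""
    bytes_per_param = weight_bytes + grad_bytes + optimizer_bytes
    return params_billions * bytes_per_param  # GB, since 1B params x 1 byte is ~1GB

print(estimate_finetune_vram_gb(7))                      # ~70 GB with an 8-bit optimizer
print(estimate_finetune_vram_gb(7, optimizer_bytes=12))  # ~112 GB with standard FP32 AdamW
```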

Efficient fine-tuning: LoRA and QLoRA

Techniques like LoRA (Low-Rank Adaptation) and QLoRA (Quantized LoRA) significantly reduce VRAM requirements by training only small adapter matrices while keeping the base model frozen. You can read more about how they work in our LoRA vs. QLoRA article.
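For illustration, here is a minimal sketch of a QLoRA-style setup using the transformers, peft, and bitsandbytes libraries. The base model id and the LoRA hyperparameters (rank, alpha, target modules) are placeholders, not recommendations:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the frozen base model in 4-bit NF4 precision (the QLoRA recipe).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "your-7b-base-model",          # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach small trainable low-rank adapters; only these receive gradients
# and optimizer states, which is where the VRAM savings come from.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # illustrative attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base model
```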

VRAM requirements table

Here’s a comparison of VRAM requirements for different model sizes and fine-tuning techniques:

Method   Precision   7B      13B     30B     70B     110B
Full     16-bit      67GB    125GB   288GB   672GB   1056GB
LoRA     16-bit      15GB    28GB    63GB    146GB   229GB
QLoRA    8-bit       9GB     17GB    38GB    88GB    138GB
QLoRA    4-bit       5GB     9GB     20GB    46GB    72GB

Note: These are approximate values and actual usage may vary based on implementation details and additional memory needs during training.
