September 1, 2024 · 3 minute read
How much VRAM do I need for LLM model fine-tuning?
Yiren Lu (@YirenLu)
Solutions Engineer

Fine-tuning Large Language Models (LLMs) can be computationally intensive, with GPU memory (VRAM) often being the primary bottleneck. This is a guide to calculating the VRAM requirements for fine-tuning.

The rule of thumb

For full fine-tuning of LLMs loaded in “half-precision” (16 bits), a quick rule of thumb is:

16GB of GPU memory per 1B parameters in the model

This is significantly higher than the 2GB per 1B parameters needed for inference, due to the additional memory required for optimizer states, gradients, and other training-related data.
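
As a quick sanity check, here's a minimal sketch of that rule of thumb in Python. The 16GB and 2GB per billion parameters are the heuristics above, not exact measurements, and the function name is just illustrative:

```python
def estimate_vram_gb(params_billions: float, training: bool = True) -> float:
    """Rule-of-thumb VRAM estimate for a model loaded in half precision.

    Full fine-tuning: ~16GB per 1B parameters.
    Inference: ~2GB per 1B parameters.
    """
    return params_billions * (16 if training else 2)

print(estimate_vram_gb(7))                  # 112.0 -> ~112GB to fully fine-tune a 7B model
print(estimate_vram_gb(7, training=False))  # 14.0  -> ~14GB to serve it
```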

VRAM Requirements for fine-tuning a 7B model

Let’s walk through a VRAM estimation for a 7B parameter model. The total VRAM requirements are the sum of the following individual components:

1. Model parameters

  • Full precision (FP32): ~28GB (7B x 4 bytes)
  • Half precision (FP16): ~14GB (7B x 2 bytes)
  • Mixed precision: ~21GB (FP16 + partial FP32 copy)
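
The numbers above are just parameter count times bytes per parameter. A small sketch (the byte widths are the standard ones for each dtype; the helper name is ours):

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gb(params_billions: float, dtype: str) -> float:
    """Weights-only memory: parameter count x bytes per parameter."""
    return params_billions * BYTES_PER_PARAM[dtype]

print(weight_memory_gb(7, "fp32"))  # 28.0 GB
print(weight_memory_gb(7, "fp16"))  # 14.0 GB
```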

2. Optimizer States

Using AdamW (the most common optimizer):

  • ~84GB (3 FP32 copies at 4 bytes/parameter: master weights, momentum, and variance)

Using 8-bit optimizers (e.g., bitsandbytes):

  • ~42GB (1 FP32 master copy + 2 8-bit state copies)
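
In practice, switching to an 8-bit optimizer is a small code change. A sketch using bitsandbytes, with a toy linear layer standing in for your LLM:

```python
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(4096, 4096)  # stand-in for your LLM

# Drop-in replacement for torch.optim.AdamW: the momentum and variance
# states are stored in 8 bits instead of 32, roughly halving the
# optimizer's memory footprint.
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=2e-5)
```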

3. Gradients

  • FP32: ~28GB
  • FP16: ~14GB (gradients typically match the precision of the model weights)

4. Activations

Activation memory depends on batch size, sequence length, and model architecture, and can be reduced with activation checkpointing. For most contemporary architectures (transformers, diffusion models), activations add far less memory than the other components above.
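
Activation checkpointing is usually a one-liner to enable. For example, Hugging Face transformers models expose it as gradient checkpointing (a sketch; the checkpoint name is just an example):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Recompute activations during the backward pass instead of caching them,
# trading extra compute for a much smaller activation footprint.
model.gradient_checkpointing_enable()
```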

Total VRAM Estimate

For full fine-tuning in half precision with an 8-bit optimizer, a 7B model therefore requires roughly 14GB (weights) + 42GB (optimizer states) + 14GB (gradients) = ~70GB of VRAM.
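
Putting the components together, here's a rough calculator mirroring that arithmetic (a sketch using the rule-of-thumb byte counts above; activations are ignored, as discussed):

```python
def full_finetune_vram_gb(params_b: float,
                          weight_bytes: float = 2,   # FP16 weights
                          grad_bytes: float = 2,     # FP16 gradients
                          optim_bytes: float = 6     # 8-bit AdamW: 4 (FP32 copy) + 1 + 1
                          ) -> float:
    """Weights + gradients + optimizer states, in GB, ignoring activations."""
    return params_b * (weight_bytes + grad_bytes + optim_bytes)

print(full_finetune_vram_gb(7))  # 14 + 14 + 42 = 70.0 GB
```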

Efficient fine-tuning: LoRA and QLoRA

Techniques like LoRA (Low-Rank Adaptation) and QLoRA (Quantized LoRA) significantly reduce VRAM requirements. You can read more about how in our LoRA vs. QLoRA article.
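
For reference, applying LoRA with the PEFT library looks roughly like this (a sketch; the rank, alpha, and target modules are illustrative, not a recommendation):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# The base weights stay frozen; only small low-rank adapter matrices are
# trained, so gradients and optimizer states exist for <1% of parameters.
config = LoraConfig(r=16, lora_alpha=32,
                    target_modules=["q_proj", "v_proj"],
                    task_type="CAUSAL_LM")
model = get_peft_model(model, config)
model.print_trainable_parameters()
```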

VRAM requirements table

Here’s a comparison of VRAM requirements for different model sizes and fine-tuning techniques:

| Method | Precision (bits) | 7B   | 13B   | 30B   | 70B   | 110B   |
|--------|------------------|------|-------|-------|-------|--------|
| Full   | 16               | 67GB | 125GB | 288GB | 672GB | 1056GB |
| LoRA   | 16               | 15GB | 28GB  | 63GB  | 146GB | 229GB  |
| QLoRA  | 8                | 9GB  | 17GB  | 38GB  | 88GB  | 138GB  |
| QLoRA  | 4                | 5GB  | 9GB   | 20GB  | 46GB  | 72GB   |

Note: These are approximate values and actual usage may vary based on implementation details and additional memory needs during training.
