August 26, 2024 · 5 minute read
A10 vs. A100 vs. H100 - Which one should you choose?
Yiren Lu (@YirenLu)
Solutions Engineer

This article will guide you through the key differences between NVIDIA’s A10, A100, and H100 GPUs, helping you make an informed decision based on your specific needs and budget.

GPU comparison

Let’s start with a comparison of the GPUs available on Modal:

| GPU type    | VRAM (GiB) | Memory bandwidth (VRAM-to-SRAM, TB/s) | Price on Modal ($/hour) | Architecture |
|-------------|------------|---------------------------------------|-------------------------|--------------|
| H100        | 80         | 3.35                                  | 4.56                    | Hopper       |
| A100 (80GB) | 80         | 2.0                                   | 3.40                    | Ampere       |
| A100 (40GB) | 40         | 2.0                                   | 2.78                    | Ampere       |
| A10         | 24         | 0.6                                   | 1.10                    | Ampere       |
| L4          | 24         | 0.3                                   | 0.80                    | Lovelace     |
| T4          | 16         | 0.3                                   | 0.59                    | Turing       |
  • VRAM is high speed, byte-addressable memory located on your graphics card. It plays the same role in the GPU’s memory system as the RAM plays in your CPU’s. The more VRAM, the larger the models you can run.
  • In the table above, we show the VRAM-to-SRAM memory bandwidth, which is the rate at which data can be transferred between the GPU’s main memory (VRAM, typically GDDR or HBM) and its on-chip cache memory (SRAM). This bandwidth is crucial for the GPU’s ability to quickly bring model parameters into the compute cores where activations and outputs are calculated.
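To see why this number matters, consider a rough back-of-envelope estimate (a common rule of thumb, not a benchmark): single-stream LLM decoding is usually memory-bound, so the maximum token rate is roughly the memory bandwidth divided by the bytes of weights read per token. The sketch below reuses the bandwidth figures from the table; the model size, precision, and the simplifying assumptions (no KV-cache traffic, no batching) are illustrative.

```python
# Back-of-envelope: tokens/sec for memory-bound LLM decoding is roughly
#   memory_bandwidth / bytes_of_weights_read_per_token.
# This ignores KV-cache reads, kernel overheads, and batching, so treat the
# result as an upper bound, not a measured throughput.

GPU_BANDWIDTH_TBPS = {  # VRAM-to-SRAM bandwidth from the table above
    "H100": 3.35,
    "A100": 2.0,
    "A10": 0.6,
    "L4": 0.3,
    "T4": 0.3,
}

def max_decode_tokens_per_sec(params_billion: float, bytes_per_param: float, gpu: str) -> float:
    """Upper bound on single-stream decode speed for a dense model."""
    model_bytes = params_billion * 1e9 * bytes_per_param
    bandwidth_bytes = GPU_BANDWIDTH_TBPS[gpu] * 1e12
    return bandwidth_bytes / model_bytes

# Example: a 7B model in fp16 (2 bytes per parameter) on an A10
print(f"{max_decode_tokens_per_sec(7, 2, 'A10'):.0f} tokens/s upper bound")  # ~43
```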

H100

  • Best for: Training and inference for very large models (70B parameters or more), transformer-based architectures, low (8-bit) precision
  • Key features:
    • Most powerful NVIDIA datacenter GPU that’s generally available at time of writing (late 2024)
    • ~2x faster than A100 for most workloads, but also harder to get (might have to queue), and more expensive
    • Optimized for large language model workloads. It offers over 3 TB/s of memory bandwidth, which is crucial for LLM inference workloads that require rapid data transfer between VRAM and compute cores.
    • Contains specialized compute units for lower precision (FP8) operations

A100

  • Best for: Training and inference for large models (7B-70B parameters)
  • Key features:
    • NVIDIA’s workhorse GPU, meant for AI, data analytics, and HPC workloads
    • Available in 40GB and 80GB variants
    • Because memory bandwidth has scaled more slowly than arithmetic bandwidth, A100s can be more cost-effective than H100s for workloads that are memory-bound, like running large models on small batches

A10

  • Best for: Inference for small to medium models (7B parameters or less, like most diffusion-based image generation models), cost-effective, small-scale training for smaller models
  • Key features:
    • Same architecture as A100, so most code that runs on A100 will run on A10
    • Good performance-to-cost ratio for smaller workloads

L4

  • Best for: Inference for small to medium size models (7B parameters or less, like most diffusion-based image generation models)

  • Key features:

    • Cost-efficient GPU, but still very capable
    • L4 has the same amount of VRAM as A10, but only half the memory bandwidth
    • L4 is newer than T4 and offers 2x-4x better performance

T4

  • Best for:

    • Inference for small models
  • Key features:

    • T4 is older and slower than L4
    • Offered for free with Google Colab, so good for small-scale experimentation and prototyping. For example, you can start with T4s on Colab, and run the same code in prod on L4s or A10s.

Choosing the right GPU

When selecting a GPU for your machine learning workload, first gather the following information:

  1. Task Type: Are you training, fine-tuning, or running inference?
  2. Model Size: How many parameters does your model have?
  3. Memory Requirements: How much VRAM does your model need?
  4. Budget: What’s your cost constraint per hour of computation?
  5. Performance Needs: Do you require the absolute fastest processing times?

Then follow this procedure to decide which GPU is the best fit:

  1. Calculate the amount of memory that you need, depending on your use case and model size. Remember to take into account whether you are quantizing the models and/or using techniques like LoRA or QLoRA. You can refer to our VRAM guides for more information on how to calculate the memory requirements; a rough back-of-envelope sketch also follows after these steps.

  2. Check against the table above for the most cost-effective GPU that the model will fit on

  3. Start with the most cost-effective GPU to see whether the model runs/performs well and move to the more expensive ones if it doesn’t.
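To make step 1 concrete, here is a minimal sketch of the usual parameter-count heuristic: weight bytes plus headroom for activations and KV cache. The 1.2x overhead factor and the example model sizes are assumptions for this sketch, not figures from the VRAM guides.

```python
# Rough VRAM estimate for inference: weight bytes plus a headroom factor for
# activations and KV cache. The 1.2x overhead is an assumption for this sketch;
# real requirements depend on context length, batch size, and framework.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1, "int4": 0.5}

def estimate_vram_gib(params_billion: float, precision: str = "fp16",
                      overhead: float = 1.2) -> float:
    weight_bytes = params_billion * 1e9 * BYTES_PER_PARAM[precision]
    return weight_bytes * overhead / 2**30

print(f"{estimate_vram_gib(7, 'fp16'):.1f} GiB")   # ~15.6 GiB -> fits on an A10 (24 GiB)
print(f"{estimate_vram_gib(70, 'fp16'):.1f} GiB")  # ~156 GiB -> quantize or go multi-GPU
print(f"{estimate_vram_gib(70, 'int4'):.1f} GiB")  # ~39 GiB -> A100 (80GB) or H100
```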

Advanced considerations

  1. Multi-GPU Setups: For some super large models (greater than 100B parameters, like Llama3-405B), you may need more than a single GPU, even a top-tier one. Modal’s platform makes it easy to scale up your GPU resources as needed.
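A quick sanity check of why a single card isn't enough, using the same fp16 assumption as the sketch above:

```python
params = 405e9                  # Llama3-405B
weights_gb = params * 2 / 1e9   # fp16 weights: ~810 GB
print(weights_gb / 80)          # ~10 H100s (80 GB each) just to hold the weights,
                                # before any KV cache or activations
```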

Conclusion

At Modal, we offer flexible access to all these GPU types with a simple gpu="A100" or gpu="H100" flag in your code. This allows you to easily switch between GPUs based on your needs without worrying about hardware procurement or maintenance.
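For concreteness, here is a minimal sketch of what that looks like with Modal's Python SDK. The app name and function body are placeholders, and the surrounding interface (modal.App, @app.function) should be checked against Modal's current docs; the gpu= argument is the part described above.

```python
import modal

app = modal.App("gpu-example")  # placeholder app name

# Request an A100 for this function; change the string to "H100" (or another
# type from the table above) to switch GPUs with no other code changes.
@app.function(gpu="A100")
def check_gpu():
    import subprocess
    subprocess.run(["nvidia-smi"], check=True)
```

Because the GPU type is just a string argument, the "start with the most cost-effective GPU and move up if needed" procedure above amounts to a one-line change.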

Ready to supercharge your AI workloads with the right GPU? Sign up for Modal today and experience the difference firsthand!
