March 15, 2025 · 5 minute read
How much is an Nvidia H200?
Yiren Lu (@YirenLu)
Solutions Engineer

The Nvidia H200, which began delivery in late 2024, is Nvidia’s latest and most powerful GPU for AI workloads, featuring significantly more memory than its predecessor, the H100. It’s particularly well-suited for running the latest large language models like DeepSeek.

How is the H200 different from the H100?

The key differentiator of the H200 is its massive 141GB of HBM3e memory, nearly double the 80GB found on the H100.

The H200 also offers:

  • Higher memory bandwidth (4.8 TB/s vs 3.35 TB/s on H100)
  • Up to 1.6 times higher inference performance for LLMs like GPT-3 and Llama-70B in specific scenarios

This additional VRAM and bandwidth make the H200 particularly well-suited for:

  1. Running larger (100B+ parameter) AI models that won’t fit in H100 memory (see the rough memory estimate below)
  2. Handling longer context windows in LLMs
  3. Processing larger batch sizes for improved throughput
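
To see why the extra memory matters, here is a back-of-the-envelope estimate (plain Python, not a Modal-specific tool) of how much VRAM a model’s weights alone require at different precisions:

```python
# Back-of-the-envelope: VRAM needed just to hold model weights.
# KV cache, activations, and framework overhead all add more on top.
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}

def weight_memory_gb(params_billion: float, precision: str) -> float:
    # 1B params at N bytes/param is roughly N GB
    return params_billion * BYTES_PER_PARAM[precision]

for model, params in [("Llama-70B", 70), ("DeepSeek-R1 (671B)", 671)]:
    for precision in ("fp16", "fp8"):
        gb = weight_memory_gb(params, precision)
        verdict = "fits" if gb <= 141 else "does not fit"
        print(f"{model} @ {precision}: ~{gb:.0f} GB of weights ({verdict} on one 141 GB H200)")
```

At 8-bit precision, a 70B model’s weights alone take roughly 70GB, leaving room on a 141GB H200 for long-context KV cache that an 80GB H100 can’t spare.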

The H200’s expanded memory capacity makes it the ideal choice for running the latest generation of large language models. For example:

  • DeepSeek Models: The full DeepSeek-R1 671B model runs across 8x H200s, while distilled versions (as sketched below) run on a single H200.
  • Multi-Modal Models: Models that process both text and images require significant VRAM, making the H200 particularly valuable
  • Fine-tuning: The additional memory allows for fine-tuning larger models or using larger batch sizes.
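
As an illustration, here is a minimal sketch of serving a distilled DeepSeek-R1 model on a single H200 with vLLM. The model name and settings are examples, not a tested configuration:

```python
from vllm import LLM, SamplingParams

# Example only: a distilled DeepSeek-R1 variant small enough for one H200.
# The full 671B DeepSeek-R1 would instead need tensor_parallel_size=8 across 8x H200s.
llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    tensor_parallel_size=1,   # a single H200
    max_model_len=32768,      # long context is where the extra memory helps
)

outputs = llm.generate(
    ["Summarize the difference between the H100 and H200 in one sentence."],
    SamplingParams(max_tokens=128, temperature=0.7),
)
print(outputs[0].outputs[0].text)
```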

Direct Purchase Price

The H200 GPU costs ~$30,000 per chip if you are purchasing it directly from a hardware vendor.

However, it’s important to note that organizations typically aren’t buying just a single chip; they may be investing in configurations like an Nvidia DGX H200 system with 8 H200s for ~$300k.

Alternatives to Direct Purchase: GPU-on-demand Platforms

Given the substantial cost and limited availability of H200 GPUs, most organizations will probably access H200 GPUs via cloud service providers and AI infrastructure companies.

Major cloud service providers that partner with Nvidia now offer H200 instances, typically at roughly $10/GPU/hour.
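
As a rough rent-vs-buy sanity check (using the ~$300k system price and ~$10/GPU/hour figures above, and ignoring power, hosting, and depreciation), the break-even point looks like this:

```python
# Back-of-the-envelope: renting 8x H200 at ~$10/GPU/hr vs. buying an ~$300k system.
# Ignores power, cooling, hosting, networking, and operations costs.
system_price_usd = 300_000
gpus = 8
rental_rate_per_gpu_hr = 10

breakeven_hours = system_price_usd / (gpus * rental_rate_per_gpu_hr)   # ~3,750 hours
breakeven_months = breakeven_hours / (24 * 30)                         # ~5.2 months

print(f"Break-even after ~{breakeven_hours:,.0f} hours of 24/7 use "
      f"(~{breakeven_months:.1f} months)")
```

In other words, buying only starts to pay off after several months of continuous, fully-utilized use.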

There’s also a newer generation of GPU-on-demand platforms that are just beginning to offer (often limited) H200 access, including Modal, RunPod, CoreWeave, and Lambda Labs.

H100 vs. H200

Given that availability of H200s is still fairly limited, and they are generally priced at a premium (they cost ~$10/GPU/hour, compared to ~$5/GPU/hour for H100s), are they worth it?

The answer is that if you are training or running inference on larger models (> 70B parameters), then the H200 probably makes sense.

For example, running Llama-70B inference on a single H200 at 8-bit precision delivers up to 1.9x the throughput of an H100.

On the other hand, for most inference workloads, which involve smaller models that already fit comfortably in an H100’s 80GB of memory, there is little to no performance gain over H100s.
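
Putting the rough numbers above together ($5 vs. $10 per GPU-hour, and up to 1.9x throughput on Llama-70B), the price/performance comparison is roughly a wash for models that fit on either card:

```python
# Rough cost-per-throughput comparison for Llama-70B inference,
# using the approximate figures quoted in this post.
h100_rate, h200_rate = 5.0, 10.0   # $/GPU/hour
h200_speedup = 1.9                 # relative Llama-70B throughput vs. H100

print(f"H100: ${h100_rate / 1.0:.2f} per unit of throughput")
print(f"H200: ${h200_rate / h200_speedup:.2f} per unit of throughput")
# -> ~$5.00 vs. ~$5.26: near parity, so the H200's real advantage is its extra
#    memory (bigger models, longer context, larger batches), not raw $/token.
```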

H100s are at this point widely available, and are still a great, cost-effective choice for most inference and fine-tuning workloads. Try Modal to get started with them today!
