August 15, 2024 · 5 minute read
How much is an Nvidia H100?
Yiren Lu (@YirenLu), Solutions Engineer

Direct purchase price from Nvidia

When purchasing directly from Nvidia, an H100 GPU is estimated to cost around $25,000 per GPU. Note, however, that prices can vary based on factors such as volume discounts and specific configurations.

For example, a full multi-GPU H100 system, such as Nvidia's 8-GPU DGX H100, can cost up to $400,000.

Alternatives to direct purchase: GPU-on-demand platforms

Given the high cost and limited availability of H100 GPUs, many companies are exploring alternatives through GPU-on-demand platforms. These services offer flexible access to high-performance GPUs without the need for significant upfront investment. Here are some of the top platforms:

  1. Modal

  2. Lambda

  3. Runpod

  4. Baseten

Here’s a comparison table of H100 GPU prices across these platforms:

| Platform | H100 Price (per hour) |
| -------- | --------------------- |
| Modal    | $4.56                 |
| Lambda   | $2.99                 |
| Runpod   | $5.59                 |
| Baseten  | $9.984                |

Note: Prices are approximate and may vary based on region, availability, and specific configurations. Always check the official pricing pages for the most up-to-date information.
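
For a rough sense of when buying starts to beat renting, you can compare the purchase price to an hourly rate from the table above. Here's a back-of-the-envelope sketch in Python, assuming the ~$25,000 single-GPU price and Modal's $4.56/hour rate, and ignoring power, hosting, networking, and utilization below 100%:

```python
# Back-of-the-envelope break-even estimate: purchase price vs. on-demand rental.
# Figures are the approximate ones quoted in this post; adjust for your own quotes.
purchase_price = 25_000   # USD, approximate single H100 price
hourly_rate = 4.56        # USD per GPU-hour on-demand

break_even_hours = purchase_price / hourly_rate
print(f"Break-even after ~{break_even_hours:,.0f} GPU-hours "
      f"(~{break_even_hours / 24:,.0f} days of continuous use)")
# -> roughly 5,482 GPU-hours, or about 228 days of 24/7 utilization
```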

Pricing parameters

When considering the cost of using H100 GPUs on cloud platforms, it’s important to understand that the total price of a job depends on more than just the per-hour rate. Several factors contribute to the overall runtime and, consequently, the cost. These include:

  1. Cold start time: This refers to the time it takes for a new instance of your application to start up and become ready to handle requests. In serverless environments, cold starts can occur when a new container or runtime environment needs to be initialized. For GPU workloads, this includes the time to allocate and initialize the GPU, load any necessary drivers or libraries, and set up the CUDA environment.

  2. Model loading time: This includes the time it takes to load your code, dependencies, and any large models into GPU memory. For large AI models, this can be significant. You should aim to do this as infrequently as possible: for example, load the model once and reuse it for multiple inferences, amortizing this cost over many requests (a sketch of this pattern follows this list).

  3. Inference speed: The speed of inference depends largely on the framework you use. For example, using optimized inference engines like NVIDIA TensorRT or vLLM can significantly speed up inference compared to standard PyTorch or TensorFlow implementations.

  4. Input/Output operations: If your job involves heavy I/O, such as downloading large datasets or model weights, reading large files, or writing extensive outputs, this can add to the overall runtime (see the caching sketch below).
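
To make points 2 and 3 concrete, here is a minimal sketch of the load-once pattern using vLLM as the inference engine. The model name is just a placeholder; substitute whatever model you actually serve:

```python
from vllm import LLM, SamplingParams

# Load the model into GPU memory once, at startup. This is the slow, expensive
# step, so it should happen as infrequently as possible.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")  # placeholder model
sampling_params = SamplingParams(temperature=0.7, max_tokens=256)

def run_inference(prompts: list[str]) -> list[str]:
    # Reuse the already-loaded model for every request, amortizing the
    # load time across many inferences.
    outputs = llm.generate(prompts, sampling_params)
    return [out.outputs[0].text for out in outputs]
```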

How much time each of these factors takes, and therefore how long you are billed for, can vary significantly depending on the platform you use.
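
On the I/O point, a common pattern is to cache large downloads (datasets, model weights) on persistent storage so you only pay the download time once. A minimal sketch, with a hypothetical URL and cache path:

```python
from pathlib import Path
import urllib.request

# Hypothetical asset URL and cache location; point these at your own storage.
WEIGHTS_URL = "https://example.com/model-weights.bin"
CACHE_PATH = Path("/cache/model-weights.bin")

def fetch_weights() -> Path:
    # Only download if the file isn't already cached, so repeated jobs
    # skip the transfer time entirely.
    if not CACHE_PATH.exists():
        CACHE_PATH.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(WEIGHTS_URL, str(CACHE_PATH))
    return CACHE_PATH
```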

Conclusion

While H100 GPUs offer unparalleled performance for AI and machine learning tasks, their high direct purchase cost can be prohibitive for many organizations. Serverless GPU platforms provide a more accessible and flexible alternative, allowing users to leverage the power of H100s without the hefty upfront investment.

Ready to experience the performance of H100 GPUs with the flexibility of serverless computing? Sign up for Modal today and start building your AI applications with ease!
