How much does it cost to run NVIDIA B200 GPUs in 2025?
Researchers and engineers can now get their hands on the first NVIDIA Blackwell GPU, the B200. With 192 GB of ultra-fast HBM3e and a second-generation Transformer Engine that introduces FP4 arithmetic, a single card delivers up to 20 petaFLOPS of sparse-FP4 AI compute.
Blackwell B200 specs & performance
The B200 represents a massive leap in GPU capabilities: built on TSMC’s 4NP process, it packs 208 billion transistors across a dual-die design that software sees as a single GPU, making room for the new FP4 Tensor Cores and fifth-generation NVLink connectivity.
- Process: TSMC 4NP with 208 billion transistors (dual-die)
- Memory: 192 GB HBM3e (cloud consoles expose 180 GB usable) — 2.4x H100 capacity
- Bandwidth: 8 TB/s memory bandwidth, doubling Hopper’s throughput
- Peak compute: 20 PFLOPS FP4 with 2:1 sparsity — ~5x H100 inference throughput
- Interconnect: NVLink 5 at 1.8 TB/s bidirectional, removing PCIe bottlenecks
This combination of memory capacity and FP4 compute lets you fit models on a single card that would require complex parallelism across multiple H100s, while delivering inference throughput that makes real-time AI applications viable.
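Some rough weight-memory arithmetic shows why that capacity matters. Below is a minimal sketch that ignores KV cache, activations, and runtime overhead; the model sizes are illustrative:

```python
# Weight memory = parameters x bytes per parameter. Ignores KV cache,
# activations, and runtime overhead; model sizes are illustrative.
BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "FP4": 0.5}

def weight_memory_gb(params_billions: float, precision: str) -> float:
    # 1e9 params x bytes/param, expressed directly in GB
    return params_billions * BYTES_PER_PARAM[precision]

for params in (70, 180, 340):
    sizes = ", ".join(f"{p}: {weight_memory_gb(params, p):.0f} GB" for p in BYTES_PER_PARAM)
    print(f"{params}B parameters -> {sizes}")

# A 340B-parameter model quantized to FP4 (~170 GB) fits in a single B200's
# 180 GB of usable HBM3e; in FP16 the same weights would need several H100s.
```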
NVIDIA B200 cloud pricing
Here’s per-GPU pricing for B200s across major providers, from most to least flexible purchase options (July 2025):
Provider & SKU | Serverless | Spot | On‑demand | Capacity block | 1‑yr reservation | 3‑yr reservation | Pricing sources |
---|---|---|---|---|---|---|---|
Modal | $6.25/hr | n/a | n/a | n/a | n/a | n/a | Modal pricing |
Baseten | $9.98/hr | n/a | n/a | n/a | n/a | n/a | Baseten pricing |
RunPod | n/a | n/a | $5.99/hr | n/a | ~$5.09/hr | n/a | RunPod pricing |
Lambda Labs | n/a | n/a | $3.79/hr | n/a | $3.49/hr | $2.99/hr | Lambda Labs pricing |
AWS | n/a | n/a | $14.24/hr | $8.14/hr | ~$12.50/hr | n/a | Vantage, AWS Savings Plans, AWS Capacity Block pricing |
GCP | n/a | $8.06/hr | $18.53/hr | n/a | $11.12/hr | $7.09/hr | Vertex pricing, Google Cloud pricing, Spot pricing |
Note that on AWS and GCP, B200s are only available in 8-GPU instances.
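To make that minimum commitment concrete, the sketch below converts the per-GPU rates from the table into full-node hourly and monthly figures (the 730 hours/month figure is a rounded average):

```python
# AWS and GCP sell B200s in 8-GPU nodes, so the smallest bill is 8x the
# per-GPU rate. Rates are from the table above.
PER_GPU_HOURLY = {"AWS on-demand": 14.24, "GCP on-demand": 18.53, "GCP spot": 8.06}
GPUS_PER_NODE = 8
HOURS_PER_MONTH = 730  # rounded average

for name, rate in PER_GPU_HOURLY.items():
    node_hourly = rate * GPUS_PER_NODE
    print(f"{name}: ${node_hourly:,.2f}/hr per node, "
          f"~${node_hourly * HOURS_PER_MONTH:,.0f}/month if left running")
```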
Choosing the right provider
Different scenarios call for different providers:
Scenario | Best fit | Rationale |
---|---|---|
Bursty AI inference traffic | Modal | Per-second billing and sub-second cold starts keep effective cost lowest |
Static, predictable AI inference traffic | AWS, GCP | Most reliable providers, with reservation-based discounts for steady load |
Multi-week training runs | Lambda Labs | Cheapest reservation prices |
Serverless options like Modal automatically scale up and down from zero, so you only pay for compute you actually use. Reserved-capacity options give you a fixed block of resources; that is fine for static workloads, but for variable workloads it means higher latency when demand spikes and wasted money when demand is low.
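To see where each model wins, here is a rough comparison using Modal's serverless rate and a 1-year reservation rate from the tables above; the traffic pattern is an assumption:

```python
# Serverless per-second billing vs. an always-on reservation for bursty
# traffic. Rates come from the tables above; the traffic pattern is made up.
SERVERLESS_PER_SECOND = 6.25 / 3600   # Modal B200, $/GPU-second
RESERVED_PER_HOUR = 3.49              # e.g. Lambda Labs 1-yr reservation

busy_hours_per_day = 5                # assumed time the GPU is actually working
serverless_monthly = busy_hours_per_day * 3600 * 30 * SERVERLESS_PER_SECOND
reserved_monthly = RESERVED_PER_HOUR * 24 * 30

print(f"Serverless, billed only while busy: ${serverless_monthly:,.0f}/month")
print(f"Reserved, billed around the clock: ${reserved_monthly:,.0f}/month")
# At these rates, the reservation only wins once the GPU is busy ~13+ hours a day.
```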
On‑premise options: buy a B200 or DGX B200?
For those considering ownership:
- Standalone B200 SXM module: $30,000 - $40,000 (one 700W GPU board)
- Grace-Blackwell GB200 Superchip: $60,000 - $70,000 (1x Grace CPU + 2x B200)
- NVIDIA DGX B200: ~$515,000 (8x B200, 1.44 TB GPU RAM, 72 PFLOPS FP8)
At $30k per card, breakeven against $6-8/hour cloud rates happens at ~60% utilization over 18 months (excluding electricity and cooling). Factor in datacenter space (~14 kW per DGX B200) and staff before buying.
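That breakeven is easy to sanity-check. Here is a minimal sketch, where the per-GPU host share (chassis, baseboard, networking) is an assumption chosen for illustration and power and cooling are excluded, as above:

```python
# Buy-vs-rent breakeven for one B200. The host-share figure is an assumption;
# electricity and cooling are excluded, matching the estimate above.
CARD_PRICE = 30_000          # standalone B200 SXM module, low end
HOST_SHARE = 20_000          # assumed per-GPU share of the server it lives in
CLOUD_RATE = 6.25            # $/GPU-hour you would otherwise pay
HORIZON_HOURS = 18 * 730     # ~18 months

breakeven_utilization = (CARD_PRICE + HOST_SHARE) / (CLOUD_RATE * HORIZON_HOURS)
print(f"Breakeven utilization over 18 months: {breakeven_utilization:.0%}")
# ~61% with these inputs, in line with the ~60% figure above
```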
B200 vs. H100/H200: is the upgrade worth it?
The B200 offers compelling advantages for specific workloads:
- Memory headroom - 192 GB HBM3e lets you serve GPT-4-class 400B-parameter models on one card instead of sharding them across two
- FP4 Transformer Engine - 5x higher inference throughput; MLPerf Llama-2-70B results show 2-3x tokens/second on identical node counts
- Fifth-gen NVLink - 1.8 TB/s cuts all-reduce time ~40% in 8-GPU data-parallel training (see the sketch after this list)
- Better real-world latency - Modal’s benchmarks show 2.5x lower TTFB versus H200 for MoE models
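In an idealized, bandwidth-bound ring all-reduce, each GPU moves 2(N-1)/N of the gradient volume, so communication time scales inversely with link bandwidth; fixed launch and synchronization overheads are why measured gains land nearer the ~40% above than the theoretical 2x. A minimal sketch, where the gradient size and link efficiency are assumptions:

```python
# Idealized ring all-reduce: each GPU sends and receives 2*(N-1)/N of the
# gradient volume. Gradient size and link efficiency are assumptions.
def allreduce_seconds(grad_bytes: float, n_gpus: int, link_gb_per_s: float,
                      efficiency: float = 0.8) -> float:
    traffic_bytes = 2 * (n_gpus - 1) / n_gpus * grad_bytes
    return traffic_bytes / (link_gb_per_s * 1e9 * efficiency)

GRAD_BYTES = 70e9 * 2  # e.g. a 70B-parameter model with BF16 gradients

for name, bw in [("Hopper NVLink 4", 900), ("Blackwell NVLink 5", 1800)]:
    ms = allreduce_seconds(GRAD_BYTES, n_gpus=8, link_gb_per_s=bw) * 1000
    print(f"{name} ({bw} GB/s): ~{ms:.0f} ms per all-reduce")
```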
Quick‑start guide: run code on a cloud B200 in under 5 minutes
Modal’s serverless platform lets you run and deploy code on a B200 without having to manage cloud resources. To get started, simply sign up and run the code snippet below:
import modal

app = modal.App()

@app.function(gpu="B200")
def run_big_model():
    # This will run on a B200 on Modal
    import subprocess
    subprocess.run(["nvidia-smi"], check=True)  # confirm which GPU the container landed on
At $0.001736 per GPU-second, you can benchmark for pennies, then fan out to thousands of ephemeral workers without provisioning a single instance.
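Here is a minimal sketch of that fan-out, building on the snippet above (the per-prompt function and prompt list are illustrative stand-ins for your own workload); launch it with `modal run your_file.py`:

```python
@app.function(gpu="B200")
def run_on_prompt(prompt: str) -> str:
    # Hypothetical per-prompt workload; swap in your real inference code.
    return prompt.upper()

@app.local_entrypoint()
def main():
    prompts = [f"benchmark prompt {i}" for i in range(1000)]
    # .map() fans the calls out across parallel B200 containers and
    # returns results as the workers finish.
    for result in run_on_prompt.map(prompts):
        print(result)
```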
Get started with B200s today
Modal’s serverless B200s at $6.25/hour are the most cost-effective option for bursty workloads.
If your H100s are out of memory or your user-visible latency targets are slipping, Blackwell’s 192 GB HBM3e and FP4 Tensor Cores are the most cost-effective escape hatch available today.