August 18, 2025 · 5 minute read

How much does it cost to run NVIDIA A10G GPUs in 2025?

The NVIDIA A10G is a versatile GPU that bridges the gap between graphics workloads and AI inference. With 24 GB of GDDR6 memory and Ampere architecture, it delivers solid performance for mid-range AI models and visualization tasks at a fraction of the cost of flagship GPUs.

A10G specs & performance

The NVIDIA A10G strikes a balance between graphics capabilities and AI compute, making it a jack-of-all-trades in the GPU landscape.

  • Architecture: Ampere GA102 with 9,216 CUDA cores
  • Memory: 24 GB GDDR6 with ECC
  • Bandwidth: 600 GB/s memory bandwidth
  • Peak compute: 31.2 TFLOPS FP32, 125 TFLOPS tensor performance
  • Special features: 2nd-gen RT Cores for ray tracing, 3rd-gen Tensor Cores for AI

This configuration makes it ideal for serving small to medium language models (6B-13B parameters), running diffusion models like Stable Diffusion, and powering GPU-accelerated visualization tasks including CAD and 3D rendering.
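A quick back-of-the-envelope check shows why that 6B-13B range lines up with 24 GB of VRAM. This is a rough sketch, not a sizing tool: the 1.2x overhead factor for KV cache and activations is an assumption, and real usage varies with context length and batch size.

```python
def estimate_vram_gb(params_billions: float, bytes_per_param: float = 2,
                     overhead: float = 1.2) -> float:
    """Rough inference VRAM: weight bytes times an assumed overhead factor
    for KV cache and activations."""
    return params_billions * bytes_per_param * overhead

seven_b_fp16 = estimate_vram_gb(7)        # ~16.8 GB: fits the A10G's 24 GB
thirteen_b_fp16 = estimate_vram_gb(13)    # ~31.2 GB: too big in fp16
thirteen_b_int8 = estimate_vram_gb(13, bytes_per_param=1)  # ~15.6 GB: fits with 8-bit quantization
```

By this estimate, a 7B model in fp16 fits comfortably, while a 13B model needs 8-bit quantization to squeeze under 24 GB.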

NVIDIA A10(G) cloud pricing

Here’s per-GPU pricing for A10G across major providers (August 2025):

| Provider & SKU | Serverless | Spot | On-demand | 1-yr reservation | 3-yr reservation | Pricing source |
| --- | --- | --- | --- | --- | --- | --- |
| Modal | $1.10/hr or $0.000306/sec | n/a | n/a | n/a | n/a | Modal pricing |
| AWS (G5.xlarge, 1× A10G) | n/a | $0.43/hr | $1.10/hr | ~$0.70/hr | ~$0.48/hr | Vantage |
| Azure (NVads A10 v5) | n/a | $0.60/hr | $3.20/hr | n/a | n/a | Vantage |
| Lambda Labs | n/a | n/a | $0.75/hr | Contact sales | Contact sales | Lambda |
| Oracle Cloud Infrastructure | n/a | n/a | $2.00/hr | n/a | n/a | OCI |

You may notice that some providers offer A10s while others offer A10Gs. A10Gs are a variant of A10s created specifically for AWS. Both share the same VRAM and memory bandwidth, but A10s have a slightly higher CUDA core count and TFLOPS performance. Practically speaking, they perform similarly for ML inference tasks.

Choosing the right provider

There’s less choice for A10G GPUs than for many other GPU types—Google Cloud doesn’t offer them at all, and Azure’s pricing makes them hard to justify at $3.20/hr on-demand.

  • For inference workloads, Modal’s serverless offering at $0.000306/second means you only pay for actual compute time. A model that handles 100 requests taking 2 seconds each costs just $0.06—compared to paying $1.10 for the full hour on AWS whether you use it or not.
  • Lambda Labs offers the most straightforward deal at $0.75/hr for steady workloads that need uninterrupted compute.
  • AWS with 3-year reservations is ~$0.48/hr, but that locks you into paying $4,200 annually per GPU regardless of actual usage.

The math is simple: unless your GPU runs consistently for three years, serverless beats reserved pricing. And that’s before considering that serverless handles traffic spikes automatically while reserved instances leave you choosing between wasted capacity or unhappy users during peak traffic.
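That math can be sketched with the rates from the table above. The breakeven calculation below is an illustration under simple assumptions (steady per-request time, no cold starts):

```python
MODAL_PER_SEC = 0.000306   # Modal serverless rate from the table above
AWS_3YR_PER_HR = 0.48      # AWS 3-year reserved rate

def serverless_hourly(busy_seconds_per_hour: float) -> float:
    """Serverless cost for one wall-clock hour with the given busy time."""
    return busy_seconds_per_hour * MODAL_PER_SEC

# 100 requests x 2 s each = 200 busy seconds in an hour
light_traffic = serverless_hourly(200)   # ~$0.06 vs $0.48/hr reserved

# Utilization at which serverless and 3-yr reserved cost the same
breakeven_util = AWS_3YR_PER_HR / (MODAL_PER_SEC * 3600)   # ~44%
```

In other words, unless your GPU is busy more than roughly 44% of every hour for three straight years, the reserved instance costs more than paying per second.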

On-premise options: buy an A10G?

For those considering ownership:

At $2.5k per card, breakeven against $1-2/hour cloud rates happens in 4-8 months of heavy utilization. Factor in ~$0.15-0.25/hr for electricity and cooling, and your effective cost might be $0.35-0.50/hr—but only if you maintain >70% utilization.
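The breakeven window can be sketched the same way. The card price, the $0.20/hr operating cost (midpoint of the estimate above), and the utilization figures are illustrative assumptions:

```python
CARD_COST = 2500.0       # assumed A10G street price
OPEX_PER_HR = 0.20       # midpoint of the $0.15-0.25/hr power + cooling estimate
HOURS_PER_MONTH = 730

def breakeven_months(cloud_rate_per_hr: float, utilization: float) -> float:
    """Months until buying beats renting, at a given cloud rate and utilization."""
    hourly_savings = (cloud_rate_per_hr - OPEX_PER_HR) * utilization
    return CARD_COST / (hourly_savings * HOURS_PER_MONTH)

heavy_use = breakeven_months(1.10, 0.90)   # ~4.2 months at 90% utilization
light_use = breakeven_months(1.10, 0.30)   # ~12.7 months at 30% utilization
```

The utilization term dominates: at light usage the payback period roughly triples, which is why ownership only pencils out for sustained workloads.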

When is an A10G the optimal choice?

Compared with an NVIDIA T4, the A10G delivers up to 3.3x better ML training performance, 3x better ML inference performance, and 3x better graphics performance. It hits the sweet spot for several specific scenarios:

  • Mid-sized model inference (7B parameters): The A10G’s 24 GB of memory perfectly fits models like Llama-7B, Mistral-7B, or Whisper. These workloads are memory-bound rather than compute-bound, making the A10G’s balanced specs ideal as you’re not overpaying for unused compute like with an A100.
  • Mixed AI + graphics workloads: Unlike the A100, the A10G includes RT cores for ray tracing, making it uniquely suited for pipelines that combine ML inference with 3D rendering, video processing, or CAD visualization. If you’re building applications that need both AI and graphics acceleration, the A10G eliminates the need for separate GPU types.
  • Migration from V100s: Organizations still running V100s (16-32 GB) can modernize with A10Gs for better efficiency, newer Ampere architecture, and more consistent 24 GB memory—all while gaining ray tracing capabilities the V100 lacks.

The A10G isn’t trying to compete with A100s on raw performance or H100s on cutting-edge capabilities. Instead, it owns the middle ground where most real-world inference happens—serving models that are too large for T4s but don’t need flagship GPU power.

Quick-start guide: run code on a cloud A10G in under 5 minutes

Modal’s serverless platform lets you run code on an A10G without managing infrastructure:

import modal

app = modal.App()

@app.function(gpu="A10G")
def run_inference():
    # This runs on an A10G on Modal; nvidia-smi here stands in for real inference code
    import subprocess
    subprocess.run(["nvidia-smi"], check=True)

@app.local_entrypoint()
def main():
    run_inference.remote()

At $0.000306/second, you can prototype for pennies, then scale to production without touching instance configuration or worrying about idle costs.

Get started with A10G today

The A10G represents excellent value for AI inference and graphics workloads, offering enough memory for most models at accessible prices. Whether you’re serving diffusion models, running smaller LLMs, or building GPU-accelerated applications, the A10G delivers solid performance without breaking the budget—especially on serverless platforms where you only pay for actual usage.
