0/3 prices verified

NVIDIA NIM API Pricing — April 2026

NVIDIA NIM provides optimized inference for major open-weight models running on NVIDIA infrastructure, with consistent latency and enterprise SLA options.

Model Pricing

Model	Input ($/M tokens)	Output ($/M tokens)	Context (tokens)	Use Cases	Price Status
Llama 3.1 70B	$0.35	$0.40	128K	General-purpose production workloads	Unverified
Llama 3.1 8B	$0.05	$0.05	128K	General-purpose production workloads	Unverified
Mistral 7B Instruct	$0.04	$0.04	32K	General-purpose production workloads	Unverified
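
To turn the per-million-token rates into a spend estimate, multiply each token count by its rate and divide by one million. The sketch below is illustrative only: the `PRICES` dictionary and the `estimate_cost` helper are hypothetical names, not part of any NVIDIA API, and the rates are the unverified figures from the table above.

```python
from decimal import Decimal

# USD per million tokens as (input, output), copied from the table above.
# These figures are listed as unverified; confirm them before budgeting.
PRICES = {
    "Llama 3.1 70B": (Decimal("0.35"), Decimal("0.40")),
    "Llama 3.1 8B": (Decimal("0.05"), Decimal("0.05")),
    "Mistral 7B Instruct": (Decimal("0.04"), Decimal("0.04")),
}

MILLION = Decimal(1_000_000)


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated cost in USD for one workload on the given model."""
    input_price, output_price = PRICES[model]
    return (input_price * input_tokens + output_price * output_tokens) / MILLION


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 500K output tokens.
    for name in PRICES:
        print(f"{name}: ${estimate_cost(name, 2_000_000, 500_000):.2f}")
```

At the listed rates, a workload of 2M input tokens and 500K output tokens on Llama 3.1 70B comes to $0.90 (2 × $0.35 + 0.5 × $0.40). `Decimal` is used instead of floats so that small per-token amounts do not pick up rounding noise when summed over many requests.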

How NVIDIA NIM compares to other providers

OpenAI Pricing, Anthropic Pricing, Google Pricing, Mistral Pricing, Meta (via Together.ai) Pricing, Cohere Pricing, xAI Pricing, DeepSeek Pricing, Together.ai Pricing, Fireworks.ai Pricing, Groq Pricing, Amazon Bedrock Pricing, Cerebras Pricing

Running NVIDIA NIM in production? Clawback audits your spend and shows you exactly where you can cut costs without sacrificing quality.
