NVIDIA NIM API Pricing — April 2026
NVIDIA NIM provides optimized inference for major open-weight models running on NVIDIA infrastructure, with consistent latency and enterprise SLA options.
Model Pricing
| Model | Input ($/M tokens) | Output ($/M tokens) | Context | Use Cases |
|---|---|---|---|---|
| Llama 3.1 70B (unverified) | $0.35 | $0.40 | 128K | General-purpose production workloads |
| Llama 3.1 8B (unverified) | $0.05 | $0.05 | 128K | General-purpose production workloads |
| Mistral 7B Instruct (unverified) | $0.04 | $0.04 | 32K | General-purpose production workloads |
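As a rough illustration of how the per-million-token rates in the table turn into a bill, here is a minimal Python sketch. The `PRICES` dictionary and the `estimate_cost` helper are hypothetical names introduced for this example, not part of any NVIDIA API, and the rates are copied from the table, where they are marked unverified.

```python
# Prices in USD per million tokens, copied from the table above.
# The source marks these figures as unverified; treat them as placeholders.
PRICES = {
    "Llama 3.1 70B": (0.35, 0.40),
    "Llama 3.1 8B": (0.05, 0.05),
    "Mistral 7B Instruct": (0.04, 0.04),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one workload (hypothetical helper)."""
    input_price, output_price = PRICES[model]
    # Each rate applies per 1,000,000 tokens, so scale the token counts down.
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Example workload: 10M input tokens and 2M output tokens per month.
    for name in PRICES:
        cost = estimate_cost(name, 10_000_000, 2_000_000)
        print(f"{name}: ${cost:.2f}")
```

Under these assumed rates, 10M input tokens and 2M output tokens on Llama 3.1 70B come to 10 × $0.35 + 2 × $0.40 = $4.30. Input and output are priced separately, so the ratio between them matters for any model whose two rates differ.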
How NVIDIA NIM compares to other providers
OpenAI Pricing, Anthropic Pricing, Google Pricing, Mistral Pricing, Meta (via Together.ai) Pricing, Cohere Pricing, xAI Pricing, DeepSeek Pricing, Together.ai Pricing, Fireworks.ai Pricing, Groq Pricing, Amazon Bedrock Pricing, Cerebras Pricing
Running NVIDIA NIM in production? Clawback audits your spend and shows you exactly where you can cut costs without sacrificing quality.