# Groq API Pricing — April 2026
Groq's LPU hardware delivers some of the fastest inference speeds on the market, making it a strong fit for latency-critical applications where speed matters more than model variety.
## Model Pricing
| Model | Input ($/M tokens) | Output ($/M tokens) | Context | Use Cases |
|---|---|---|---|---|
| Llama 3.3 70B Versatile | $0.59 | $0.79 | 128K | General-purpose production workloads |
| Mixtral 8x7B | $0.24 | $0.24 | 32K | General-purpose production workloads |
| Gemma 2 9B | $0.20 | $0.20 | 8K | General-purpose production workloads |
| Llama 3.1 8B Instant | $0.05 | $0.08 | 128K | General-purpose production workloads |
| Llama 3.2 1B Preview | $0.04 | $0.04 | 128K | General-purpose production workloads |
| Llama 3.2 3B Preview | $0.06 | $0.06 | 128K | General-purpose production workloads |
| DeepSeek R1 Distill Llama 70B (pricing unverified) | $0.75 | $0.99 | 128K | Complex reasoning, math, code |
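Per-request cost follows directly from the table: tokens divided by one million, multiplied by the per-million price for input and output respectively. A minimal sketch (the `request_cost` helper and the model keys are illustrative, not part of any Groq SDK; prices are taken from the table above):

```python
# Per-million-token (input, output) prices from the pricing table above.
GROQ_PRICES = {
    "llama-3.3-70b-versatile": (0.59, 0.79),
    "mixtral-8x7b": (0.24, 0.24),
    "llama-3.1-8b-instant": (0.05, 0.08),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request: (tokens / 1M) * price per million tokens."""
    input_price, output_price = GROQ_PRICES[model]
    return (input_tokens / 1_000_000) * input_price + \
           (output_tokens / 1_000_000) * output_price

# Example: a 10K-token prompt with a 2K-token completion on Llama 3.1 8B Instant.
print(f"${request_cost('llama-3.1-8b-instant', 10_000, 2_000):.5f}")  # → $0.00066
```

At these rates, even a million such requests on the 8B model would cost roughly $660, which is why the cheapest models in the table are the default choice for high-volume workloads.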
## How Groq compares to other providers
- OpenAI Pricing
- Anthropic Pricing
- Google Pricing
- Mistral Pricing
- Meta (via Together.ai) Pricing
- Cohere Pricing
- xAI Pricing
- DeepSeek Pricing
- Together.ai Pricing
- Fireworks.ai Pricing
- Amazon Bedrock Pricing
- Cerebras Pricing
- NVIDIA NIM Pricing
Running Groq in production? Clawback audits your spend and shows you exactly where you can cut costs without sacrificing quality.
Audit your Groq spend →