# Groq API Pricing — April 2026
Groq's LPU hardware delivers some of the fastest inference speeds on the market, making it a strong fit for latency-critical applications where speed matters more than model variety.
## Model Pricing
| Model | Input ($/M tokens) | Output ($/M tokens) | Context | Use Cases |
|---|---|---|---|---|
| Llama 3.3 70B Versatile | $0.59 | $0.79 | 128K | General-purpose production workloads |
| Mixtral 8x7B | $0.24 | $0.24 | 32K | General-purpose production workloads |
| Gemma 2 9B | $0.20 | $0.20 | 8K | General-purpose production workloads |
| Llama 3.1 8B Instant | $0.05 | $0.08 | 128K | General-purpose production workloads |
| Llama 3.2 1B Preview | $0.04 | $0.04 | 128K | General-purpose production workloads |
| Llama 3.2 3B Preview | $0.06 | $0.06 | 128K | General-purpose production workloads |
| DeepSeek R1 Distill Llama 70B (pricing unverified) | $0.75 | $0.99 | 128K | Complex reasoning, math, code |
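Per-request cost follows directly from the table: tokens divided by one million, multiplied by the per-million price for input and output respectively. A minimal sketch (the `request_cost` helper and the model keys are illustrative, not part of any Groq SDK; prices are taken from the table above):

```python
# Per-million-token (input, output) prices from the pricing table above.
GROQ_PRICES = {
    "llama-3.3-70b-versatile": (0.59, 0.79),
    "mixtral-8x7b": (0.24, 0.24),
    "llama-3.1-8b-instant": (0.05, 0.08),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request: (tokens / 1M) * price per million tokens."""
    input_price, output_price = GROQ_PRICES[model]
    return (input_tokens / 1_000_000) * input_price + \
           (output_tokens / 1_000_000) * output_price

# Example: a 10K-token prompt with a 2K-token completion on Llama 3.1 8B Instant.
print(f"${request_cost('llama-3.1-8b-instant', 10_000, 2_000):.5f}")  # → $0.00066
```

At these rates, even a million such requests on the 8B model would cost roughly $660, which is why the cheapest models in the table are the default choice for high-volume workloads.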
## How Groq compares to other providers
- OpenAI Pricing
- Anthropic Pricing
- Google Pricing
- Mistral Pricing
- Meta (via Together.ai) Pricing
- Cohere Pricing
- xAI Pricing
- DeepSeek Pricing
- Together.ai Pricing
- Fireworks.ai Pricing
- Amazon Bedrock Pricing
- Cerebras Pricing
- NVIDIA NIM Pricing
Running Groq in production? Clawback audits your spend and shows you exactly where you can cut costs without sacrificing quality.
Audit your Groq spend →