Last verified: April 13, 2026 (6 of 7 prices verified)

Groq API Pricing — April 2026

Groq's LPU hardware is built for fast, low-latency inference. It is the best fit for latency-critical applications where speed outweighs model variety.

Model Pricing

Model | Input ($/M tokens) | Output ($/M tokens) | Context | Use Cases
Llama 3.3 70B Versatile | $0.59 | $0.79 | 128K | General-purpose production workloads
Mixtral 8x7B | $0.24 | $0.24 | 32K | General-purpose production workloads
Gemma 2 9B | $0.20 | $0.20 | 8K | General-purpose production workloads
Llama 3.1 8B Instant | $0.05 | $0.08 | 128K | General-purpose production workloads
Llama 3.2 1B Preview | $0.04 | $0.04 | 128K | General-purpose production workloads
Llama 3.2 3B Preview | $0.06 | $0.06 | 128K | General-purpose production workloads
DeepSeek R1 Distill Llama 70B (unverified) | $0.75 | $0.99 | 128K | Complex reasoning, math, code
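
To turn the per-million-token prices above into a spend estimate, multiply each token count by its price and divide by one million. The Python sketch below does this under a few assumptions: the PRICES dictionary and the estimate_cost function are hypothetical names made up for illustration, the dictionary keys are the display names from the table rather than Groq API model identifiers, and the figures are only as current as the "last verified" date on this page.

```python
# Prices in USD per million tokens as (input, output), copied from the table above.
# Keys are the table's display names, NOT official API model IDs (assumption).
PRICES = {
    "Llama 3.3 70B Versatile": (0.59, 0.79),
    "Mixtral 8x7B": (0.24, 0.24),
    "Gemma 2 9B": (0.20, 0.20),
    "Llama 3.1 8B Instant": (0.05, 0.08),
    "Llama 3.2 1B Preview": (0.04, 0.04),
    "Llama 3.2 3B Preview": (0.06, 0.06),
    # Marked "unverified" in the table; treat this entry with extra caution.
    "DeepSeek R1 Distill Llama 70B": (0.75, 0.99),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for the given token counts."""
    input_price, output_price = PRICES[model]  # raises KeyError for unknown models
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Example: 2M input tokens and 500K output tokens on Llama 3.1 8B Instant.
cost = estimate_cost("Llama 3.1 8B Instant", 2_000_000, 500_000)
print(f"${cost:.2f}")  # prints $0.14
```

Because input and output are priced separately for most models, the split between prompt and completion tokens matters: the same workload on Llama 3.3 70B Versatile would cost (2 × $0.59) + (0.5 × $0.79) = $1.575.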

How Groq compares to other providers

Running Groq in production? Clawback audits your spend and shows you exactly where you can cut costs without sacrificing quality.
