Cerebras API Pricing — April 2026
Cerebras delivers the fastest token generation in the industry using wafer-scale silicon. It is best suited to real-time applications where output speed is the primary constraint.
Model Pricing
| Model | Input ($/M tokens) | Output ($/M tokens) | Context | Use Cases |
|---|---|---|---|---|
| Llama 3.3 70B (unverified) | $0.85 | $1.20 | 8K | General-purpose production workloads |
| Llama 3.1 8B (unverified) | $0.10 | $0.10 | 8K | General-purpose production workloads |
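To turn the per-million-token rates into a per-request figure, a minimal sketch might look like the following. The rates are copied from the table above, where they are marked unverified. The dictionary keys and the `estimate_cost` function are hypothetical names chosen for illustration and are not Cerebras API identifiers.

```python
# Rates in USD per million tokens, copied from the pricing table above
# (marked unverified there). The keys are illustrative labels, NOT API model IDs.
PRICES = {
    "llama-3.3-70b": {"input": 0.85, "output": 1.20},
    "llama-3.1-8b": {"input": 0.10, "output": 0.10},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single request.

    Hypothetical helper: multiplies each token count by its per-million
    rate and divides by one million.
    """
    rates = PRICES[model]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Example request: 2,000 input tokens and 500 output tokens on the 70B model.
    cost = estimate_cost("llama-3.3-70b", 2_000, 500)
    print(f"Estimated cost: ${cost:.6f}")
```

With these rates, a request with 2,000 input tokens and 500 output tokens on Llama 3.3 70B comes to about $0.0023, so a million such requests would cost roughly $2,300.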
How Cerebras compares to other providers
OpenAI Pricing, Anthropic Pricing, Google Pricing, Mistral Pricing, Meta (via Together.ai) Pricing, Cohere Pricing, xAI Pricing, DeepSeek Pricing, Together.ai Pricing, Fireworks.ai Pricing, Groq Pricing, Amazon Bedrock Pricing, NVIDIA NIM Pricing
Running Cerebras in production? Clawback audits your spend and shows you exactly where you can cut costs without sacrificing quality.