# Fireworks.ai API Pricing — April 2026
Fireworks.ai focuses on low-latency open-model hosting, offering fast inference for Llama and other open-weight models at competitive token prices.
## Model Pricing
| Model | Input ($/M tokens) | Output ($/M tokens) | Context | Use Cases |
|---|---|---|---|---|
| Llama 3.3 70B Instruct | $0.90 | $0.90 | 128K | General-purpose production workloads |
| DeepSeek R1 (unverified) | $3.00 | $8.00 | 128K | Complex reasoning, math, code |
| Qwen 2.5 72B Instruct (unverified) | $0.90 | $0.90 | 32K | General-purpose production workloads |
| Llama 3.1 8B Instruct | $0.10 | $0.10 | 128K | General-purpose production workloads |
| Mixtral 8x22B Instruct (unverified) | $0.90 | $0.90 | 65K | General-purpose production workloads |
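Per-request cost follows directly from the table: multiply input and output token counts by the corresponding per-million-token price. A minimal sketch of that arithmetic, using prices from the table above (the dictionary keys are illustrative labels, not necessarily Fireworks API model identifiers):

```python
# Per-million-token prices (USD) from the table above.
# Keys are illustrative labels, not official Fireworks model IDs.
PRICES = {
    "llama-3.3-70b-instruct": (0.90, 0.90),  # (input $/M, output $/M)
    "deepseek-r1": (3.00, 8.00),
    "llama-3.1-8b-instruct": (0.10, 0.10),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request: tokens x price / 1M."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: a 2,000-token prompt with an 800-token completion on DeepSeek R1
cost = request_cost("deepseek-r1", 2_000, 800)
# 2,000 * $3.00/M = $0.0060 in; 800 * $8.00/M = $0.0064 out; total $0.0124
print(f"${cost:.4f}")
```

Note the asymmetry for reasoning models: DeepSeek R1's output tokens cost more than 2.5x its input tokens, so long chains of thought dominate the bill.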
## How Fireworks.ai compares to other providers
- OpenAI Pricing
- Anthropic Pricing
- Google Pricing
- Mistral Pricing
- Meta (via Together.ai) Pricing
- Cohere Pricing
- xAI Pricing
- DeepSeek Pricing
- Together.ai Pricing
- Groq Pricing
- Amazon Bedrock Pricing
- Cerebras Pricing
- NVIDIA NIM Pricing
Running Fireworks.ai in production? Clawback audits your spend and shows you exactly where you can cut costs without sacrificing quality.
Audit your Fireworks.ai spend →