Compare real token pricing for Claude, GPT-5, Gemini, Groq, DeepSeek, and more. Estimate your monthly API spend before you build, so there are no surprises on your invoice.
Prices verified June 2026.
When you send a message to an AI model through its API, the text is broken into small units called tokens before processing. A token is roughly 4 characters or about 0.75 words in English, so a 1,000-word document is approximately 1,300 tokens.
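The rule of thumb above can be turned into a quick estimator. This is a minimal sketch that assumes English text and the ratios quoted here (about 4 characters or 0.75 words per token); the function names are illustrative, and a provider's real tokenizer will give different counts.

```python
def tokens_from_words(word_count: int) -> int:
    """Estimate tokens from an English word count (about 0.75 words per token)."""
    return round(word_count / 0.75)


def tokens_from_text(text: str) -> int:
    """Estimate tokens from raw text (about 4 characters per token)."""
    return round(len(text) / 4)


print(tokens_from_words(1000))  # 1333, i.e. roughly 1,300 tokens
print(tokens_from_text("Summarize this support ticket in one sentence."))
```

Treat the result as a budgeting figure only; code, non-English text, and unusual formatting tend to tokenize less efficiently.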
AI providers charge separately for input tokens (your prompt, system instructions, and conversation history) and output tokens (the model’s response). Output tokens are usually more expensive because the model must generate them sequentially, one token at a time.
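Because input and output are billed at different rates, the cost of one call is a two-term sum. The sketch below assumes prices quoted per 1M tokens, as in the pricing table on this page; `call_cost` is an illustrative helper, not part of any provider SDK.

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in USD of one API call, given prices per 1M tokens."""
    input_cost = input_tokens * input_price_per_m
    output_cost = output_tokens * output_price_per_m
    return (input_cost + output_cost) / 1_000_000


# Example: 2,000 input and 500 output tokens at $3.00 / $15.00 per 1M tokens
# (the Claude Sonnet 4.6 row of the table).
print(f"${call_cost(2000, 500, 3.00, 15.00):.4f}")  # $0.0135
```

In this example the 500 output tokens cost more than the 2,000 input tokens, which is why response length matters so much.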
This is why API costs can surprise developers. A chatbot with a long system prompt, full conversation history, and verbose responses can cost far more than expected. The calculator above helps you estimate real costs before you commit to building.
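To see where the surprise comes from, consider a chatbot that resends the system prompt and the full history on every turn. The sketch below assumes fixed message sizes per turn and no caching or history truncation; all names and numbers are illustrative.

```python
def conversation_tokens(system_tokens: int, user_tokens: int,
                        reply_tokens: int, turns: int) -> tuple[int, int]:
    """Total (input, output) tokens billed over a multi-turn conversation."""
    total_input = 0
    total_output = 0
    history = 0  # tokens of earlier user messages and replies
    for _ in range(turns):
        # Each call resends the system prompt, the history, and the new message.
        total_input += system_tokens + history + user_tokens
        total_output += reply_tokens
        history += user_tokens + reply_tokens
    return total_input, total_output


# 1,000-token system prompt, 100-token messages, 300-token replies, 10 turns.
print(conversation_tokens(1000, 100, 300, 10))  # (29000, 3000)
```

A naive estimate of 10 calls at 1,100 input tokens each would predict 11,000 input tokens; under these assumptions the real figure is 29,000, because input grows with every turn.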
Prompt compression is often the highest-leverage optimization available. Every token you remove from your system prompt is saved again on every API call you make.
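The multiplication is easy to quantify. This sketch assumes the removed tokens are billed at the normal input rate on every call; the function name and the traffic figure are hypothetical.

```python
def monthly_savings(tokens_removed: int, calls_per_month: int,
                    input_price_per_m: float) -> float:
    """USD saved per month by trimming tokens from a prompt sent on every call."""
    return tokens_removed * calls_per_month * input_price_per_m / 1_000_000


# Trimming 400 tokens from a system prompt at $3.00 per 1M input tokens,
# with 1,000,000 calls per month.
print(f"${monthly_savings(400, 1_000_000, 3.00):,.2f}")  # $1,200.00
```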
Choose the right model tier for each task. Not every API call needs a flagship model. Claude Haiku 4.5, GPT-5.4 nano, Gemini Flash-Lite, and low-cost Groq models can handle simple classification, extraction, summarization, and support responses at much lower cost.
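One way to apply this is a small router that sends simple task types to a cheaper tier. The sketch below is an assumption-laden illustration: the task labels and the `pick_model` rule are hypothetical, and the prices are the Claude Haiku 4.5 and Claude Sonnet 4.6 rows of the pricing table on this page.

```python
# (input, output) prices in USD per 1M tokens, taken from the pricing table.
PRICES = {
    "Claude Haiku 4.5": (1.00, 5.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
}

# Hypothetical task labels that a budget model is assumed to handle well.
SIMPLE_TASKS = {"classification", "extraction", "summarization", "support"}


def pick_model(task_type: str) -> str:
    """Route simple tasks to the budget tier, everything else to the larger model."""
    return "Claude Haiku 4.5" if task_type in SIMPLE_TASKS else "Claude Sonnet 4.6"


def monthly_cost(calls_by_task: dict[str, int],
                 input_tokens: int, output_tokens: int) -> float:
    """Monthly USD cost when every call uses the model chosen by pick_model."""
    total = 0.0
    for task_type, calls in calls_by_task.items():
        in_price, out_price = PRICES[pick_model(task_type)]
        per_call = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
        total += per_call * calls
    return total


workload = {"classification": 80_000, "code_review": 20_000}
print(f"${monthly_cost(workload, 1000, 200):.2f}")  # $280.00
```

Sending all 100,000 calls to the larger model at the same token counts would cost $600.00, so routing roughly halves the bill in this example. Whether the cheaper tier is good enough for a given task still has to be checked against your own quality bar.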
Implement prompt caching for repeated system prompts, and set a maximum output token limit on every API call. Uncapped output tokens are one of the most common causes of runaway API bills.
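A cap on output tokens also gives you a hard upper bound on the cost of each call, and caching lowers the input side. The sketch below is only a budgeting aid: `cached_price_ratio` is a hypothetical parameter, since cache discounts and cache-write fees differ by provider and are not listed on this page.

```python
def worst_case_call_cost(input_tokens: int, max_output_tokens: int,
                         input_price_per_m: float, output_price_per_m: float,
                         cached_tokens: int = 0,
                         cached_price_ratio: float = 1.0) -> float:
    """Upper bound in USD for one call when the output is capped.

    cached_price_ratio is an ASSUMED fraction of the input price charged
    for cached tokens (1.0 means no discount).
    """
    uncached = (input_tokens - cached_tokens) * input_price_per_m
    cached = cached_tokens * input_price_per_m * cached_price_ratio
    output = max_output_tokens * output_price_per_m
    return (uncached + cached + output) / 1_000_000


# 3,000 input tokens, output capped at 500, at $0.75 / $4.50 per 1M tokens
# (the GPT-5.4 mini row of the pricing table).
print(worst_case_call_cost(3000, 500, 0.75, 4.50))
# Same call with a 2,000-token cached system prompt at an assumed 10% rate.
print(worst_case_call_cost(3000, 500, 0.75, 4.50,
                           cached_tokens=2000, cached_price_ratio=0.1))
```

Multiplying the first figure by your expected call volume gives a ceiling for the monthly bill that no single verbose response can exceed.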
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best for |
|---|---|---|---|
| GPT OSS 20B 128K (fast) | $0.075 | $0.30 | Real-time chat, routing, simple high-volume tasks |
| GPT OSS Safeguard 20B | $0.075 | $0.30 | Moderation and safeguard layers |
| Gemini 2.5 Flash-Lite (budget) | $0.10 | $0.40 | Bulk processing and classification |
| DeepSeek V4 Flash (budget) | $0.14 | $0.28 | Low-cost long-context reasoning and coding |
| GPT OSS 120B 128K | $0.15 | $0.60 | Higher-quality Groq text generation |
| Gemini 3.1 Flash-Lite | $0.15 | $0.50 | Low-cost Google long-context tasks |
| GPT-5.4 nano | $0.20 | $1.25 | Cheapest OpenAI model family option |
| Qwen3 32B 131K | $0.29 | $0.59 | Fast multilingual and reasoning work on Groq |
| Gemini 2.5 Flash | $0.30 | $2.50 | Fast multimodal, 1M context |
| DeepSeek V4 Pro | $0.435 | $0.87 | Higher-capability DeepSeek workloads |
| GPT-5.4 mini | $0.75 | $4.50 | Balanced OpenAI workloads |
| Claude Haiku 4.5 (Claude budget) | $1.00 | $5.00 | High-volume Claude tasks |
| Gemini 3.1 Pro | $2.00 | $12.00 | Google Pro long-context and multimodal work |
| GPT-5.4 (OpenAI flagship) | $2.50 | $15.00 | Complex reasoning, tools, and vision |
| Claude Sonnet 4.6 | $3.00 | $15.00 | Coding, agents, and production assistants |
| Claude Opus 4.8 (Claude premium) | $5.00 | $25.00 | Hard reasoning and deep research |
| GPT-5.5 | $5.00 | $30.00 | Premium OpenAI reasoning and agents |
| GPT-5.5 pro (premium) | $30.00 | $180.00 | Specialist high-value workloads |
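
The table can be turned into a monthly estimate for a specific workload. The sketch below copies a few rows from the table and ranks them for a hypothetical workload of 50,000 calls per month with 1,500 input and 400 output tokens each; the workload figures and helper name are assumptions for illustration.

```python
# (input, output) prices in USD per 1M tokens, copied from the table above.
TABLE = {
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
    "DeepSeek V4 Flash": (0.14, 0.28),
    "Claude Haiku 4.5": (1.00, 5.00),
    "GPT-5.4": (2.50, 15.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
}


def rank_by_monthly_cost(calls: int, input_tokens: int,
                         output_tokens: int) -> list[tuple[str, float]]:
    """Return (model, monthly USD cost) pairs, cheapest first."""
    costs = []
    for model, (in_price, out_price) in TABLE.items():
        per_call = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
        costs.append((model, per_call * calls))
    return sorted(costs, key=lambda pair: pair[1])


for model, cost in rank_by_monthly_cost(50_000, 1500, 400):
    print(f"{model:<24} ${cost:>8.2f}")
```

For this workload the estimate ranges from about $15.50 per month on Gemini 2.5 Flash-Lite to $525.00 on Claude Sonnet 4.6, so the choice of tier dominates every other optimization.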