AI API Cost Calculator

Compare real token pricing for Claude, GPT-5, Gemini, Groq, DeepSeek, and more. Estimate your monthly API spend before you build - no surprises on your invoice.

Prices verified June 2026

1. Select a provider

2. Paste your prompt to count tokens (optional)

3. Estimate your usage

What Are Tokens and Why Do They Cost Money?

When you send a message to an AI model through its API, the text is broken into small units called tokens before processing. A token is roughly 4 characters or about 0.75 words in English, so a 1,000-word document is approximately 1,300 tokens.
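
As a rough illustration, the rule of thumb above can be turned into a quick estimator. This is only a sketch of the heuristic (4 characters or 0.75 words per token); real tokenizers differ by model and language, and the function names are made up for this example.

```python
import math

# Rules of thumb from the text above; real tokenizers vary by model and language.
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75


def estimate_tokens_from_chars(text: str) -> int:
    """Approximate token count from the character length of English text."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(word_count: int) -> int:
    """Approximate token count from a word count."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


if __name__ == "__main__":
    # A 1,000-word document lands near the 1,300 tokens quoted above.
    print(estimate_tokens_from_words(1000))  # 1334
```

For billing-grade numbers, count tokens with the provider's own tokenizer instead of this approximation.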

AI providers charge separately for input tokens (your prompt, system instructions, and conversation history) and output tokens (the model’s response). Output tokens are usually more expensive because they must be generated sequentially.

This is why API costs can surprise developers. A chatbot with a long system prompt, full conversation history, and verbose responses can cost far more than expected. The calculator above helps you estimate real costs before you commit to building.
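
To make the chatbot case concrete, here is a minimal sketch of how cost grows when every turn resends the system prompt and the full conversation history. The token counts are invented for illustration, and the rates are the Claude Sonnet 4.6 prices listed on this page; the model of "resend everything each turn" is an assumption about how the app is built.

```python
def conversation_cost(turns, system_tokens, user_tokens, reply_tokens,
                      input_rate, output_rate):
    """Total USD cost of one conversation that resends full history each turn.

    Rates are USD per 1M tokens. On turn k the input contains the system
    prompt, k user messages, and the k - 1 earlier replies.
    """
    total_input = 0
    total_output = 0
    for k in range(1, turns + 1):
        total_input += system_tokens + k * user_tokens + (k - 1) * reply_tokens
        total_output += reply_tokens
    return (total_input * input_rate + total_output * output_rate) / 1_000_000


if __name__ == "__main__":
    # Hypothetical chatbot: 1,000-token system prompt, 100-token user
    # messages, 300-token replies, priced at $3.00 input / $15.00 output.
    one_turn = conversation_cost(1, 1000, 100, 300, 3.00, 15.00)
    ten_turns = conversation_cost(10, 1000, 100, 300, 3.00, 15.00)
    print(f"10 independent calls: ${10 * one_turn:.4f}")  # $0.0780
    print(f"one 10-turn chat:     ${ten_turns:.4f}")      # $0.1320
```

Under these assumptions, the ten-turn conversation costs roughly 70% more than ten independent single-turn calls, purely because of the history that is resent as input.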

How to Reduce Your AI API Costs

Prompt compression is often the highest-leverage optimization available. Every token you remove from your system prompt is saved on every API call you make, so the saving scales with your call volume.
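
A quick back-of-the-envelope sketch shows how the saving scales. The function name and the example numbers are illustrative assumptions; the rate is an input price in USD per 1M tokens.

```python
def monthly_compression_savings(tokens_removed, calls_per_month,
                                input_rate_per_million):
    """USD saved per month by trimming tokens_removed from every call's prompt."""
    return tokens_removed * calls_per_month * input_rate_per_million / 1_000_000


if __name__ == "__main__":
    # Hypothetical: trim 500 tokens from a system prompt used in
    # 100,000 calls per month at a $3.00 per 1M input rate.
    saved = monthly_compression_savings(500, 100_000, 3.00)
    print(f"${saved:.2f} per month")  # $150.00 per month
```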

Choose the right model tier for each task. Not every API call needs a flagship model. Claude Haiku 4.5, GPT-5.4 nano, Gemini Flash-Lite, and low-cost Groq models can handle simple classification, extraction, summarization, and support responses at much lower cost.
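
One way to picture tier selection is a small router that sends simple task types to a budget model and everything else to a larger one. The sketch below is hypothetical: the task labels, the routing rule, and the workload are assumptions, and only the two Claude prices come from the table on this page.

```python
# (input, output) prices in USD per 1M tokens, taken from the pricing table.
RATES = {
    "Claude Haiku 4.5": (1.00, 5.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
}

# Hypothetical task labels treated as "simple" for routing purposes.
SIMPLE_TASKS = {"classification", "extraction", "summarization", "support"}


def pick_model(task_type):
    """Route simple tasks to the budget tier, everything else to the larger model."""
    return "Claude Haiku 4.5" if task_type in SIMPLE_TASKS else "Claude Sonnet 4.6"


def call_cost(model, input_tokens, output_tokens):
    """USD cost of a single call on the given model."""
    input_rate, output_rate = RATES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def workload_cost(workload, route=True):
    """Monthly cost for (task_type, calls, input_tokens, output_tokens) rows."""
    total = 0.0
    for task_type, calls, input_tokens, output_tokens in workload:
        model = pick_model(task_type) if route else "Claude Sonnet 4.6"
        total += calls * call_cost(model, input_tokens, output_tokens)
    return total


if __name__ == "__main__":
    # Hypothetical month: 70% simple classification calls, 30% agent calls.
    workload = [("classification", 70_000, 1000, 300), ("agent", 30_000, 1000, 300)]
    print(f"all on Sonnet: ${workload_cost(workload, route=False):.2f}")  # $750.00
    print(f"routed:        ${workload_cost(workload, route=True):.2f}")   # $400.00
```

Whether the budget model is good enough for a given task is a quality question that this arithmetic cannot answer; it only shows what is at stake on the cost side.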

Implement prompt caching for repeated system prompts, and set a maximum output token limit on every API call. Uncapped output is one of the most common causes of runaway API bills.

Current AI API Pricing - June 2026

Model | Input /1M tokens | Output /1M tokens | Best for
GPT OSS 20B 128K (fast) | $0.075 | $0.30 | Real-time chat, routing, simple high-volume tasks
GPT OSS Safeguard 20B | $0.075 | $0.30 | Moderation and safeguard layers
Gemini 2.5 Flash-Lite (budget) | $0.10 | $0.40 | Bulk processing and classification
DeepSeek V4 Flash (budget) | $0.14 | $0.28 | Low-cost long-context reasoning and coding
GPT OSS 120B 128K | $0.15 | $0.60 | Higher-quality Groq text generation
Gemini 3.1 Flash-Lite | $0.15 | $0.50 | Low-cost Google long-context tasks
GPT-5.4 nano | $0.20 | $1.25 | Cheapest OpenAI model family option
Qwen3 32B 131K | $0.29 | $0.59 | Fast multilingual and reasoning work on Groq
Gemini 2.5 Flash | $0.30 | $2.50 | Fast multimodal, 1M context
DeepSeek V4 Pro | $0.435 | $0.87 | Higher-capability DeepSeek workloads
GPT-5.4 mini | $0.75 | $4.50 | Balanced OpenAI workloads
Claude Haiku 4.5 (Claude budget) | $1.00 | $5.00 | High-volume Claude tasks
Gemini 3.1 Pro | $2.00 | $12.00 | Google Pro long-context and multimodal work
GPT-5.4 (OpenAI flagship) | $2.50 | $15.00 | Complex reasoning, tools, and vision
Claude Sonnet 4.6 | $3.00 | $15.00 | Coding, agents, and production assistants
Claude Opus 4.8 (Claude premium) | $5.00 | $25.00 | Hard reasoning and deep research
GPT-5.5 | $5.00 | $30.00 | Premium OpenAI reasoning and agents
GPT-5.5 pro (premium) | $30.00 | $180.00 | Specialist high-value workloads
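
To compare models for the same usage, a few rows of the table can be loaded into a dictionary and ranked by monthly cost. This is a sketch only: it covers a subset of the rows, and the usage profile (2,000 input tokens, 500 output tokens, 10,000 calls per month) is an assumption chosen for illustration.

```python
# Subset of the pricing table: model -> (input, output) in USD per 1M tokens.
PRICES = {
    "GPT OSS 20B 128K": (0.075, 0.30),
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
    "DeepSeek V4 Flash": (0.14, 0.28),
    "GPT-5.4 nano": (0.20, 1.25),
    "Claude Haiku 4.5": (1.00, 5.00),
    "GPT-5.4": (2.50, 15.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Opus 4.8": (5.00, 25.00),
}


def rank_models(input_tokens, output_tokens, calls_per_month):
    """Return (model, monthly_cost_usd) pairs sorted from cheapest to priciest."""
    costs = []
    for model, (input_rate, output_rate) in PRICES.items():
        per_call = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
        costs.append((model, per_call * calls_per_month))
    return sorted(costs, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical usage profile, identical for every model.
    for model, monthly in rank_models(2000, 500, 10_000):
        print(f"{model:<24} ${monthly:>8.2f}")
```

With this profile the spread runs from about $3 per month for GPT OSS 20B 128K to $225 per month for Claude Opus 4.8, which is why the "Best for" column matters as much as the price columns.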

Frequently Asked Questions

Claude Haiku 4.5 is listed at $1.00 per million input tokens and $5.00 per million output tokens. Claude Sonnet 4.5 and 4.6 are listed at $3.00 input and $15.00 output. Claude Opus 4.5 through 4.8 are listed at $5.00 input and $25.00 output.
The OpenAI models listed here range from GPT-5.4 nano at $0.20 input and $1.25 output per million tokens to GPT-5.5 pro at $30.00 input and $180.00 output per million tokens.
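At the budget tier, GPT-5.4 nano ($0.20 input, $1.25 output) is cheaper than Claude Haiku 4.5 ($1.00 input, $5.00 output). At the premium tier, Claude Opus 4.8 and GPT-5.5 share the same $5.00 input price, but Claude Opus has lower output pricing ($25.00 versus $30.00), while GPT-5.4 is priced below Claude Opus on both input and output. When one model is cheaper on input and another is cheaper on output, the best choice depends on your input/output mix.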
For simple production tasks, low-cost options include GPT OSS 20B 128K on Groq, Gemini 2.5 Flash-Lite, DeepSeek V4 Flash, and Gemini 3.1 Flash-Lite. For proprietary ecosystem quality, GPT-5.4 nano and Claude Haiku 4.5 are good starting points.
Estimate your average input tokens per call, output tokens per call, and number of calls per month. Multiply each token count by the matching model rate per million tokens, add input and output cost per call, then multiply by monthly call volume.
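
The calculation described above can be sketched in a few lines. The function names and the example token counts are assumptions; the rates in the example are the GPT-5.4 mini prices listed on this page.

```python
def cost_per_call(input_tokens, output_tokens, input_rate, output_rate):
    """USD cost of one call; rates are USD per 1M tokens."""
    input_cost = input_tokens * input_rate / 1_000_000
    output_cost = output_tokens * output_rate / 1_000_000
    return input_cost + output_cost


def monthly_cost(input_tokens, output_tokens, input_rate, output_rate,
                 calls_per_month):
    """USD cost per month: cost per call multiplied by monthly call volume."""
    per_call = cost_per_call(input_tokens, output_tokens, input_rate, output_rate)
    return per_call * calls_per_month


if __name__ == "__main__":
    # Hypothetical averages: 1,500 input and 400 output tokens per call,
    # priced at $0.75 input / $4.50 output per 1M tokens.
    for calls in (1_000, 10_000, 100_000):
        total = monthly_cost(1500, 400, 0.75, 4.50, calls)
        print(f"{calls:>7,} calls/mo: ${total:,.2f}")
```
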
Output generation is sequential. The model generates response tokens one at a time, which is more compute-intensive than processing input tokens. That is why output tokens often cost several times more than input tokens.
Prompt caching stores repeated prompt content, such as long system prompts or documents, so repeated requests can reuse cached context. This can substantially reduce repeated input costs on high-volume apps.
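
The effect of caching on input spend can be estimated with a simple model. Everything here is an assumption for illustration: cached tokens are billed at some fraction of the normal input rate, and the 0.1 multiplier in the example is a placeholder, not a quoted price. Cache write surcharges, cache expiry, and minimum cacheable lengths vary by provider and are ignored.

```python
def monthly_input_cost(calls, shared_prefix_tokens, fresh_tokens, input_rate,
                       cached_read_multiplier=1.0):
    """Monthly USD input cost when a shared prompt prefix may be cached.

    cached_read_multiplier is the HYPOTHETICAL fraction of the normal input
    rate charged for cached tokens (1.0 means no caching discount).
    Cache writes and cache misses are ignored in this sketch.
    """
    prefix_cost = shared_prefix_tokens * input_rate * cached_read_multiplier
    fresh_cost = fresh_tokens * input_rate
    return calls * (prefix_cost + fresh_cost) / 1_000_000


if __name__ == "__main__":
    # Hypothetical app: 3,000-token shared system prompt, 200 fresh tokens
    # per call, 100,000 calls per month at a $3.00 per 1M input rate.
    uncached = monthly_input_cost(100_000, 3000, 200, 3.00)
    cached = monthly_input_cost(100_000, 3000, 200, 3.00, cached_read_multiplier=0.1)
    print(f"without caching: ${uncached:.2f}")  # $960.00
    print(f"with caching:    ${cached:.2f}")    # $150.00
```

Check your provider's current caching documentation for the actual read and write prices before relying on an estimate like this.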