You're evaluating four AI APIs. You open four pricing pages. One charges by model version. One charges differently based on context window size. One bundles usage into a seat fee. None of them use the same units. You spend an afternoon building a spreadsheet. By next quarter, half the prices will have changed.
This guide does that work for you. All major AI API providers, same format, current as of February 2026.
How to Read AI API Pricing
Most AI APIs charge per token. A token is roughly 0.75 words in English — a 1,000-word document is approximately 1,300 tokens. Pricing is quoted per 1 million tokens, split into input (the prompt you send) and output (the response the model generates).
Why input and output are priced differently: Output tokens cost more because generating text is more computationally expensive than processing a prompt. For most chat-style tasks, output tokens represent only 20–40% of total token volume but a larger share of cost. For long-context retrieval tasks (RAG), input tokens dominate instead.
Why context window size matters for cost: Some providers charge more for larger context windows — even at the same model tier. Sending a 200,000-token document to a model costs significantly more than sending a 2,000-token prompt, both in tokens consumed and sometimes in per-token rate.
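To make the arithmetic concrete, here is a minimal cost estimator. The rates are the GPT-4o figures from the OpenAI section below; the function name and token counts are illustrative.

```python
def estimate_cost(input_tokens, output_tokens, input_rate_per_1m, output_rate_per_1m):
    """Request cost in dollars, given per-1M-token rates."""
    return (input_tokens * input_rate_per_1m
            + output_tokens * output_rate_per_1m) / 1_000_000

# GPT-4o rates: $2.50 input / $10.00 output per 1M tokens
short_prompt = estimate_cost(2_000, 500, 2.50, 10.00)      # $0.01 per request
long_document = estimate_cost(200_000, 500, 2.50, 10.00)   # $0.505 per request
```

Same model, same output length, and the long-document request costs roughly 50x more, entirely from input tokens.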
OpenAI Pricing (February 2026)
OpenAI prices dropped significantly through 2025 and continue to fall with new model releases.
GPT-4o — $2.50 per 1M input tokens, $10.00 per 1M output tokens. The default capable model for complex tasks, instruction-following, and multimodal inputs (images, audio). Cached input tokens: $1.25/1M.
GPT-4o-mini — $0.15 per 1M input tokens, $0.60 per 1M output tokens. The right-sized model for classification, extraction, summarisation, and simple Q&A. 17x cheaper than GPT-4o on input tokens.
o1 — $15.00 per 1M input tokens, $60.00 per 1M output tokens. OpenAI's reasoning model for hard logic, mathematics, and multi-step planning. Only use when GPT-4o cannot handle the task reliably — the cost difference is significant.
o3-mini — $1.10 per 1M input tokens, $4.40 per 1M output tokens. Reasoning capability at a fraction of o1's price. A practical replacement for o1 on most reasoning tasks.
Embeddings — text-embedding-3-small: $0.02/1M tokens. text-embedding-3-large: $0.13/1M tokens.
Tier pricing (1–5) affects rate limits, not per-token cost. Higher spend unlocks higher rate limits automatically.
Anthropic Pricing (February 2026)
Claude 3.5 Sonnet — $3.00 per 1M input tokens, $15.00 per 1M output tokens. Anthropic's flagship model for instruction-following, long-document analysis, and coding. Output costs are high — on long-form tasks, Claude 3.5 Sonnet can be more expensive than GPT-4o when generating lengthy responses.
Claude 3.5 Haiku — $0.80 per 1M input tokens, $4.00 per 1M output tokens. Fast, cost-effective for tasks requiring good instruction-following at lower complexity. 3.75x cheaper than Sonnet on both input and output.
Claude 3 Opus — $15.00 per 1M input tokens, $75.00 per 1M output tokens. Anthropic's most capable model, now largely superseded by Claude 3.5 Sonnet for most tasks. High cost is rarely justified unless you're specifically evaluating Opus-level capability.
Context windows: All Claude models support up to 200,000 tokens of context. At long context lengths, input token costs compound quickly — a 100,000-token prompt through Claude 3.5 Sonnet costs $0.30 in input tokens alone.
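The compounding is easy to verify with a quick sketch using the Sonnet input rate above:

```python
SONNET_INPUT_RATE = 3.00  # $ per 1M input tokens (Claude 3.5 Sonnet)

for prompt_tokens in (2_000, 20_000, 100_000, 200_000):
    cost = prompt_tokens * SONNET_INPUT_RATE / 1_000_000
    print(f"{prompt_tokens:>7,} tokens -> ${cost:.2f} per request (input only)")
```

At the full 200,000-token window, every request carries $0.60 of input cost before the model generates a single token.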
Google Gemini Pricing (February 2026)
Gemini 1.5 Pro — $1.25 per 1M input tokens (prompts up to 128k tokens), $5.00 per 1M output tokens. Jumps to $2.50/$10.00 per 1M for prompts over 128k tokens. Strong for long-context tasks and multimodal inputs.
Gemini 1.5 Flash — $0.075 per 1M input tokens (up to 128k), $0.30 per 1M output tokens. One of the cheapest capable models available. Well-suited for high-volume structured tasks.
Gemini 2.0 Flash — $0.10 per 1M input tokens, $0.40 per 1M output tokens. Successor to 1.5 Flash with improved instruction-following. The best value option in Google's lineup for most tasks.
Gemini 2.0 Flash-Lite — $0.075 per 1M input tokens, $0.30 per 1M output tokens. Lowest-cost Gemini model for lightweight inference.
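Gemini 1.5 Pro's threshold pricing is worth modelling explicitly, since a prompt crossing 128k tokens doubles both rates. A sketch (function names are ours, rates from above):

```python
def gemini_15_pro_rates(prompt_tokens):
    """Per-1M rates for Gemini 1.5 Pro; both rates double above a 128k-token prompt."""
    if prompt_tokens <= 128_000:
        return 1.25, 5.00    # (input, output) in $/1M
    return 2.50, 10.00

def request_cost(prompt_tokens, output_tokens):
    input_rate, output_rate = gemini_15_pro_rates(prompt_tokens)
    return (prompt_tokens * input_rate + output_tokens * output_rate) / 1_000_000

below_threshold = request_cost(100_000, 1_000)   # $0.13
above_threshold = request_cost(200_000, 1_000)   # $0.51
```

Note the discontinuity: doubling the prompt from 100k to 200k tokens roughly quadruples the cost, because the volume doubles and the rate doubles at the same time.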
Free tier: Gemini models are available free via Google AI Studio with rate limits. Useful for prototyping and evaluation before committing to production usage.
xAI Grok Pricing (February 2026)
Grok-2 — $2.00 per 1M input tokens, $10.00 per 1M output tokens. Competitive with GPT-4o on pricing; positioned as an alternative for general-purpose tasks.
Grok-2-mini — $0.20 per 1M input tokens, $0.40 per 1M output tokens. Lower-cost variant for lighter tasks. Output tokens are priced at only 2x input, a narrower input-output spread than most providers charge.
Context window: 131,072 tokens across all Grok-2 models.
Grok's API is relatively new. It is not yet as widely integrated as OpenAI or Anthropic, but pricing is competitive for teams wanting an alternative to the main two providers.
Mistral Pricing (February 2026)
Mistral Large — $2.00 per 1M input tokens, $6.00 per 1M output tokens. Mistral's flagship model for complex instruction-following and multilingual tasks. Strong European compliance positioning (data residency in EU available).
Mistral Small — $0.20 per 1M input tokens, $0.60 per 1M output tokens. Compact, capable model for structured tasks. 10x cheaper than Mistral Large.
Mistral Nemo — $0.15 per 1M input tokens, $0.15 per 1M output tokens. Open-weight model available via API at very low cost. Output token pricing matches input — useful for tasks generating long responses.
Self-hosting: Mistral models (including Mistral 7B, Mixtral 8x7B, Mixtral 8x22B) are open-weight and can be self-hosted via Hugging Face or direct deployment, potentially reducing per-token costs at sufficient scale.
Cursor Pricing (February 2026)
Cursor is not token-based in the traditional sense. You pay per seat and receive a monthly allocation of "fast requests" (powered by Claude and GPT-4o under the hood). Once the allocation is used up, further requests run at lower priority.
Hobby — Free. 2,000 completions, 50 slow premium requests per month.
Pro — $20/month (individual). 500 fast premium requests/month, then unlimited slow requests. No team management or usage analytics.
Business — $40/seat/month. Unlimited fast requests (fair use), team admin dashboard, usage visibility per developer, SSO, privacy mode.
The practical implication: Pro is fine for light users. Heavy developers hit the 500 fast request limit mid-month. Business is the only tier where you can see per-developer consumption — relevant if you're trying to understand or budget AI coding costs at a team level.
Hugging Face Pricing (February 2026)
Hugging Face offers two main ways to access models via API.
Serverless Inference API — pay per token for supported models. Pricing varies by model. Access to open-weight models (Llama 3.1, Mistral, Falcon, etc.) and specialised models (Whisper, CLIP, sentence transformers). Rates are typically lower than closed-source equivalents.
Inference Endpoints — dedicated compute instances you deploy a model onto. From $0.032/hour (2 vCPU CPU-only) to $2.40/hour (NVIDIA A10G GPU) for common tiers. Pricing is compute-time based, not per-token. Cost-effective at high throughput once you need dedicated capacity.
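The hourly-vs-per-token decision comes down to throughput. A rough break-even sketch; the serverless rate below is an assumption for illustration, since actual rates vary by model:

```python
def breakeven_tokens_per_hour(endpoint_dollars_per_hour, serverless_dollars_per_1m):
    """Throughput above which a dedicated endpoint beats per-token billing."""
    return endpoint_dollars_per_hour / serverless_dollars_per_1m * 1_000_000

# A10G GPU endpoint at $2.40/hr vs an assumed serverless rate of $0.20/1M tokens
tokens_per_hour = breakeven_tokens_per_hour(2.40, 0.20)  # roughly 12M tokens/hour
```

Under these assumptions, sustained throughput above ~12M tokens/hour favours the dedicated endpoint; below that, per-token billing wins because you are not paying for idle compute.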
Free tier: The shared Inference API is free with rate limits. Suitable for evaluation and low-volume production use on smaller models.
Master Comparison Table
| Model | Input ($/1M) | Output ($/1M) | Context Window | Free Tier |
|---|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | 128k | No |
| GPT-4o-mini | $0.15 | $0.60 | 128k | No |
| o1 | $15.00 | $60.00 | 200k | No |
| o3-mini | $1.10 | $4.40 | 200k | No |
| Claude 3.5 Sonnet | $3.00 | $15.00 | 200k | No |
| Claude 3.5 Haiku | $0.80 | $4.00 | 200k | No |
| Claude 3 Opus | $15.00 | $75.00 | 200k | No |
| Gemini 1.5 Pro | $1.25 | $5.00 | 1M+ | Yes (AI Studio) |
| Gemini 2.0 Flash | $0.10 | $0.40 | 1M+ | Yes (AI Studio) |
| Grok-2 | $2.00 | $10.00 | 131k | No |
| Grok-2-mini | $0.20 | $0.40 | 131k | No |
| Mistral Large | $2.00 | $6.00 | 128k | No |
| Mistral Small | $0.20 | $0.60 | 128k | No |
| Mistral Nemo | $0.15 | $0.15 | 128k | No |
Prices per 1 million tokens. Input and output billed separately. Correct as of February 2026 — check provider pricing pages for current rates.
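One way to use the table: fix a representative workload and rank models by what it would cost. A sketch over a subset of the rates above; the workload sizes (1M input, 250k output) are arbitrary:

```python
# ($/1M input, $/1M output), taken from the comparison table
RATES = {
    "GPT-4o": (2.50, 10.00),
    "GPT-4o-mini": (0.15, 0.60),
    "Claude 3.5 Sonnet": (3.00, 15.00),
    "Claude 3.5 Haiku": (0.80, 4.00),
    "Gemini 2.0 Flash": (0.10, 0.40),
    "Grok-2": (2.00, 10.00),
    "Mistral Large": (2.00, 6.00),
}

def workload_cost(rates, input_tokens=1_000_000, output_tokens=250_000):
    input_rate, output_rate = rates
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

for name, rates in sorted(RATES.items(), key=lambda kv: workload_cost(kv[1])):
    print(f"{name:<18} ${workload_cost(rates):.2f}")
```

On this workload the spread is wide: Gemini 2.0 Flash comes in at $0.20 while Claude 3.5 Sonnet costs $6.75, a 30x+ difference for the same token volumes.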
What This Costs in Practice
A typical product team using AI across a few different tasks:
- GPT-4o for a chat interface — 500,000 input tokens/day, 150,000 output tokens/day → ~$1.25/day input, ~$1.50/day output → ~$82.50/month
- GPT-4o-mini for classification at scale — 5M input tokens/day, 500k output tokens/day → ~$0.75/day input, ~$0.30/day output → ~$31.50/month
- Cursor Business for 10 developers → $400/month
- Hugging Face Serverless for embeddings — 20M tokens/day → ~$0.40/day → ~$12/month
Combined estimate: ~$526/month. Each provider bills separately. None of them show you the combined total unless you add them up manually.
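Recomputing the estimate from the raw volumes (rates from the sections above, assuming a 30-day month; the HF embedding rate of $0.02/1M is implied by the $0.40/day figure):

```python
def monthly_token_cost(tokens_per_day, rate_per_1m, days=30):
    """Monthly cost in dollars for a steady daily token volume."""
    return tokens_per_day * rate_per_1m / 1_000_000 * days

spend = {
    "GPT-4o chat": monthly_token_cost(500_000, 2.50) + monthly_token_cost(150_000, 10.00),
    "GPT-4o-mini classification": monthly_token_cost(5_000_000, 0.15) + monthly_token_cost(500_000, 0.60),
    "Cursor Business (10 seats)": 10 * 40.0,
    "HF embeddings": monthly_token_cost(20_000_000, 0.02),
}
total = sum(spend.values())  # about $526/month
```

The seat-based line item (Cursor) is the largest single cost here, which is typical: per-token bills scale smoothly with usage, while seat fees are a fixed step function.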
Tracking AI API Spend Across Providers
The challenge with multi-provider AI usage isn't any single bill — it's the aggregate. When OpenAI, Anthropic, Cursor, and Hugging Face all invoice separately, on different cycles, with different units, the combined picture doesn't exist unless you build it.
Connect all your AI providers to StackSpend for a single view of total AI API spend, daily anomaly detection (with webhooks to push alerts to your systems), and pace-to-forecast alerts. Setup guides: OpenAI, Anthropic, Cursor, Hugging Face, GCP (Gemini via Vertex).