You're evaluating four AI APIs. You open four pricing pages. One charges by model version. One charges differently based on context window size. One bundles usage into a seat fee. None of them use the same units. You spend an afternoon building a spreadsheet. By next quarter, half the prices will have changed.
This guide does that work for you. All major AI API providers, same format, current as of February 2026.
How to Read AI API Pricing
Most AI APIs charge per token. A token is roughly 0.75 words in English — a 1,000-word document is approximately 1,300 tokens. Pricing is quoted per 1 million tokens, split into input (the prompt you send) and output (the response the model generates).
Why input and output are priced differently: Output tokens cost more because generating text is more computationally expensive than processing a prompt. For most chat-style tasks, output tokens represent only 20–40% of total token volume but a larger share of cost. For long-context retrieval tasks (RAG), input tokens dominate instead.
Why context window size matters for cost: Some providers charge more for larger context windows — even at the same model tier. Sending a 200,000-token document to a model costs significantly more than sending a 2,000-token prompt, both in tokens consumed and sometimes in per-token rate.
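To make the arithmetic concrete, here is a minimal cost estimator. The rates are the GPT-4o figures from the OpenAI section below; the function name and token counts are illustrative.

```python
def estimate_cost(input_tokens, output_tokens, input_rate_per_1m, output_rate_per_1m):
    """Request cost in dollars, given per-1M-token rates."""
    return (input_tokens * input_rate_per_1m
            + output_tokens * output_rate_per_1m) / 1_000_000

# GPT-4o rates: $2.50 input / $10.00 output per 1M tokens
short_prompt = estimate_cost(2_000, 500, 2.50, 10.00)      # $0.01 per request
long_document = estimate_cost(200_000, 500, 2.50, 10.00)   # $0.505 per request
```

Same model, same output length, and the long-document request costs roughly 50x more, entirely from input tokens.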
OpenAI Pricing (February 2026)
OpenAI prices dropped significantly through 2025 and continue to fall with new model releases.
GPT-4o — $2.50 per 1M input tokens, $10.00 per 1M output tokens. The default capable model for complex tasks, instruction-following, and multimodal inputs (images, audio). Cached input tokens: $1.25/1M.
GPT-4o-mini — $0.15 per 1M input tokens, $0.60 per 1M output tokens. The right-sized model for classification, extraction, summarisation, and simple Q&A. 17x cheaper than GPT-4o on input tokens.
o1 — $15.00 per 1M input tokens, $60.00 per 1M output tokens. OpenAI's reasoning model for hard logic, mathematics, and multi-step planning. Only use when GPT-4o cannot handle the task reliably — the cost difference is significant.
o3-mini — $1.10 per 1M input tokens, $4.40 per 1M output tokens. Reasoning capability at a fraction of o1's price. A practical replacement for o1 on most reasoning tasks.
Embeddings — text-embedding-3-small: $0.02/1M tokens. text-embedding-3-large: $0.13/1M tokens.
Tier pricing (1–5) affects rate limits, not per-token cost. Higher spend unlocks higher rate limits automatically.
Anthropic Pricing (February 2026)
Claude 3.5 Sonnet — $3.00 per 1M input tokens, $15.00 per 1M output tokens. Anthropic's flagship model for instruction-following, long-document analysis, and coding. Output costs are high — on long-form tasks, Claude 3.5 Sonnet can be more expensive than GPT-4o when generating lengthy responses.
Claude 3.5 Haiku — $0.80 per 1M input tokens, $4.00 per 1M output tokens. Fast, cost-effective for tasks requiring good instruction-following at lower complexity. 3.75x cheaper than Sonnet on both input and output.
Claude 3 Opus — $15.00 per 1M input tokens, $75.00 per 1M output tokens. Anthropic's most capable model, now largely superseded by Claude 3.5 Sonnet for most tasks. High cost is rarely justified unless you're specifically evaluating Opus-level capability.
Context windows: All Claude models support up to 200,000 tokens of context. At long context lengths, input token costs compound quickly — a 100,000-token prompt through Claude 3.5 Sonnet costs $0.30 in input tokens alone.
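The compounding is easy to verify with a quick sketch using the Sonnet input rate above:

```python
SONNET_INPUT_RATE = 3.00  # $ per 1M input tokens (Claude 3.5 Sonnet)

for prompt_tokens in (2_000, 20_000, 100_000, 200_000):
    cost = prompt_tokens * SONNET_INPUT_RATE / 1_000_000
    print(f"{prompt_tokens:>7,} tokens -> ${cost:.2f} per request (input only)")
```

At the full 200,000-token window, every request carries $0.60 of input cost before the model generates a single token.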
Google Gemini Pricing (February 2026)
Gemini 1.5 Pro — $1.25 per 1M input tokens (prompts up to 128k tokens), $5.00 per 1M output tokens. Jumps to $2.50/$10.00 per 1M for prompts over 128k tokens. Strong for long-context tasks and multimodal inputs.
Gemini 1.5 Flash — $0.075 per 1M input tokens (up to 128k), $0.30 per 1M output tokens. One of the cheapest capable models available. Well-suited for high-volume structured tasks.
Gemini 2.0 Flash — $0.10 per 1M input tokens, $0.40 per 1M output tokens. Successor to 1.5 Flash with improved instruction-following. The best value option in Google's lineup for most tasks.
Gemini 2.0 Flash-Lite — $0.075 per 1M input tokens, $0.30 per 1M output tokens. Lowest-cost Gemini model for lightweight inference.
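Gemini 1.5 Pro's threshold pricing is worth modelling explicitly, since a prompt crossing 128k tokens doubles both rates. A sketch (function names are ours, rates from above):

```python
def gemini_15_pro_rates(prompt_tokens):
    """Per-1M rates for Gemini 1.5 Pro; both rates double above a 128k-token prompt."""
    if prompt_tokens <= 128_000:
        return 1.25, 5.00    # (input, output) in $/1M
    return 2.50, 10.00

def request_cost(prompt_tokens, output_tokens):
    input_rate, output_rate = gemini_15_pro_rates(prompt_tokens)
    return (prompt_tokens * input_rate + output_tokens * output_rate) / 1_000_000

below_threshold = request_cost(100_000, 1_000)   # $0.13
above_threshold = request_cost(200_000, 1_000)   # $0.51
```

Note the discontinuity: doubling the prompt from 100k to 200k tokens roughly quadruples the cost, because the volume doubles and the rate doubles at the same time.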
Free tier: Gemini models are available free via Google AI Studio with rate limits. Useful for prototyping and evaluation before committing to production usage.
xAI Grok Pricing (February 2026)
Grok-2 — $2.00 per 1M input tokens, $10.00 per 1M output tokens. Competitive with GPT-4o on pricing; positioned as an alternative for general-purpose tasks.
Grok-2-mini — $0.20 per 1M input tokens, $0.40 per 1M output tokens. Lower-cost variant for lighter tasks. Output tokens are priced at only 2x input, a narrower input-output spread than most providers charge.
Context window: 131,072 tokens across all Grok-2 models.
Grok's API is relatively new. It is not yet as widely integrated as OpenAI or Anthropic, but pricing is competitive for teams wanting an alternative to the main two providers.
Mistral Pricing (February 2026)
Mistral Large — $2.00 per 1M input tokens, $6.00 per 1M output tokens. Mistral's flagship model for complex instruction-following and multilingual tasks. Strong European compliance positioning (data residency in EU available).
Mistral Small — $0.20 per 1M input tokens, $0.60 per 1M output tokens. Compact, capable model for structured tasks. 10x cheaper than Mistral Large.
Mistral Nemo — $0.15 per 1M input tokens, $0.15 per 1M output tokens. Open-weight model available via API at very low cost. Output token pricing matches input — useful for tasks generating long responses.
Self-hosting: Mistral models (including Mistral 7B, Mixtral 8x7B, Mixtral 8x22B) are open-weight and can be self-hosted via Hugging Face or direct deployment, potentially reducing per-token costs at sufficient scale.
Cursor Pricing (February 2026)
Cursor is not token-based in the traditional sense. You pay per seat and receive a monthly allocation of "fast requests" (powered by Claude and GPT-4o under the hood). Once the allocation is used up, further requests run at lower priority.
Hobby — Free. 2,000 completions, 50 slow premium requests per month.
Pro — $20/month (individual). 500 fast premium requests/month, then unlimited slow requests. No team management or usage analytics.
Business — $40/seat/month. Unlimited fast requests (fair use), team admin dashboard, usage visibility per developer, SSO, privacy mode.
The practical implication: Pro is fine for light users. Heavy developers hit the 500 fast request limit mid-month. Business is the only tier where you can see per-developer consumption — relevant if you're trying to understand or budget AI coding costs at a team level.
Hugging Face Pricing (February 2026)
Hugging Face offers two main ways to access models via API.
Serverless Inference API — pay per token for supported models. Pricing varies by model. Access to open-weight models (Llama 3.1, Mistral, Falcon, etc.) and specialised models (Whisper, CLIP, sentence transformers). Rates are typically lower than closed-source equivalents.
Inference Endpoints — dedicated compute instances you deploy a model onto. From $0.032/hour (2 vCPU CPU-only) to $2.40/hour (NVIDIA A10G GPU) for common tiers. Pricing is compute-time based, not per-token. Cost-effective at high throughput once you need dedicated capacity.
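The hourly-vs-per-token decision comes down to throughput. A rough break-even sketch; the serverless rate below is an assumption for illustration, since actual rates vary by model:

```python
def breakeven_tokens_per_hour(endpoint_dollars_per_hour, serverless_dollars_per_1m):
    """Throughput above which a dedicated endpoint beats per-token billing."""
    return endpoint_dollars_per_hour / serverless_dollars_per_1m * 1_000_000

# A10G GPU endpoint at $2.40/hr vs an assumed serverless rate of $0.20/1M tokens
tokens_per_hour = breakeven_tokens_per_hour(2.40, 0.20)  # roughly 12M tokens/hour
```

Under these assumptions, sustained throughput above ~12M tokens/hour favours the dedicated endpoint; below that, per-token billing wins because you are not paying for idle compute.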
Free tier: The shared Inference API is free with rate limits. Suitable for evaluation and low-volume production use on smaller models.
Master Comparison Table
| Model | Input ($/1M) | Output ($/1M) | Context Window | Free Tier |
|---|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | 128k | No |
| GPT-4o-mini | $0.15 | $0.60 | 128k | No |
| o1 | $15.00 | $60.00 | 200k | No |
| o3-mini | $1.10 | $4.40 | 200k | No |
| Claude 3.5 Sonnet | $3.00 | $15.00 | 200k | No |
| Claude 3.5 Haiku | $0.80 | $4.00 | 200k | No |
| Claude 3 Opus | $15.00 | $75.00 | 200k | No |
| Gemini 1.5 Pro | $1.25 | $5.00 | 1M+ | Yes (AI Studio) |
| Gemini 2.0 Flash | $0.10 | $0.40 | 1M+ | Yes (AI Studio) |
| Grok-2 | $2.00 | $10.00 | 131k | No |
| Grok-2-mini | $0.20 | $0.40 | 131k | No |
| Mistral Large | $2.00 | $6.00 | 128k | No |
| Mistral Small | $0.20 | $0.60 | 128k | No |
| Mistral Nemo | $0.15 | $0.15 | 128k | No |
Prices per 1 million tokens. Input and output billed separately. Correct as of February 2026 — check provider pricing pages for current rates.
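One way to use the table: fix a representative workload and rank models by what it would cost. A sketch over a subset of the rates above; the workload sizes (1M input, 250k output) are arbitrary:

```python
# ($/1M input, $/1M output), taken from the comparison table
RATES = {
    "GPT-4o": (2.50, 10.00),
    "GPT-4o-mini": (0.15, 0.60),
    "Claude 3.5 Sonnet": (3.00, 15.00),
    "Claude 3.5 Haiku": (0.80, 4.00),
    "Gemini 2.0 Flash": (0.10, 0.40),
    "Grok-2": (2.00, 10.00),
    "Mistral Large": (2.00, 6.00),
}

def workload_cost(rates, input_tokens=1_000_000, output_tokens=250_000):
    input_rate, output_rate = rates
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

for name, rates in sorted(RATES.items(), key=lambda kv: workload_cost(kv[1])):
    print(f"{name:<18} ${workload_cost(rates):.2f}")
```

On this workload the spread is wide: Gemini 2.0 Flash comes in at $0.20 while Claude 3.5 Sonnet costs $6.75, a 30x+ difference for the same token volumes.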
What This Costs in Practice
A typical product team using AI across a few different tasks:
- GPT-4o for a chat interface — 500,000 input tokens/day, 150,000 output tokens/day → ~$1.25/day input, ~$1.50/day output → ~$82.50/month
- GPT-4o-mini for classification at scale — 5M input tokens/day, 500k output tokens/day → ~$0.75/day input, ~$0.30/day output → ~$31.50/month
- Cursor Business for 10 developers → $400/month
- Hugging Face Serverless for embeddings — 20M tokens/day → ~$0.40/day → ~$12/month
Combined estimate: ~$526/month. Each provider bills separately. None of them show you the combined total unless you add them up manually.
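Recomputing the estimate from the raw volumes (rates from the sections above, assuming a 30-day month; the HF embedding rate of $0.02/1M is implied by the $0.40/day figure):

```python
def monthly_token_cost(tokens_per_day, rate_per_1m, days=30):
    """Monthly cost in dollars for a steady daily token volume."""
    return tokens_per_day * rate_per_1m / 1_000_000 * days

spend = {
    "GPT-4o chat": monthly_token_cost(500_000, 2.50) + monthly_token_cost(150_000, 10.00),
    "GPT-4o-mini classification": monthly_token_cost(5_000_000, 0.15) + monthly_token_cost(500_000, 0.60),
    "Cursor Business (10 seats)": 10 * 40.0,
    "HF embeddings": monthly_token_cost(20_000_000, 0.02),
}
total = sum(spend.values())  # about $526/month
```

The seat-based line item (Cursor) is the largest single cost here, which is typical: per-token bills scale smoothly with usage, while seat fees are a fixed step function.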
Tracking AI API Spend Across Providers
The challenge with multi-provider AI usage isn't any single bill — it's the aggregate. When OpenAI, Anthropic, Cursor, and Hugging Face all invoice separately, on different cycles, with different units, the combined picture doesn't exist unless you build it.
Connect all your AI providers to StackSpend for a single view of total AI API spend, daily anomaly detection (with webhooks to push alerts to your systems), and pace-to-forecast alerts. Setup guides: OpenAI, Anthropic, Cursor, Hugging Face, GCP (Gemini via Vertex).