
LLM Cost Calculator

Compare API costs across 40+ models from 7 providers. April 2026 verified pricing.

| # | Model | Provider | Effective input $/1M | Input $/1M | Output $/1M | $/Request | $/Day | $/Month |
|---|-------|----------|----------------------|------------|-------------|-----------|-------|---------|
| 1 | Gemini 2.0 Flash-Lite | Google | $0.075 | $0.07 | $0.30 | $0.000225 | $0.02 | $0.67 |
| 2 | GPT-5 Nano | OpenAI | $0.050 | $0.05 | $0.40 | $0.000250 | $0.03 | $0.75 |
| 3 | Llama 3.3 70B | Meta | $0.100 | $0.10 | $0.32 | $0.000260 | $0.03 | $0.78 |
| 4 | DeepSeek V4 Flash | DeepSeek | $0.140 | $0.14 | $0.28 | $0.000280 | $0.03 | $0.84 |
| 5 | GPT-4.1 Nano | OpenAI | $0.100 | $0.10 | $0.40 | $0.000300 | $0.03 | $0.90 |
| 6 | Gemini 2.0 Flash | Google | $0.100 | $0.10 | $0.40 | $0.000300 | $0.03 | $0.90 |
| 7 | DeepSeek V3.2 | DeepSeek | $0.252 | $0.25 | $0.38 | $0.000441 | $0.04 | $1.32 |
| 8 | GPT-4o Mini | OpenAI | $0.150 | $0.15 | $0.60 | $0.000450 | $0.04 | $1.35 |
| 9 | Grok 4.1 Fast | xAI | $0.200 | $0.20 | $0.50 | $0.000450 | $0.04 | $1.35 |
| 10 | Mistral Small 4 | Mistral | $0.150 | $0.15 | $0.60 | $0.000450 | $0.04 | $1.35 |
| 11 | Llama 4 Maverick | Meta | $0.150 | $0.15 | $0.60 | $0.000450 | $0.04 | $1.35 |
| 12 | Mistral Small 3.1 | Mistral | $0.350 | $0.35 | $0.56 | $0.000630 | $0.06 | $1.89 |
| 13 | DeepSeek V4 Pro | DeepSeek | $0.435 | $0.43 | $0.87 | $0.000870 | $0.09 | $2.61 |
| 14 | GPT-4.1 Mini | OpenAI | $0.400 | $0.40 | $1.60 | $0.001200 | $0.12 | $3.60 |
| 15 | GPT-5 Mini | OpenAI | $0.250 | $0.25 | $2.00 | $0.001250 | $0.13 | $3.75 |
| 16 | Mistral Large 3 | Mistral | $0.500 | $0.50 | $1.50 | $0.001250 | $0.13 | $3.75 |
| 17 | Mistral Medium 3 | Mistral | $0.400 | $0.40 | $2.00 | $0.001400 | $0.14 | $4.20 |
| 18 | Gemini 2.5 Flash | Google | $0.300 | $0.30 | $2.50 | $0.001550 | $0.15 | $4.65 |
| 19 | DeepSeek R1 | DeepSeek | $0.700 | $0.70 | $2.50 | $0.001950 | $0.19 | $5.85 |
| 20 | Gemini 3 Flash | Google | $0.500 | $0.50 | $3.00 | $0.002000 | $0.20 | $6.00 |
| 21 | Claude 3.5 Haiku | Anthropic | $0.800 | $0.80 | $4.00 | $0.002800 | $0.28 | $8.40 |
| 22 | o4-mini | OpenAI | $1.100 | $1.10 | $4.40 | $0.003300 | $0.33 | $9.90 |
| 23 | Claude Haiku 4.5 | Anthropic | $1.000 | $1.00 | $5.00 | $0.003500 | $0.35 | $10.50 |
| 24 | Grok 4.20 | xAI | $2.000 | $2.00 | $6.00 | $0.005000 | $0.50 | $15.00 |
| 25 | o3 | OpenAI | $2.000 | $2.00 | $8.00 | $0.006000 | $0.60 | $18.00 |
| 26 | GPT-4.1 | OpenAI | $2.000 | $2.00 | $8.00 | $0.006000 | $0.60 | $18.00 |
| 27 | GPT-5.1 | OpenAI | $1.250 | $1.25 | $10.00 | $0.006250 | $0.63 | $18.75 |
| 28 | GPT-5 | OpenAI | $1.250 | $1.25 | $10.00 | $0.006250 | $0.63 | $18.75 |
| 29 | Gemini 2.5 Pro | Google | $1.250 | $1.25 | $10.00 | $0.006250 | $0.63 | $18.75 |
| 30 | GPT-4o | OpenAI | $2.500 | $2.50 | $10.00 | $0.007500 | $0.75 | $22.50 |
| 31 | Gemini 3.1 Pro | Google | $2.000 | $2.00 | $12.00 | $0.008000 | $0.80 | $24.00 |
| 32 | GPT-5.2 | OpenAI | $1.750 | $1.75 | $14.00 | $0.008750 | $0.88 | $26.25 |
| 33 | Claude Sonnet 4.6 | Anthropic | $3.000 | $3.00 | $15.00 | $0.010500 | $1.05 | $31.50 |
| 34 | Claude Sonnet 4.5 | Anthropic | $3.000 | $3.00 | $15.00 | $0.010500 | $1.05 | $31.50 |
| 35 | Claude Sonnet 4 | Anthropic | $3.000 | $3.00 | $15.00 | $0.010500 | $1.05 | $31.50 |
| 36 | Grok 4 | xAI | $3.000 | $3.00 | $15.00 | $0.010500 | $1.05 | $31.50 |
| 37 | Claude Opus 4.7 | Anthropic | $5.000 | $5.00 | $25.00 | $0.017500 | $1.75 | $52.50 |
| 38 | Claude Opus 4.6 | Anthropic | $5.000 | $5.00 | $25.00 | $0.017500 | $1.75 | $52.50 |
| 39 | Claude Opus 4.5 | Anthropic | $5.000 | $5.00 | $25.00 | $0.017500 | $1.75 | $52.50 |
| 40 | GPT-5.5 | OpenAI | $5.000 | $5.00 | $30.00 | $0.020000 | $2.00 | $60.00 |
| 41 | o3-pro | OpenAI | $20.000 | $20.00 | $80.00 | $0.060000 | $6.00 | $180.00 |
| 42 | GPT-5.2 Pro | OpenAI | $21.000 | $21.00 | $168.00 | $0.105000 | $10.50 | $315.00 |
| 43 | GPT-5.5 Pro | OpenAI | $30.000 | $30.00 | $180.00 | $0.120000 | $12.00 | $360.00 |
Prices per 1M tokens (USD); $/Request assumes 1,000 input and 500 output tokens, $/Day assumes 100 requests, and $/Month assumes 30 days. Data verified April 2026. Toggles above apply caching (uses the model's cached input price when available) and/or batch API (50% off both input and output).

What does an LLM cost?

Every major LLM API meters in tokens and bills per million of them. The rate card splits in two: a price for input tokens (the prompt we send) and a price for output tokens (the completion we receive). Across the providers we track, output runs 1.5 to 8 times the input price, and most models sit at 3x or more. That ratio is the single most important number in any cost model, because most teams think in terms of average prompt length and forget that the response is where the bill actually lives.

Here is the math we use to onboard new teammates. A single GPT-5.5 call with 1,000 input tokens and 500 output tokens lands at $0.005 + $0.015 = $0.020. Input runs at $5 per 1M tokens, so 1,000 / 1,000,000 * $5 = $0.005. Output runs at $30 per 1M tokens, so 500 / 1,000,000 * $30 = $0.015. The output half of the call is 75% of the cost despite shipping half as many tokens. We have seen production cost models break that 75/25 split when reasoning-heavy workloads (o4-mini, DeepSeek R1) push output past 10x the input length and the share climbs above 90%.
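That arithmetic as a minimal Python sketch, with the GPT-5.5 rates pulled from the table above:

```python
# Per-call cost, prices quoted in $ per 1M tokens.
# Rates are GPT-5.5's April 2026 figures from the table above.

def call_cost(input_tokens: int, output_tokens: int,
              in_price: float, out_price: float) -> float:
    """Cost in USD for a single API call."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

cost = call_cost(1_000, 500, in_price=5.00, out_price=30.00)
print(f"${cost:.3f}")                                 # $0.020
print(f"output share: {500 / 1e6 * 30 / cost:.0%}")   # 75%
```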

Per-call dollar amounts look small in isolation. They stop looking small once we multiply by traffic. The same $0.020 GPT-5.5 call run 50,000 times a day costs $1,000 daily and $30,000 monthly. Drop to GPT-5 Mini at $0.25/$2.00 and the same call costs $0.00125 instead of $0.020, a 16x cut. Drop to GPT-5 Nano at $0.05/$0.40 and it is $0.00025, an 80x cut. Picking the right tier for the work is the largest single cost lever we have: larger than caching, larger than batch, larger than any prompt-engineering optimization.
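The tier gap at volume, as a sketch; the rates are the table's and the 50,000-calls-per-day figure is this section's example:

```python
# Same 1,000-in / 500-out call scaled to 50,000 requests per day.
MODELS = {                       # (input $/1M, output $/1M), from the table
    "GPT-5.5":    (5.00, 30.00),
    "GPT-5 Mini": (0.25,  2.00),
    "GPT-5 Nano": (0.05,  0.40),
}
CALLS_PER_DAY, IN_TOK, OUT_TOK = 50_000, 1_000, 500

for name, (p_in, p_out) in MODELS.items():
    per_call = IN_TOK / 1e6 * p_in + OUT_TOK / 1e6 * p_out
    print(f"{name:<11} ${per_call:.5f}/call  "
          f"${per_call * CALLS_PER_DAY:,.2f}/day  "
          f"${per_call * CALLS_PER_DAY * 30:,.0f}/month")
# GPT-5.5     $0.02000/call  $1,000.00/day  $30,000/month
# GPT-5 Mini  $0.00125/call  $62.50/day  $1,875/month
# GPT-5 Nano  $0.00025/call  $12.50/day  $375/month
```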

How API pricing works

Three levers move the bill: input price, output price, and cached input price. Cached input is the surprise lever most teams discover only after their first invoice. On OpenAI, Anthropic, and Google the cached input rate is roughly 10% of the standard input rate, applied to any prefix of the prompt that was processed in a recent prior call. For a stable system prompt reused across every conversation, that one toggle cuts input spend by 90%.
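In formula terms, the effective input rate under caching is standard price × (cached fraction × 0.1 + (1 − cached fraction)). A minimal sketch; the 10% cache discount is the round figure quoted above, not any single provider's exact rate:

```python
# Effective input price per 1M tokens when part of the prompt hits the
# cache at ~10% of the standard rate (the pattern described above).

def effective_input_price(standard: float, cached_fraction: float,
                          cache_discount: float = 0.10) -> float:
    return standard * (cached_fraction * cache_discount + (1 - cached_fraction))

# GPT-5.5 input at $5/1M with a 600-of-800-token stable prefix:
print(effective_input_price(5.00, 600 / 800))   # 1.625 -> 67.5% off the $5 rate
```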

Take a real workload we cost-modeled last quarter: a customer-facing chatbot doing 10,000 daily user messages, 800 input tokens per call (system prompt + recent conversation), 400 output tokens per response, 30 days a month. That is 240M input tokens and 120M output tokens per month. On GPT-5.5 at $5/$30 per 1M, the uncached bill is 240 * $5 + 120 * $30 = $1,200 + $3,600 = $4,800 per month. Output is 75% of spend even though we ship twice as many input tokens.

Now turn caching on. Roughly 600 of the 800 input tokens are a fixed system prompt that lives in every call. Those 180M tokens hit the cache at $0.50 per 1M instead of $5, costing $90 instead of $900. The remaining 60M uncached input tokens still bill at $5 for $300. Total input drops from $1,200 to about $390. Add the unchanged $3,600 of output and the bill lands near $3,990 per month, a 17% cut from one config flag. The savings scale with system-prompt length: at 4,000 cached tokens the cut moves into the 30-40% range.
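Here is that workload end to end as a sketch; the 600-token cached prefix is the example's assumption, and the $0.50 cached rate is the 10%-of-standard figure above:

```python
# Chatbot workload: 10,000 calls/day, 800-in / 400-out, 30 days,
# GPT-5.5 at $5/$30 per 1M with cached input at $0.50/1M.

CALLS = 10_000 * 30                    # calls per month
IN_TOK, OUT_TOK, CACHED_TOK = 800, 400, 600

in_std, in_cached, out_std = 5.00, 0.50, 30.00   # $ per 1M tokens

cached_m   = CALLS * CACHED_TOK / 1e6              # 180M tokens
uncached_m = CALLS * (IN_TOK - CACHED_TOK) / 1e6   #  60M tokens
output_m   = CALLS * OUT_TOK / 1e6                 # 120M tokens

input_bill  = cached_m * in_cached + uncached_m * in_std   # $90 + $300
output_bill = output_m * out_std                           # $3,600
print(f"${input_bill + output_bill:,.0f}/month")           # $3,990
```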

The same workload on DeepSeek V4 Flash at $0.14/$0.28 per 1M costs 240 * $0.14 + 120 * $0.28 = $33.60 + $33.60 = $67.20 per month. That is roughly 70 times cheaper than uncached GPT-5.5 for the identical token volume. We do not recommend swapping flagship for budget on every workload, but for high-volume classification, summarization, or first-draft generation we route to DeepSeek V4 Flash or Gemini 2.0 Flash-Lite ($0.075/$0.30) by default and reserve flagship spend for the calls where reasoning quality actually shows up in the output.
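Pricing the identical monthly volume across the routing tiers named above, with rates from the table:

```python
# 240M input / 120M output tokens per month, three tiers.
WORKLOAD = (240, 120)   # (input, output), in millions of tokens
TIERS = {               # (input $/1M, output $/1M), from the table
    "GPT-5.5 (flagship)":         (5.00,  30.00),
    "DeepSeek V4 Flash (budget)": (0.14,   0.28),
    "Gemini 2.0 Flash-Lite":      (0.075,  0.30),
}
in_m, out_m = WORKLOAD
for name, (p_in, p_out) in TIERS.items():
    print(f"{name:<27} ${in_m * p_in + out_m * p_out:,.2f}/month")
# GPT-5.5 (flagship)          $4,800.00/month
# DeepSeek V4 Flash (budget)  $67.20/month
# Gemini 2.0 Flash-Lite       $54.00/month
```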


Common pitfalls

Most cost-model errors do not come from the math. They come from missed levers: a default config that ignores caching, a forecast built on input tokens alone, or a per-call estimate that never gets reconciled against the provider invoice. Roughly 80% of the cost reviews we run uncover at least one of the four errors below, and the median impact is a 20-40% gap between forecast and actual spend.

  • Ignoring the output multiplier. On almost every realistic workload, 70-85% of total spend is output. We have seen teams plan capacity around input length (because input is what they author and edit) and miss that a verbose response template was driving 4x the cost. The first lever we pull on any expensive workload is response length: caps via max_tokens, format swaps from prose to JSON, and explicit length instructions in the system prompt.
  • Forgetting batch API discounts. OpenAI, Anthropic, and Google all expose an asynchronous batch endpoint at 50% off the standard rate, with results returned within 24 hours. On GPT-5.5 ($5/$30), batch drops effective pricing to $2.50/$15. Any non-realtime workload (overnight evaluations, embedding generation, bulk summarization, content backfills) belongs on batch. We default to batch for anything that does not need a sub-second user-facing response.
  • Not accounting for prompt-cache savings. Cached input is roughly 10% of the standard input price across the major providers: GPT-5.5 at $0.50 cached vs $5 standard, Claude Opus 4.7 at $0.50 vs $5, GPT-5 Mini at $0.025 vs $0.25. For any product with a stable system prompt over about 1,000 tokens, caching pays for itself on the first day. We turn it on for every chatbot, every agent, and every API product before we ship.
  • Estimating token count instead of measuring. A characters-divided-by-four heuristic is fine for a back-of-envelope but wrong by 10-25% on code, markdown, or non-English text. We always run representative inputs through the Token Counter (exact tiktoken for OpenAI, per-provider ratios for the rest) before signing off on a forecast; a minimal version of that check is sketched after this list.
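The measurement check from the last bullet, as a minimal sketch. It assumes the tiktoken package; o200k_base is one real OpenAI encoding, standing in here for whatever encoding your target model actually uses:

```python
# Exact token count via tiktoken vs the chars/4 heuristic.
import tiktoken

enc = tiktoken.get_encoding("o200k_base")

def measured_vs_heuristic(text: str) -> tuple[int, int]:
    """Return (exact token count, chars-divided-by-four estimate)."""
    return len(enc.encode(text)), len(text) // 4

code_sample = "def f(x):\n    return {'k': [x**2 for x in range(10)]}\n"
exact, rough = measured_vs_heuristic(code_sample)
print(exact, rough)   # code tokenizes denser than the heuristic suggests
```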

When to use this tool

We built the calculator for three concrete decisions we kept hitting ourselves.

  • A founder pricing a new feature: does $10 per user per month math out if each user fires roughly 200 LLM calls a month at 1,200 input plus 500 output tokens on Claude Sonnet 4.6 ($3/$15)? Sortable rows, batch and cache toggles, and per-million math on every provider give us the answer in under a minute, before product decides what to charge. (The per-user math is sketched below.)
  • A procurement team comparing AWS Bedrock pass-through pricing against direct billing from Anthropic or OpenAI: same model, same token volume, two pricing structures, and a side-by-side the finance team can read.
  • A finance team modeling usage-based pricing for a B2B AI product, where the question is what gross margin survives at the 90th-percentile user. Concrete cost-per-call numbers across 40+ models turn that conversation from a guess into a spreadsheet, which is the same workflow we use ourselves whenever a new model row lands in the data.

The tier filter (flagship, budget, reasoning, premium) plus the verified-monthly LAST_VERIFIED stamp keep the comparison honest: the $30/$180 GPT-5.5 Pro premium row and the $0.075/$0.30 Gemini 2.0 Flash-Lite budget row both reflect today's published rate cards, not last year's screenshots.
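A minimal sketch of that founder-pricing check, using the Sonnet 4.6 rates from the table; the $10 price point and 200-call volume are the example's assumptions, and the margin ignores non-LLM costs:

```python
# Per-user monthly LLM cost and the gross margin it leaves at $10/user.
calls, in_tok, out_tok = 200, 1_200, 500
p_in, p_out = 3.00, 15.00                     # Sonnet 4.6, $ per 1M tokens
price_per_user = 10.00                        # assumed price point

llm_cost = calls * (in_tok / 1e6 * p_in + out_tok / 1e6 * p_out)
print(f"LLM cost/user: ${llm_cost:.2f}")                      # $2.22
print(f"gross margin:  {1 - llm_cost / price_per_user:.0%}")  # 78%
```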


Frequently asked

Which AI model gives the best value for money?
For general tasks, DeepSeek V3.2 and GPT-5 Mini offer the best quality-per-dollar. For coding, Claude Sonnet 4.5 and Gemini 2.5 Pro lead. For budget applications, Gemini 2.0 Flash-Lite is hard to beat at $0.075 per million input tokens.
How often do AI model prices change?
Prices have been dropping roughly 50-80% year over year. OpenAI, Google, and Anthropic typically announce price changes alongside new model releases, which happen every few months.
Are there free AI APIs?
Google Gemini offers the most generous free tier with up to 1,000 requests per day on most models. OpenAI gives $5 in free credits to new users. DeepSeek offers off-peak discounts of 50-75%.
How do I include batch API discounts?
The batch toggle above applies the discount for you; to do the math by hand, multiply your input and output costs by 0.5 for OpenAI, Anthropic, Google, and most other providers when running asynchronous batch jobs that return within 24 hours. On GPT-5.5 ($5/$30), batch drops effective pricing to $2.50/$15 per 1M tokens. We recommend batch for any non-realtime workload (evaluations, embedding generation, bulk summarization) to halve the bill with no code changes beyond the endpoint.
Should I use prompt caching?
Yes for any system prompt over roughly 1,000 tokens reused across consecutive requests. OpenAI cached input runs at 10% of standard input across the GPT-5 family: GPT-5.5 at $0.50 cached vs $5 standard, GPT-5 Mini at $0.025 vs $0.25. Claude Opus 4.7 caches at $0.50 vs $5 standard, the same 10x ratio. We turn caching on for every chatbot, every agent, and every API product with a stable system message.
How accurate is this calculator?
Pricing is verified monthly against each provider's published rate card, with the LAST_VERIFIED stamp shown on every tool page (currently April 2026). Token counts are exact via tiktoken for OpenAI and estimated within +/-15% for other providers using TOKENIZER_RATIOS. The math itself is exact: given correct token counts and current prices, the dollar figures are accurate to the cent. We re-sync via npm run sync:models monthly.
