AI Token Counter
Estimate token counts and API costs for GPT-4o, Claude, Gemini and more. Paste text, see results instantly. 100% client-side.
| Model | Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|
| GPT-4o | $2.50 | $10.00 |
| GPT-4o Mini | $0.15 | $0.60 |
| Claude Sonnet 4 | $3.00 | $15.00 |
| Claude Opus 4 | $15.00 | $75.00 |
| Gemini 1.5 Pro | $1.25 | $5.00 |
| Gemini 1.5 Flash | $0.075 | $0.30 |
How it works: Token counts are estimated using a hybrid of word-based (~1.3 tokens/word) and character-based (~4 chars/token) heuristics, averaged to smooth out the bias of either method alone. Actual token counts vary by model tokenizer. For exact counts, use each provider's official tokenizer API. All processing runs in your browser — no data is sent anywhere.
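The hybrid heuristic described above can be sketched in a few lines. This is a minimal illustration of the averaging idea, not the tool's actual implementation; the 1.3 tokens/word and 4 chars/token constants come from the description above.

```python
def estimate_tokens(text: str) -> int:
    """Estimate an LLM token count by averaging two heuristics:
    ~1.3 tokens per word and ~4 characters per token."""
    word_estimate = len(text.split()) * 1.3   # word-based heuristic
    char_estimate = len(text) / 4             # character-based heuristic
    return round((word_estimate + char_estimate) / 2)
```

Averaging helps because the word heuristic overshoots on long words (which BPE splits into several tokens) while the character heuristic overshoots on short, common words (usually one token each).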
AI Token Counter — Understanding LLM Tokens and API Costs
AI tokens are the fundamental units that large language models use to process text. Rather than splitting text by characters or words, LLMs use a technique called Byte Pair Encoding (BPE) to split text into sub-word pieces. Common words like "the" or "is" are typically one token, while longer or less common words get split into multiple tokens. Numbers, punctuation, and code often use more tokens per character than plain English prose. This AI token counter estimates token counts for GPT-4o, Claude, Gemini, and other major models, and shows the estimated API cost in real time.
Why token counting matters: LLM APIs are billed per token, and every model has a maximum context window measured in tokens. If your prompt plus expected output exceeds the context limit, the request will fail or the model will truncate its response. Understanding token counts is essential for budgeting API costs at scale, designing prompts that fit within model limits, and choosing the right model for each task based on cost efficiency.
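The context-window check described above is simple arithmetic. A minimal sketch, assuming illustrative context-window sizes (the model keys and limits here are examples — verify current values in each provider's documentation):

```python
# Illustrative context windows in tokens; check provider docs for current values.
CONTEXT_WINDOWS = {
    "gpt-4o": 128_000,
    "claude-sonnet-4": 200_000,
    "gemini-1.5-pro": 2_000_000,
}

def fits_context(model: str, prompt_tokens: int, max_output_tokens: int) -> bool:
    """True if the prompt plus reserved output space fits in the model's window."""
    return prompt_tokens + max_output_tokens <= CONTEXT_WINDOWS[model]
```

Reserving `max_output_tokens` up front is the key habit: a prompt that technically fits but leaves no room for the response will still fail or truncate.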
Token counts vary by model because each provider uses a different tokenizer. GPT-4o uses the o200k_base tiktoken vocabulary (earlier GPT-4 models used cl100k_base). Claude uses Anthropic's own BPE tokenizer, which produces slightly different token boundaries. Gemini uses Google's SentencePiece tokenizer. This tool uses calibrated heuristics — a blend of word-based and character-based estimates — tuned for each model family. The estimates are typically within 10% of the actual tokenizer output, which is accurate enough for budgeting purposes.
Input tokens vs output tokens: LLM pricing is asymmetric — input tokens (your prompt) are always cheaper than output tokens (the model's response). For GPT-4o, input tokens cost $2.50/1M and output tokens cost $10.00/1M — a 4x difference. Claude Opus 4 has an even wider gap: $15/1M input vs $75/1M output. This matters when optimizing costs: making prompts longer to get shorter, more precise responses can actually reduce total cost if the output savings outweigh the input increase.
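The asymmetric billing above reduces to one formula. A minimal sketch of per-request cost, with prices expressed per 1M tokens as in the table:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Dollar cost of one API request; prices are $ per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
```

For example, a GPT-4o request with 10,000 input tokens and 2,000 output tokens costs (10,000 × $2.50 + 2,000 × $10.00) / 1M = $0.045 — the output tokens account for nearly half the bill despite being a fifth of the volume.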
Pricing comparison across providers: At the time of writing, Gemini 1.5 Flash is the cheapest option at $0.075/1M input tokens — over 30x cheaper than GPT-4o. GPT-4o Mini is a cost-effective OpenAI option at $0.15/1M input. Claude Sonnet 4 strikes a balance between capability and cost at $3/1M input. Use this tool to paste your exact prompt and compare costs across models before committing to one — the cost difference for a specific use case can be dramatic.
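The cross-model comparison above can be automated with the pricing table from this page. A sketch that ranks models by cost for a given workload (prices copied from the table; they change over time, so treat them as a snapshot):

```python
# Prices in $ per 1M tokens (input, output), from the table above.
PRICING = {
    "GPT-4o": (2.50, 10.00),
    "GPT-4o Mini": (0.15, 0.60),
    "Claude Sonnet 4": (3.00, 15.00),
    "Claude Opus 4": (15.00, 75.00),
    "Gemini 1.5 Pro": (1.25, 5.00),
    "Gemini 1.5 Flash": (0.075, 0.30),
}

def rank_by_cost(input_tokens: int, output_tokens: int) -> list[tuple[str, float]]:
    """Return (model, dollar cost) pairs for the workload, cheapest first."""
    costs = {
        model: (input_tokens * p_in + output_tokens * p_out) / 1_000_000
        for model, (p_in, p_out) in PRICING.items()
    }
    return sorted(costs.items(), key=lambda kv: kv[1])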
Tips for reducing LLM API costs: Keep system prompts concise and reuse them via caching where the API supports it. Batch similar requests to minimize overhead. Use smaller, cheaper models for tasks that don't require top-tier reasoning (classification, extraction, summarization). Trim unnecessary context from multi-turn conversations to stay within token budgets. Measure your actual token usage via the API response's usage field and compare it to these estimates to calibrate your budgeting model.
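The calibration step suggested above — comparing estimates against the token counts the API actually reports — can be as simple as a ratio. A hypothetical helper (the function name and sampling approach are illustrative; the actual usage field layout varies by provider):

```python
def calibration_factor(estimated: list[int], actual: list[int]) -> float:
    """Ratio of actual to estimated token counts across sampled requests.
    Multiply future estimates by this factor to correct systematic bias."""
    return sum(actual) / sum(estimated)
```

If the factor is, say, 1.1, your heuristic undercounts by about 10% for your content mix, and budgets should be scaled accordingly.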
Tips
- Token counts are estimates with ~10% variance. For exact counts, use the model provider's official tokenizer — but this tool is faster for quick budgeting.
- Input tokens are always cheaper than output tokens. If your use case generates long responses, focus on optimizing output length to control costs.
- Context window limits are in total tokens (input + output combined). Leave headroom for the model's response when designing prompts.
- Gemini 1.5 Flash is roughly 30x cheaper than GPT-4o and up to 200x cheaper than Claude Opus on input tokens — use the cost comparison to pick the right model for each job.