Guide
AI API Cost Calculator: How to Optimize Your AI Spending
April 17, 2026 · 9 min read
AI API billing is based on tokens, not requests. Understanding how tokens work, how to count them, and how to minimize them is the difference between a $100/month bill and a $10,000/month bill for the same functionality.
How AI API Pricing Works
Every AI API charges separately for input tokens (your prompt) and output tokens (the model's response). The formula is simple:
Cost = (input_tokens / 1,000,000) * input_price + (output_tokens / 1,000,000) * output_price
# Example: GPT-5.4 via AIPower
# 2,000 input tokens + 500 output tokens
cost = (2000 / 1_000_000) * 3.75 + (500 / 1_000_000) * 22.50
# cost = $0.0075 + $0.01125 = $0.01875 per request
Token Counting: Rules of Thumb
- 1 token is roughly 4 characters or 0.75 words in English
- 1,000 tokens is roughly 750 words
- A typical chat message: 50-200 tokens
- A system prompt: 200-2,000 tokens
- A full page of text: ~500 tokens
- CJK languages use more tokens per character (Chinese: ~1.5 tokens/character)
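The rules of thumb above can be turned into a quick back-of-envelope estimator. This is a sketch, not a tokenizer: the 4-characters and 0.75-words ratios are English approximations and vary by model and language.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate using the ~4 characters/token rule of thumb."""
    return max(1, round(len(text) / 4))

def estimate_tokens_by_words(text: str) -> int:
    """Alternative estimate using the ~0.75 words/token rule of thumb."""
    return max(1, round(len(text.split()) / 0.75))

sentence = "Explain the theory of relativity."
print(estimate_tokens(sentence))           # 8 (33 characters / 4)
print(estimate_tokens_by_words(sentence))  # 7 (5 words / 0.75)
```

Use estimates like these only for budgeting; for billing-accurate counts, use the model's real tokenizer as shown below.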
Count Tokens Programmatically
import tiktoken
def count_tokens(text, model="gpt-4o"):
    """Count tokens for a given text. Works for OpenAI models;
    other providers use different tokenizers, so treat this as an estimate."""
    enc = tiktoken.encoding_for_model(model)
    return len(enc.encode(text))
# Count before sending
prompt = "Explain the theory of relativity in simple terms."
token_count = count_tokens(prompt)
print(f"Tokens: {token_count}") # ~11 tokens
# Estimate cost before calling API
input_price = 0.34 # DeepSeek V3 per M tokens
estimated_cost = (token_count / 1_000_000) * input_price
print(f"Estimated input cost: ${estimated_cost:.6f}")
Cost Comparison Table (per 1M tokens)
| Model | Input | Output | 1K Requests Cost* |
|---|---|---|---|
| GLM-4 Flash | $0.01 | $0.01 | $0.03 |
| Doubao Pro | $0.06 | $0.11 | $0.18 |
| Qwen Turbo | $0.08 | $0.31 | $0.32 |
| DeepSeek V3 | $0.34 | $0.50 | $0.93 |
| Gemini 2.5 Flash | $0.15 | $0.60 | $0.60 |
| GPT-5.4 | $3.75 | $22.50 | $18.75 |
| Claude Sonnet 4 | $4.50 | $22.50 | $20.25 |
| Claude Opus 4.6 | $7.50 | $37.50 | $33.75 |
*Estimated for 2K input + 500 output tokens per request.
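The per-1K-requests column follows directly from the pricing formula at the top of this guide. A small helper (with prices plugged in from the table) reproduces it:

```python
def request_cost(input_price: float, output_price: float,
                 input_tokens: int = 2_000, output_tokens: int = 500) -> float:
    """Cost of one request in dollars, given per-million-token prices."""
    return (input_tokens / 1_000_000) * input_price \
         + (output_tokens / 1_000_000) * output_price

# GPT-5.4 row: $3.75 input / $22.50 output -> $18.75 per 1,000 requests
print(round(request_cost(3.75, 22.50) * 1_000, 2))  # 18.75

# DeepSeek V3 row: $0.34 / $0.50 -> $0.93 per 1,000 requests
print(round(request_cost(0.34, 0.50) * 1_000, 2))   # 0.93
```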
Optimization Strategy 1: Model Tiering
Route requests to the cheapest model that can handle the task:
from openai import OpenAI
client = OpenAI(base_url="https://api.aipower.me/v1", api_key="YOUR_KEY")
def cost_optimized_call(prompt, task_type="general"):
    model_map = {
        "classification": "zhipu/glm-4-flash",      # $0.01/M
        "extraction": "doubao/doubao-pro-256k",     # $0.06/M
        "general": "deepseek/deepseek-chat",        # $0.34/M
        "coding": "anthropic/claude-sonnet",        # $4.50/M
        "reasoning": "deepseek/deepseek-reasoner",  # $0.34/M
    }
    model = model_map.get(task_type, "auto")
    return client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content
Optimization Strategy 2: Response Caching
import hashlib, json
cache = {}
def cached_call(messages, model="deepseek/deepseek-chat"):
    # sort_keys makes the key stable, so identical message lists always hit
    key = hashlib.md5(json.dumps(messages, sort_keys=True).encode()).hexdigest()
    if key in cache:
        return cache[key]  # Free: no API call
    response = client.chat.completions.create(model=model, messages=messages)
    result = response.choices[0].message.content
    cache[key] = result
    return result

# 30-60% cache hit rate is typical for production apps
Optimization Strategy 3: Prompt Compression
- Trim system prompts: Remove verbose instructions. "Be concise" works as well as a 500-word style guide.
- Limit history: Send last 5-10 messages, not the entire conversation.
- Summarize context: Compress long documents before including them as context.
- Use max_tokens: Cap output length to avoid runaway responses.
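The history-limiting tactic can be sketched in a few lines. The window of 8 messages is an arbitrary choice for illustration; tune it per application.

```python
def trim_history(messages: list[dict], keep_last: int = 8) -> list[dict]:
    """Keep any system prompt plus only the most recent messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]

# A long conversation: 1 system prompt + 20 user/assistant exchanges
history = [{"role": "system", "content": "Be concise."}]
for i in range(20):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = trim_history(history)
print(len(history), "->", len(trimmed))  # 41 -> 9
```

Pass the trimmed list as `messages=` and set `max_tokens` on the same create call to combine the history and output-cap tactics.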
Monthly Cost Estimator
| Daily Requests | GLM-4 Flash | DeepSeek V3 | GPT-5.4 |
|---|---|---|---|
| 100 | $0.09/mo | $2.79/mo | $56.25/mo |
| 1,000 | $0.90/mo | $27.90/mo | $562.50/mo |
| 10,000 | $9.00/mo | $279.00/mo | $5,625/mo |
| 100,000 | $90.00/mo | $2,790/mo | $56,250/mo |
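The monthly figures above are just the per-request cost scaled by volume. A sketch, assuming a 30-day month and the same 2K-input/500-output request profile as the comparison table:

```python
def monthly_cost(input_price: float, output_price: float, daily_requests: int,
                 input_tokens: int = 2_000, output_tokens: int = 500,
                 days: int = 30) -> float:
    """Projected monthly spend in dollars for a fixed request profile."""
    per_request = (input_tokens / 1_000_000) * input_price \
                + (output_tokens / 1_000_000) * output_price
    return per_request * daily_requests * days

# GPT-5.4 at 1,000 requests/day
print(f"${monthly_cost(3.75, 22.50, 1_000):.2f}/mo")  # $562.50/mo
```

Swap in your own prices and request profile to project spend before committing to a model tier.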
Monitor your spending in real time on the AIPower dashboard. Start with 50 free API calls at aipower.me to benchmark costs for your use case.