Guide
AI API Cost Calculator: How to Optimize Your AI Spending
April 17, 2026 · 9 min read
AI API billing is based on tokens, not requests. Understanding how tokens work, how to count them, and how to minimize them is the difference between a $100/month bill and a $10,000/month bill for the same functionality.
How AI API Pricing Works
Every AI API charges separately for input tokens (your prompt) and output tokens (the model's response). The formula is simple:
Cost = (input_tokens / 1,000,000) * input_price + (output_tokens / 1,000,000) * output_price
# Example: GPT-5.4 via AIPower
# 2,000 input tokens + 500 output tokens
cost = (2000 / 1_000_000) * 3.75 + (500 / 1_000_000) * 22.50
# cost = $0.0075 + $0.01125 = $0.01875 per request
Token Counting: Rules of Thumb
- 1 token is roughly 4 characters or 0.75 words in English
- 1,000 tokens is roughly 750 words
- A typical chat message: 50-200 tokens
- A system prompt: 200-2,000 tokens
- A full page of text: ~500 tokens
- CJK languages use more tokens per character (Chinese: ~1.5 tokens/character)
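The rules of thumb above can be turned into a quick back-of-envelope estimator. This is a sketch, not a tokenizer: the 4-characters and 0.75-words ratios are English approximations and vary by model and language.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate using the ~4 characters/token rule of thumb."""
    return max(1, round(len(text) / 4))

def estimate_tokens_by_words(text: str) -> int:
    """Alternative estimate using the ~0.75 words/token rule of thumb."""
    return max(1, round(len(text.split()) / 0.75))

sentence = "Explain the theory of relativity."
print(estimate_tokens(sentence))           # 8 (33 characters / 4)
print(estimate_tokens_by_words(sentence))  # 7 (5 words / 0.75)
```

Use estimates like these only for budgeting; for billing-accurate counts, use the model's real tokenizer as shown below.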
Count Tokens Programmatically
import tiktoken
def count_tokens(text, model="gpt-4o"):
    """Count tokens for a given text. Works for OpenAI models;
    other providers use different tokenizers, so treat this as an estimate."""
    enc = tiktoken.encoding_for_model(model)
    return len(enc.encode(text))
# Count before sending
prompt = "Explain the theory of relativity in simple terms."
token_count = count_tokens(prompt)
print(f"Tokens: {token_count}") # ~11 tokens
# Estimate cost before calling API
input_price = 0.34 # DeepSeek V3 per M tokens
estimated_cost = (token_count / 1_000_000) * input_price
print(f"Estimated input cost: ${estimated_cost:.6f}")
Cost Comparison Table (per 1M tokens)
| Model | Input | Output | 1K Requests Cost* |
|---|---|---|---|
| GLM-4 Flash | $0.01 | $0.01 | $0.03 |
| Doubao Pro | $0.06 | $0.11 | $0.18 |
| Qwen Turbo | $0.08 | $0.31 | $0.32 |
| DeepSeek V3 | $0.34 | $0.50 | $0.93 |
| Gemini 2.5 Flash | $0.15 | $0.60 | $0.60 |
| GPT-5.4 | $3.75 | $22.50 | $18.75 |
| Claude Sonnet 4 | $4.50 | $22.50 | $20.25 |
| Claude Opus 4.6 | $7.50 | $37.50 | $33.75 |
*Estimated for 2K input + 500 output tokens per request.
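The per-1K-requests column follows directly from the pricing formula at the top of this guide. A small helper (with prices plugged in from the table) reproduces it:

```python
def request_cost(input_price: float, output_price: float,
                 input_tokens: int = 2_000, output_tokens: int = 500) -> float:
    """Cost of one request in dollars, given per-million-token prices."""
    return (input_tokens / 1_000_000) * input_price \
         + (output_tokens / 1_000_000) * output_price

# GPT-5.4 row: $3.75 input / $22.50 output -> $18.75 per 1,000 requests
print(round(request_cost(3.75, 22.50) * 1_000, 2))  # 18.75

# DeepSeek V3 row: $0.34 / $0.50 -> $0.93 per 1,000 requests
print(round(request_cost(0.34, 0.50) * 1_000, 2))   # 0.93
```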
Optimization Strategy 1: Model Tiering
Route requests to the cheapest model that can handle the task:
from openai import OpenAI
client = OpenAI(base_url="https://api.aipower.me/v1", api_key="YOUR_KEY")
def cost_optimized_call(prompt, task_type="general"):
    model_map = {
        "classification": "zhipu/glm-4-flash",      # $0.01/M
        "extraction": "doubao/doubao-pro-256k",     # $0.06/M
        "general": "deepseek/deepseek-chat",        # $0.34/M
        "coding": "anthropic/claude-sonnet",        # $4.50/M
        "reasoning": "deepseek/deepseek-reasoner",  # $0.34/M
    }
    model = model_map.get(task_type, "auto")
    return client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content
Optimization Strategy 2: Response Caching
import hashlib, json
cache = {}
def cached_call(messages, model="deepseek/deepseek-chat"):
    # sort_keys makes the key stable, so identical message lists always hit
    key = hashlib.md5(json.dumps(messages, sort_keys=True).encode()).hexdigest()
    if key in cache:
        return cache[key]  # Free: no API call
    response = client.chat.completions.create(model=model, messages=messages)
    result = response.choices[0].message.content
    cache[key] = result
    return result

# 30-60% cache hit rate is typical for production apps
Optimization Strategy 3: Prompt Compression
- Trim system prompts: Remove verbose instructions. "Be concise" works as well as a 500-word style guide.
- Limit history: Send last 5-10 messages, not the entire conversation.
- Summarize context: Compress long documents before including them as context.
- Use max_tokens: Cap output length to avoid runaway responses.
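The history-limiting tactic can be sketched in a few lines. The window of 8 messages is an arbitrary choice for illustration; tune it per application.

```python
def trim_history(messages: list[dict], keep_last: int = 8) -> list[dict]:
    """Keep any system prompt plus only the most recent messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]

# A long conversation: 1 system prompt + 20 user/assistant exchanges
history = [{"role": "system", "content": "Be concise."}]
for i in range(20):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = trim_history(history)
print(len(history), "->", len(trimmed))  # 41 -> 9
```

Pass the trimmed list as `messages=` and set `max_tokens` on the same create call to combine the history and output-cap tactics.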
Monthly Cost Estimator
| Daily Requests | GLM-4 Flash | DeepSeek V3 | GPT-5.4 |
|---|---|---|---|
| 100 | $0.09/mo | $2.79/mo | $56.25/mo |
| 1,000 | $0.90/mo | $27.90/mo | $562.50/mo |
| 10,000 | $9.00/mo | $279.00/mo | $5,625/mo |
| 100,000 | $90.00/mo | $2,790/mo | $56,250/mo |
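The monthly figures above are just the per-request cost scaled by volume. A sketch, assuming a 30-day month and the same 2K-input/500-output request profile as the comparison table:

```python
def monthly_cost(input_price: float, output_price: float, daily_requests: int,
                 input_tokens: int = 2_000, output_tokens: int = 500,
                 days: int = 30) -> float:
    """Projected monthly spend in dollars for a fixed request profile."""
    per_request = (input_tokens / 1_000_000) * input_price \
                + (output_tokens / 1_000_000) * output_price
    return per_request * daily_requests * days

# GPT-5.4 at 1,000 requests/day
print(f"${monthly_cost(3.75, 22.50, 1_000):.2f}/mo")  # $562.50/mo
```

Swap in your own prices and request profile to project spend before committing to a model tier.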
Monitor your spending in real time on the AIPower dashboard. Start with 50 free API calls at aipower.me to benchmark costs for your use case.