Stop hardcoding gpt-5.4 for every request.
Use model="auto-cheap" / "auto-code" / "auto-best" and let AIPower pick. Save 60-95% without quality loss.
How it works in 1 line of code:
```python
response = client.chat.completions.create(
    model="auto-code",  # ← router picks Claude Sonnet 4
    messages=[{"role": "user", "content": "Refactor this function..."}],
)
```

No change to request/response shape. The `model` field in the response shows which model actually ran, so you can track cost & quality per route.
Each mode targets a different optimization goal.
| Use case | Mode | Routes to | Goal | Cost | Example |
|---|---|---|---|---|---|
| General chat / most tasks | `auto` | DeepSeek V3 | Balance of cost & quality | ~$0.35/M | Default choice when you don't know which to pick |
| Batch processing, classification | `auto-cheap` | Doubao Pro 256K | Lowest possible price | ~$0.08/M | Classify 1M user messages for ~$2 vs ~$75 on Claude |
| Realtime chat, suggestions, autocomplete | `auto-fast` | Qwen Turbo | Fastest time-to-first-token | ~$0.15/M | Live chatbot where latency > quality |
| Writing / refactoring code | `auto-code` | Claude Sonnet 4 | Best coding accuracy (78% SWE-bench) | ~$10/M | AI-coded features, bug fixing, agentic workflows |
| High-stakes reasoning | `auto-best` | Claude Opus 4.6 | Highest quality regardless of price | ~$18/M | Legal analysis, research synthesis, complex decisions |
| Demos, experiments, dev scripts | `auto-free` | GLM-4 Flash | Near-zero cost | ~$0.01/M | Prompt engineering experiments, demos |
Assume 500 input tokens + 200 output per request (typical chatbot). Here's what you'd pay.
| Strategy | Cost / day | Cost / month | Savings |
|---|---|---|---|
| Only Claude Opus 4.6 | $5,750 | $172,500 | baseline |
| Only GPT-5.4 | $3,300 | $99,000 | -43% |
| Only DeepSeek V3 | $310 | $9,300 | -95% |
| AIPower smart routing (80% auto-cheap · 15% auto-code · 5% auto-best) | $530 | $15,900 | -91% |
Savings come from sending simple tasks to cheap models and reserving expensive models for where they matter. Same user experience, 10-20× less spend.
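The blended number is easy to sanity-check with a few lines of arithmetic. A sketch, assuming the per-million prices from the mode table above, a single blended price per mode instead of separate input/output rates, and an illustrative request volume (not from the table):

```python
# Per-million-token prices from the mode table above (blended in+out, an assumption).
PRICES = {"auto-cheap": 0.08, "auto-code": 10.0, "auto-best": 18.0}
MIX = {"auto-cheap": 0.80, "auto-code": 0.15, "auto-best": 0.05}

TOKENS_PER_REQUEST = 500 + 200   # 500 input + 200 output, as assumed above
REQUESTS_PER_DAY = 1_000_000     # illustrative volume, not stated in the source

def daily_cost(prices: dict, mix: dict) -> float:
    """Cost/day = Σ share × requests × tokens/request × price-per-token."""
    tokens_per_day = REQUESTS_PER_DAY * TOKENS_PER_REQUEST
    return sum(mix[m] * tokens_per_day * prices[m] / 1_000_000 for m in mix)

blended = daily_cost(PRICES, MIX)
opus_only = daily_cost({"auto-best": 18.0}, {"auto-best": 1.0})
# The routed mix lands at a small fraction of the Opus-only bill.
```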
If your primary model returns 5xx, we fall back to a different provider — same request, different upstream. Transparent to your app.
Real providers go down. Over the last 90 days in 2026, without failover your app was dead during at least one outage; with AIPower it ran with degraded latency but stayed up.
```python
# Failover chains by provider family
FALLBACK = {
    "openai":    ["claude-sonnet", "deepseek-chat"],
    "anthropic": ["gpt-5.4", "deepseek-chat"],
    "deepseek":  ["qwen-plus", "gpt-4o-mini"],
    "qwen":      ["deepseek-chat", "gpt-4o-mini"],
    "google":    ["claude-sonnet", "gpt-5.4"],
    # ...
}
```
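Server-side, walking a chain like this reduces to a retry loop. An illustrative sketch, not AIPower's actual implementation: `call_upstream` and `UpstreamError` are hypothetical stand-ins for the transport layer and a 5xx response.

```python
class UpstreamError(Exception):
    """Raised when an upstream provider returns a 5xx (hypothetical)."""

def call_upstream(model: str, messages: list) -> dict:
    # Hypothetical transport layer; a real one would issue the HTTP request.
    raise UpstreamError(model)

def complete_with_failover(provider: str, model: str, messages: list,
                           fallback: dict, transport=call_upstream) -> dict:
    """Try the requested model, then walk the provider's fallback chain."""
    for candidate in [model, *fallback.get(provider, [])]:
        try:
            return transport(candidate, messages)
        except UpstreamError:
            continue  # same request, different upstream; logged in practice
    raise UpstreamError(f"all upstreams failed for {provider}")
```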
```python
# Your code stays 1 line:
client.chat.completions.create(
    model="openai/gpt-5.4",
    messages=[...])
# If OpenAI returns a 5xx, the request routes to Claude.
# Transparent, logged.
```

You could build all of this yourself, but you'd have to run and maintain it. Or use AIPower, all for a 15% markup (close to what you'd spend on DevOps to maintain the same thing).
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.aipower.me/v1",
    api_key="sk-your-aipower-key",
)

def smart_ask(task_type: str, messages: list):
    """Route based on task type."""
    route_map = {
        "chat": "auto",           # DeepSeek V3: balanced
        "batch": "auto-cheap",    # Doubao Pro: cheapest
        "realtime": "auto-fast",  # Qwen Turbo: fastest
        "code": "auto-code",      # Claude Sonnet 4: best at code
        "hard": "auto-best",      # Claude Opus: highest quality
        "demo": "auto-free",      # GLM-4 Flash: near-zero cost
    }
    return client.chat.completions.create(
        model=route_map.get(task_type, "auto"),
        messages=messages,
    )

# Usage
cheap_classification = smart_ask("batch", [...])   # ~$0.001 per call
code_refactor = smart_ask("code", [...])           # ~$0.03 per call
important_decision = smart_ask("hard", [...])      # ~$0.05 per call
live_chat_response = smart_ask("realtime", [...])  # ~$0.001 per call, < 500ms
```
One API key. 16 models. 6 routing modes. Save 60-95% vs hardcoding a premium model.
Also: docs / analytics / in-depth blog