Smart Routing

One call, the right model.

Stop hardcoding gpt-5.4 for every request.

Use model="auto-cheap" / "auto-code" / "auto-best" and let AIPower pick. Save 60-95% without quality loss.

Try routing free — 2 calls

How it works in 1 line of code:

response = client.chat.completions.create(
    model="auto-code",              # ← router picks Claude Sonnet 4
    messages=[{"role":"user", "content":"Refactor this function..."}],
)

No change to the request/response shape. The model field in the response shows which model actually ran, so you can track cost and quality per route.
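A minimal sketch of per-route tracking built on that model field. The concrete model IDs here ("claude-sonnet-4", "deepseek-v3") are illustrative, not guaranteed response values:

```python
from collections import Counter

def record_route(stats: Counter, requested: str, served: str) -> None:
    """Tally which concrete model served each auto-* route.
    `served` is taken from the `model` field of the response object."""
    stats[(requested, served)] += 1

stats = Counter()
# Hypothetical responses: the router resolved "auto-code" to Claude Sonnet 4 twice.
record_route(stats, "auto-code", "claude-sonnet-4")
record_route(stats, "auto-code", "claude-sonnet-4")
record_route(stats, "auto", "deepseek-v3")
print(stats[("auto-code", "claude-sonnet-4")])  # 2
```

In production you'd call `record_route(stats, "auto-code", response.model)` after each request and export the counter to your metrics system.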

Six routing modes

Each mode targets a different optimization goal.

General chat / most tasks

model="auto"

🎯

Routes to: DeepSeek V3

Goal: Balance of cost & quality

Cost: ~$0.35/M

Default choice when you don't know which to pick

Batch processing, classification

model="auto-cheap"

💰

Routes to: Doubao Pro 256K

Goal: Lowest possible price

Cost: ~$0.08/M

Classify 1M user messages — cost ~$2 vs $75 on Claude

Realtime chat, suggestions, autocomplete

model="auto-fast"

⚡

Routes to: Qwen Turbo

Goal: Fastest time-to-first-token

Cost: ~$0.15/M

Live chatbot where latency > quality

Writing / refactoring code

model="auto-code"

💻

Routes to: Claude Sonnet 4

Goal: Best coding accuracy (78% SWE-bench)

Cost: ~$10/M

AI-coded features, bug fixing, agentic workflows

High-stakes reasoning

model="auto-best"

🧠

Routes to: Claude Opus 4.6

Goal: Highest quality regardless of price

Cost: ~$18/M

Legal analysis, research synthesis, complex decisions

Demos, experiments, dev scripts

model="auto-free"

🆓

Routes to: GLM-4 Flash

Goal: Near-zero cost

Cost: ~$0.01/M

Prompt engineering experiments, demos

Real cost math — 1M requests/day app

Assume 500 input + 200 output tokens per request (a typical chatbot). Here's what you'd pay.

Strategy                                           Cost / day   Cost / month   Savings
Only Claude Opus 4.6                               $5,750       $172,500       baseline
Only GPT-5.4                                       $3,300       $99,000        -43%
Only DeepSeek V3                                   $310         $9,300         -95%
AIPower smart routing                              $530         $15,900        -91%
(80% auto-cheap · 15% auto-code · 5% auto-best)

Savings come from sending simple tasks to cheap models and reserving expensive models for where they matter. Same user experience, 10-20× less spend.
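The blended math behind a traffic mix can be sketched as below. This uses the approximate blended $/M figures from the mode cards above, so it illustrates the mechanism rather than reproducing the table's exact dollar amounts (real pricing splits input and output rates):

```python
# Hypothetical blended prices in $/M tokens, from the mode cards above.
PRICE_PER_M = {"auto-cheap": 0.08, "auto-code": 10.0, "auto-best": 18.0}

# 500 input + 200 output tokens per request = 0.0007M tokens.
TOKENS_PER_REQUEST_M = 700 / 1_000_000

def daily_cost(requests_per_day: int, mix: dict) -> float:
    """Blended daily cost for a traffic mix {route: share of requests}."""
    per_request = sum(
        share * PRICE_PER_M[route] * TOKENS_PER_REQUEST_M
        for route, share in mix.items()
    )
    return requests_per_day * per_request

mix = {"auto-cheap": 0.80, "auto-code": 0.15, "auto-best": 0.05}
print(f"${daily_cost(1_000_000, mix):,.0f}/day")
```

Shifting the shares in `mix` shows how quickly cost drops as simple traffic moves off premium models.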

Automatic failover

If your primary model returns 5xx, we fall back to a different provider — same request, different upstream. Transparent to your app.

Real providers go down. Last 90 days in 2026:

  • OpenAI — 3 major outages (2h+ each)
  • Anthropic — 2 major outages
  • Google AI — 1 capacity event
  • DeepSeek — 1 peak-hour throttle

Without failover, your app would have been down during at least one of these. With AIPower, you'd have seen degraded latency but stayed up.

# Failover chains by provider family
FALLBACK = {
    "openai":    ["claude-sonnet", "deepseek-chat"],
    "anthropic": ["gpt-5.4", "deepseek-chat"],
    "deepseek":  ["qwen-plus", "gpt-4o-mini"],
    "qwen":      ["deepseek-chat", "gpt-4o-mini"],
    "google":    ["claude-sonnet", "gpt-5.4"],
    # ...
}

# Your code stays 1 line:
client.chat.completions.create(
    model="openai/gpt-5.4",
    messages=[...])
# If OpenAI 5xx, routes to Claude.
# Transparent, logged.
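If you wanted the same behavior client-side, a sketch might look like this. The chain contents and the treat-any-exception-as-5xx handling are assumptions for illustration, not AIPower internals:

```python
# Hypothetical client-side failover: try the primary, then each fallback in order.
FALLBACK = {
    "openai/gpt-5.4": ["claude-sonnet", "deepseek-chat"],
}

def call_with_failover(call, model: str):
    """`call(model_id)` performs the request and is expected to raise on a
    5xx-style failure. Returns (model_actually_used, result)."""
    last_err = None
    for candidate in [model] + FALLBACK.get(model, []):
        try:
            return candidate, call(candidate)
        except Exception as err:
            last_err = err
    raise RuntimeError(f"all upstreams failed for {model}") from last_err

# Simulated outage: the primary raises, the first fallback answers.
def fake_call(model_id: str) -> str:
    if model_id == "openai/gpt-5.4":
        raise ConnectionError("503 from upstream")
    return "ok"

used, result = call_with_failover(fake_call, "openai/gpt-5.4")
print(used)  # claude-sonnet
```

The gateway does this for you server-side, which is why your code stays one line.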

"Why not just build this myself?"

You could. But you'd have to:

  • Maintain accounts with 10 providers (OpenAI, Anthropic, Google, DeepSeek, Qwen, Zhipu, Moonshot, MiniMax, ByteDance, Alibaba)
  • Handle 10 different API formats (OpenAI-compat is not universal)
  • Keep track of when each provider's pricing / rate limits change
  • Monitor uptime per provider and update fallback logic
  • Track which model is best per task type (benchmarks shift quarterly)
  • Build dashboards for per-model cost/latency observability

Or use AIPower:

  • One account, one API key
  • OpenAI SDK compatible (change 1 line)
  • Pricing updated weekly in the platform
  • Failover built-in
  • Smart routing by scenario
  • /dashboard/analytics shows you everything

All for a 15% markup (close to what you'd spend on DevOps to maintain the same thing yourself).

Full code example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.aipower.me/v1",
    api_key="sk-your-aipower-key",
)

def smart_ask(task_type: str, messages: list):
    """Route based on task type."""
    route_map = {
        "chat":      "auto",            # DeepSeek V3 — balanced
        "batch":     "auto-cheap",      # Doubao Pro — cheapest
        "realtime":  "auto-fast",       # Qwen Turbo — fastest
        "code":      "auto-code",       # Claude Sonnet 4 — best at code
        "hard":      "auto-best",       # Claude Opus — highest quality
        "demo":      "auto-free",       # GLM-4 Flash — near-zero cost
    }
    return client.chat.completions.create(
        model=route_map.get(task_type, "auto"),
        messages=messages,
    )

# Usage
cheap_classification = smart_ask("batch", [...])   # ~$0.001 per call
code_refactor        = smart_ask("code",  [...])   # ~$0.03 per call
important_decision   = smart_ask("hard",  [...])   # ~$0.05 per call
live_chat_response   = smart_ask("realtime", [...]) # ~$0.001 per call, < 500ms

Stop overpaying. Start routing.

One API key. 16 models. 6 routing modes. Save 60-95% vs hardcoding a premium model.

Also: docs / analytics / in-depth blog