Smart Routing

One call, the right model.

Stop hardcoding gpt-5 for every request.

Use model="auto" / "auto-code" / "auto-best" and let AIPower pick, fail over, and log the route.

Try routing: 10 trial calls

How it works in 1 line of code:

response = client.chat.completions.create(
    model="auto-code",              # ← router picks Claude Sonnet 4
    messages=[{"role":"user", "content":"Refactor this function..."}],
)

No change to the request or response shape. The model field in the response shows which model actually ran, so you can track cost and quality per route.
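Since the response names the model that actually ran, per-route tracking can be a small tally keyed on the requested alias and the resolved model. A minimal sketch, assuming the response follows the OpenAI SDK shape (`response.model`, `response.usage.total_tokens`). `RouteLog`, `record`, and the model string `"claude-sonnet-4"` are hypothetical names for illustration, not part of AIPower:

```python
from collections import defaultdict
from types import SimpleNamespace


class RouteLog:
    """Tally calls and tokens per (requested alias, resolved model) pair."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.tokens = defaultdict(int)

    def record(self, requested: str, response) -> None:
        # response.model is the model that actually served the request.
        key = (requested, response.model)
        self.calls[key] += 1
        usage = getattr(response, "usage", None)
        if usage is not None:
            self.tokens[key] += usage.total_tokens


# Stand-in for a real API response so the sketch runs offline.
# The model identifier here is an assumed example value.
fake_response = SimpleNamespace(
    model="claude-sonnet-4",
    usage=SimpleNamespace(total_tokens=700),
)

log = RouteLog()
log.record("auto-code", fake_response)
```

In a real app you would pass the object returned by `client.chat.completions.create(...)` to `log.record` instead of the stand-in.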

Six routing modes

Each mode targets a different optimization goal.

General chat / most tasks

model="auto"

🎯

Routes to: DeepSeek V3

Goal: Balance of cost & quality

Cost: ~$0.35/M

Default choice when you don't know which to pick

Batch processing, classification

model="auto-cheap"

💰

Routes to: Doubao Pro 256K

Goal: Cost control for routine work

Cost: ~$0.08/M

Classify high-volume messages without hardcoding a premium model

Realtime chat, suggestions, autocomplete

model="auto-fast"

⚡

Routes to: Qwen Turbo

Goal: Fastest time-to-first-token

Cost: ~$0.15/M

Live chatbot where latency matters more than quality

Writing / refactoring code

model="auto-code"

💻

Routes to: Claude Sonnet 4

Goal: Best coding accuracy (78% SWE-bench)

Cost: ~$10/M

AI-coded features, bug fixing, agentic workflows

High-stakes reasoning

model="auto-best"

🧠

Routes to: Claude Opus 4.6

Goal: Highest quality regardless of price

Cost: ~$18/M

Legal analysis, research synthesis, complex decisions

Demos, experiments, dev scripts

model="auto-free"

✨

Routes to: GLM-4 Flash

Goal: Utility workload routing

Cost: ~$0.01/M

Prompt engineering experiments and demos

Real cost math: 1M requests/day app

Assume 500 input tokens + 200 output per request (typical chatbot). Here's what you'd pay.

Strategy                 Cost / day   Cost / month   Difference
Only Claude Opus 4.6     $5,750       $172,500       baseline
Only GPT-5               $3,300       $99,000        -43%
Only DeepSeek V3         $310         $9,300         -95%
AIPower smart routing    $530         $15,900        -91%

AIPower smart routing mix: 80% auto-cheap · 15% auto-code · 5% auto-best

Spend control comes from sending routine tasks to efficient models and reserving premium models for where they matter.
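The monthly and difference figures follow from the daily costs: monthly cost is daily cost times 30, and the difference is measured against the Claude Opus 4.6 baseline. A quick check of that arithmetic, using only the numbers quoted above. The daily costs are taken as given, the 30-day month is inferred from the figures, and the function names are illustrative:

```python
DAYS_PER_MONTH = 30  # assumption: the monthly figures equal daily cost x 30

# Daily cost in USD per strategy, as quoted above.
daily_cost = {
    "Only Claude Opus 4.6": 5750,
    "Only GPT-5": 3300,
    "Only DeepSeek V3": 310,
    "AIPower smart routing": 530,
}
baseline = daily_cost["Only Claude Opus 4.6"]


def monthly(cost_per_day: float) -> float:
    """Monthly cost for a given daily cost."""
    return cost_per_day * DAYS_PER_MONTH


def difference_pct(cost_per_day: float, baseline_per_day: float) -> int:
    """Percentage change relative to the baseline, rounded to a whole percent."""
    return round((cost_per_day / baseline_per_day - 1) * 100)


rows = {
    name: (monthly(cost), difference_pct(cost, baseline))
    for name, cost in daily_cost.items()
}

for name, (per_month, diff) in rows.items():
    print(f"{name:<24} ${per_month:>9,.0f}  {diff:+d}%")
```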

Automatic failover

If your primary model returns a 5xx, we fall back to a different provider: same request, different upstream. Transparent to your app.

Real providers go down. Last 90 days in 2026:

  • OpenAI: 3 major outages (2h+ each)
  • Anthropic: 2 major outages
  • Google AI: 1 capacity event
  • DeepSeek: 1 peak-hour throttle

Without failover, your app was down during at least one of these. With AIPower, latency degraded but the app stayed up.

# Failover chains by provider family
FALLBACK = {
    "openai":    ["claude-sonnet", "deepseek-chat"],
    "anthropic": ["gpt-5", "deepseek-chat"],
    "deepseek":  ["qwen-plus", "gpt-4o-mini"],
    "qwen":      ["deepseek-chat", "gpt-4o-mini"],
    "google":    ["claude-sonnet", "gpt-5"],
    # ...
}

# Your code stays 1 line:
client.chat.completions.create(
    model="openai/gpt-5",
    messages=[...],
)
# If OpenAI returns a 5xx, the request routes to Claude.
# Transparent, logged.
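To make the failover behavior concrete, here is a rough sketch of how a gateway could walk a fallback chain when an upstream returns a 5xx. This is not AIPower's implementation: `UpstreamError`, `call_with_failover`, and the `send` callable are hypothetical, and the chain is a two-entry excerpt of the mapping above.

```python
class UpstreamError(Exception):
    """Hypothetical error carrying the upstream HTTP status code."""

    def __init__(self, status: int):
        super().__init__(f"upstream returned {status}")
        self.status = status


# Excerpt of the provider-family chains shown above.
FALLBACK = {
    "openai": ["claude-sonnet", "deepseek-chat"],
    "anthropic": ["gpt-5", "deepseek-chat"],
}


def call_with_failover(provider, model, send, fallback=FALLBACK):
    """Try the primary model, then each fallback in order on a 5xx.

    `send` is any callable that takes a model name and returns a reply,
    raising UpstreamError on an HTTP failure.
    Returns (model_used, reply).
    """
    attempts = [model] + fallback.get(provider, [])
    last_error = None
    for candidate in attempts:
        try:
            return candidate, send(candidate)
        except UpstreamError as err:
            if not 500 <= err.status < 600:
                raise  # only server-side errors trigger failover
            last_error = err
    raise last_error  # every model in the chain failed


def flaky_send(model_name):
    """Fake upstream for the demo: gpt-5 is down, everything else answers."""
    if model_name == "gpt-5":
        raise UpstreamError(503)
    return f"reply from {model_name}"


used, reply = call_with_failover("openai", "gpt-5", flaky_send)
```

Returning the model that actually answered alongside the reply is what lets the caller log the route, matching the "transparent, logged" behavior described above.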

"Why not just build this myself?"

You could. But you'd have to:

  • Maintain accounts with 10 providers (OpenAI, Anthropic, Google, DeepSeek, Qwen, Zhipu, Moonshot, MiniMax, ByteDance, Alibaba)
  • Handle 10 different API formats (OpenAI-compat is not universal)
  • Keep track of when each provider's pricing / rate limits change
  • Monitor uptime per provider and update fallback logic
  • Track which model is best per task type (benchmarks shift quarterly)
  • Build dashboards for per-model cost/latency observability

Or use AIPower:

  • One account, one API key
  • OpenAI SDK compatible (change 1 line)
  • Pricing updated weekly in the platform
  • Failover built-in
  • Smart routing by scenario
  • /dashboard/analytics shows you everything

Routing, failover, pricing updates, and analytics are included in the managed gateway instead of becoming another internal DevOps project.

Full code example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.aipower.me/v1",
    api_key="sk-your-aipower-key",
)

def smart_ask(task_type: str, messages: list):
    """Route based on task type."""
    route_map = {
        "chat":      "auto",            # DeepSeek V3 - balanced
        "batch":     "auto-cheap",      # Doubao Pro - cost-control route
        "realtime":  "auto-fast",       # Qwen Turbo - fastest
        "code":      "auto-code",       # Claude Sonnet 4 - best at code
        "hard":      "auto-best",       # Claude Opus - highest quality
        "demo":      "auto-free",       # GLM-4 Flash - utility route
    }
    return client.chat.completions.create(
        model=route_map.get(task_type, "auto"),
        messages=messages,
    )

# Usage
bulk_classification  = smart_ask("batch", [...])   # routine route
code_refactor        = smart_ask("code",  [...])   # coding route
important_decision   = smart_ask("hard",  [...])   # premium route
live_chat_response   = smart_ask("realtime", [...]) # low-latency route

Route by workload, not guesswork.

One API key. 16 models. 6 routing modes. Usage logs and failover included.
