Build multi-agent systems that don't break when one model vendor goes down. 16 models through one OpenAI-compatible API. Works with LangGraph, CrewAI, AutoGen, Mastra, and OpenAI Swarm.
Start free: 2 calls, no card.

- Stateful graphs (LangGraph)
- Role-based crews (CrewAI)
- Microsoft multi-agent (AutoGen)
- Lightweight handoffs (OpenAI Swarm)
- OpenAI's 2026 SDK (Agents SDK)
- TypeScript agents (Mastra)
- Agentic RAG
- Any OpenAI SDK client
Any framework that speaks the OpenAI SDK works. That's every modern agent framework in 2026.
Agent workflows make 50-500 LLM calls. Any provider downtime mid-run breaks state. Gateway failover keeps your agent alive through transient 5xx from upstream.
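Conceptually, gateway failover is ordered retry across providers. A toy sketch of the idea (the provider names and `fake_call` stub are illustrative, not the gateway's actual implementation):

```python
class Upstream5xx(Exception):
    """Transient server error (5xx) from a provider."""

def call_with_failover(providers, request, call):
    # Try each provider in priority order; a transient 5xx moves on to
    # the next one instead of surfacing mid-run and breaking agent state.
    last = None
    for provider in providers:
        try:
            return call(provider, request)
        except Upstream5xx as exc:
            last = exc
    raise last  # every provider failed; nothing left to try

# Toy demo: the first provider is down, the second answers.
def fake_call(provider, request):
    if provider == "openai":
        raise Upstream5xx("529")
    return f"{provider} handled {request}"

print(call_with_failover(["openai", "anthropic"], "step 12/200", fake_call))
# anthropic handled step 12/200
```

Your agent code never sees the 529; it just gets a normal response from the next provider in line.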
Researcher agent can run on cheap DeepSeek. Analyst needs Claude Opus. Writer uses Qwen. Routing per role cuts cost 70-90% vs using premium for everything.
Tag each request with the agent role. Query /api/usage/logs by agent to find cost-hogging agents. Fix them instead of upgrading the whole crew to expensive models.
Claude, Gemini, DeepSeek all have slightly different tool-use formats. We translate to OpenAI standard so your agent's tool-calling code never needs model-specific branches.
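In practice that means one OpenAI-format tool definition works unchanged whichever model handles the call. A minimal sketch (the `search_docs` tool and `make_tool` helper are illustrative, not part of any SDK):

```python
# One OpenAI-standard tool schema; the gateway translates it into each
# provider's native tool format (Claude, Gemini, DeepSeek, ...) on the way out.
def make_tool(name: str, description: str, params: dict) -> dict:
    """Build an OpenAI-format function-tool definition."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": params,
                "required": list(params),
            },
        },
    }

search_docs = make_tool(
    "search_docs",
    "Search internal documentation",
    {"query": {"type": "string", "description": "Search phrase"}},
)

# The same tools=[search_docs] payload goes through any OpenAI SDK client
# pointed at the gateway; no per-model branches are needed.
print(search_docs["function"]["name"])  # search_docs
```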
Point ChatOpenAI at AIPower. The graph, the tools, the state: everything else stays.
```python
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END
from langgraph.prebuilt import ToolNode

# Only change: base_url + AIPower key
llm = ChatOpenAI(
    model="auto-code",  # Claude Sonnet for agents
    base_url="https://api.aipower.me/v1",
    api_key="sk-aipower-xxx",
)

# Rest of your LangGraph code: unchanged
workflow = StateGraph(AgentState)
workflow.add_node("agent", lambda s: call_model(llm, s))
workflow.add_node("tools", ToolNode(tools))
workflow.set_entry_point("agent")
workflow.add_conditional_edges("agent", should_continue)
workflow.add_edge("tools", "agent")
app = workflow.compile()
```

Researcher uses cheap DeepSeek. Analyst uses Claude Opus. Writer uses Qwen. One API, three models.
```python
from crewai import Agent, Crew, Task
from langchain_openai import ChatOpenAI

def llm(model):
    return ChatOpenAI(
        model=model,
        base_url="https://api.aipower.me/v1",
        api_key="sk-aipower-xxx",
    )

researcher = Agent(
    role="Senior researcher",
    llm=llm("auto-cheap"),  # DeepSeek V3: cheap, fast for research
    tools=[web_search_tool],
)
analyst = Agent(
    role="Strategy analyst",
    llm=llm("auto-best"),  # Claude Opus: deep reasoning
)
writer = Agent(
    role="Content writer",
    llm=llm("qwen/qwen-plus"),  # Qwen: strong writing, cheap
)

# Normal CrewAI orchestration
crew = Crew(agents=[researcher, analyst, writer], tasks=[...])
result = crew.kickoff()
```
```python
# AutoGen
from autogen import ConversableAgent

config_list = [{
    "model": "auto-code",
    "base_url": "https://api.aipower.me/v1",
    "api_key": "sk-aipower-xxx",
}]
coder = ConversableAgent("coder", llm_config={"config_list": config_list})
```
```python
# OpenAI Swarm / Agents SDK
from openai import OpenAI

client = OpenAI(
    base_url="https://api.aipower.me/v1",
    api_key="sk-aipower-xxx",
)
# All function calls, tool use, and handoffs work identically
```

Agents depend on tool_choice and function calling. We normalize Claude's native format, Gemini's format, and Chinese models' variations into OpenAI-standard tool calls.
| Model | Tool use | Parallel tools | JSON mode | Streaming |
|---|---|---|---|---|
| Claude Opus / Sonnet | ✓ | ✓ | ✓ | ✓ |
| GPT-5.4 / GPT-4o | ✓ | ✓ | ✓ | ✓ |
| Gemini 2.5 Pro / Flash | ✓ | ✓ | ✓ | ✓ |
| DeepSeek V3 / R1 | ✓ | partial | ✓ | ✓ |
| Qwen Plus / Turbo | ✓ | ✓ | ✓ | ✓ |
| GLM / Kimi / Doubao / MiniMax | ✓ | varies | ✓ | ✓ |
Tag each request with an agent ID. Query which agents are eating your budget. Optimize the top 20% that cost 80%.
```python
# Tag requests with the agent role
res = client.chat.completions.create(
    model="auto-code",
    user="agent:researcher",  # or "agent:analyst", "agent:writer"
    messages=[...],
)

# Query cost by agent:
# GET https://api.aipower.me/api/usage/logs?user=agent:researcher
# Returns CSV: tokens, cost, model, timestamp, agent_id
# Plug into your observability stack (Langfuse, Helicone, Datadog)
```

Scenario: a long-running CrewAI crew makes 200 LLM calls over 20 minutes. OpenAI throws a 529 at minute 12.
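The per-agent CSV from /api/usage/logs aggregates in a few lines. A minimal sketch assuming the column layout above (the sample rows and values are illustrative):

```python
import csv
import io
from collections import defaultdict

# Sample of the CSV shape returned by /api/usage/logs (illustrative values)
sample = """tokens,cost,model,timestamp,agent_id
1200,0.0021,deepseek-v3,2026-01-10T12:00:00Z,agent:researcher
800,0.0150,claude-opus,2026-01-10T12:01:00Z,agent:analyst
950,0.0018,qwen-plus,2026-01-10T12:02:00Z,agent:writer
2100,0.0037,deepseek-v3,2026-01-10T12:03:00Z,agent:researcher
"""

def cost_by_agent(csv_text: str) -> dict:
    """Sum spend per agent tag so the expensive roles stand out."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["agent_id"]] += float(row["cost"])
    return dict(totals)

print(cost_by_agent(sample))
```

Sort the result descending and you have the top 20% of agents worth optimizing first.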
This is why production agent teams route through a gateway, not direct-to-vendor.
100 users/day × 50 agent LLM calls each = 5,000 calls/day. Mixed model routing.
| Approach | Cost / month | Notes |
|---|---|---|
| Direct Claude Opus (premium) | ~$4,500 | Best quality. No failover. |
| Direct OpenAI GPT-5.4 | ~$3,100 | Good baseline. Single vendor. |
| OpenRouter (15% markup) | ~$280 | No Chinese models. No per-agent tagging. |
| AIPower smart routing | ~$180 | 16 models · failover · per-agent tags |
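Sanity-checking the arithmetic: at 5,000 calls/day the monthly volume and the implied cost per call follow directly from the table's totals (assuming a 30-day month):

```python
CALLS_PER_DAY = 100 * 50            # 100 users x 50 agent LLM calls each
CALLS_PER_MONTH = CALLS_PER_DAY * 30

# Monthly totals from the comparison table, converted to implied cost per call
monthly_cost = {
    "Direct Claude Opus": 4500,
    "Direct OpenAI GPT-5.4": 3100,
    "OpenRouter": 280,
    "AIPower smart routing": 180,
}
per_call = {name: total / CALLS_PER_MONTH for name, total in monthly_cost.items()}

print(CALLS_PER_MONTH)                              # 150000
print(round(per_call["AIPower smart routing"], 4))  # 0.0012
```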
Yes. The Agents SDK speaks OpenAI's Responses API, and AIPower provides /v1/responses as well, so handoffs, agent-to-agent routing, and guardrails work identically. Drop in the base_url; everything else is stock.
Yes. Our gateway handles 200+ concurrent streaming sessions per key. For large crews (50+ concurrent agents), shard across multiple keys to isolate rate-limit blast radius.
Transparent re-route at the gateway. The graph sees a normal OpenAI response (just from a different underlying model), so state in your graph is unaffected. You may see a 200-500 ms latency bump on failover.
Yes. Use the explicit provider prefix: `anthropic/claude-opus-4-6` pins to Anthropic. No auto-routing happens. You sacrifice failover but gain determinism. Most teams use `auto-*` aliases for 90% of agents and pin only 1-2 critical ones.
Standardized. Set tool_choice to "auto", "required", or a specific function ({"type": "function", "function": {"name": "search_tool"}}). We translate it into each provider's native format. Parallel tool calls (multiple tools in one response) work on Claude, GPT, Gemini, and Qwen; support is partial on DeepSeek, so use sequential calls if your agent relies on parallel.
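The three forms side by side, with a small local validator mirroring the accepted shapes (a sketch; `search_tool` is a placeholder function name, and `validate_tool_choice` is illustrative, not part of any SDK):

```python
# Three accepted forms of tool_choice, in OpenAI's standard shape.
# The gateway translates whichever one you send into the target
# provider's native equivalent.
choices = {
    "model_decides": "auto",            # model may or may not call tools
    "must_call_some_tool": "required",  # model must call at least one tool
    "must_call_this_tool": {            # force one specific function
        "type": "function",
        "function": {"name": "search_tool"},
    },
}

def validate_tool_choice(tc) -> bool:
    """Minimal local check for the shapes the API accepts."""
    if tc in ("auto", "required", "none"):
        return True
    return (
        isinstance(tc, dict)
        and tc.get("type") == "function"
        and isinstance(tc.get("function", {}).get("name"), str)
    )

assert all(validate_tool_choice(tc) for tc in choices.values())
```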
Yes. Streamed tool_calls arrive incrementally, in the same delta-chunk format OpenAI uses for streamed tool calls. LangGraph, CrewAI, and AutoGen all handle it correctly since they use the OpenAI SDK under the hood.
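Accumulating those deltas into complete calls is what the frameworks do internally; a simplified sketch of the pattern (the chunk dicts below mimic the streamed delta shape with illustrative values):

```python
# Accumulate incremental tool_call deltas into complete calls: the id and
# name arrive once, the JSON arguments string arrives in fragments.
def accumulate_tool_calls(deltas):
    calls = {}
    for d in deltas:
        slot = calls.setdefault(d["index"], {"id": "", "name": "", "arguments": ""})
        if d.get("id"):
            slot["id"] = d["id"]
        fn = d.get("function", {})
        if fn.get("name"):
            slot["name"] = fn["name"]
        slot["arguments"] += fn.get("arguments", "")
    return [calls[i] for i in sorted(calls)]

deltas = [
    {"index": 0, "id": "call_1", "function": {"name": "search_docs", "arguments": ""}},
    {"index": 0, "function": {"arguments": '{"query": "ret'}},
    {"index": 0, "function": {"arguments": 'ry policy"}'}},
]
calls = accumulate_tool_calls(deltas)
print(calls[0]["arguments"])  # {"query": "retry policy"}
```

Only once the stream finishes is the arguments string complete JSON, which is why frameworks buffer per index before dispatching the tool.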
Yes. Pass user="user_id:agent_role" in each call. Query usage logs by user prefix to bill your end-customers. Our /api/usage/logs returns per-request metadata (tokens in/out, cost, model, timestamp, user tag).
2 free trial calls. +100 bonus on first $5 top-up. Works with your current framework: just change base_url.