For Agent Developers

The LLM layer under your agent

Build multi-agent systems that don't break when one model vendor goes down. 16 models through one OpenAI-compatible API — works with LangGraph, CrewAI, AutoGen, Mastra, OpenAI Swarm.

Start free — 2 calls, no card

Works with every agent framework

πŸ•ΈοΈ

LangGraph

Stateful graphs

πŸ‘₯

CrewAI

Role-based crews

πŸ”„

AutoGen

Microsoft multi-agent

🐝

OpenAI Swarm

Lightweight handoffs

🎯

Agents SDK

OpenAI's 2026 SDK

⚙️

Mastra

TypeScript agents

πŸ—οΈ

LlamaIndex

Agentic RAG

πŸ”Œ

Custom

Any OpenAI SDK client

Any framework that speaks the OpenAI SDK works. That's every modern agent framework in 2026.

Why agent developers choose a gateway

πŸ›‘οΈ

Long-running resilience

Agent workflows make 50-500 LLM calls. Any provider downtime mid-run breaks state. Gateway failover keeps your agent alive through transient 5xx from upstream.

🎚️

Model-per-role routing

Researcher agent can run on cheap DeepSeek. Analyst needs Claude Opus. Writer uses Qwen. Routing per role cuts cost 70-90% vs using premium for everything.

πŸ“Š

Per-agent cost attribution

Tag each request with the agent role. Query /api/usage/logs by agent to find cost-hogging agents. Fix them instead of upgrading the whole crew to expensive models.

πŸ”§

Tool-use normalization

Claude, Gemini, DeepSeek all have slightly different tool-use formats. We translate to OpenAI standard so your agent's tool-calling code never needs model-specific branches.
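In practice that means you define one OpenAI-format tool schema and reuse it for every model behind the gateway. A minimal sketch — `web_search` and its parameters are hypothetical illustrations, not a built-in tool:

```python
# One OpenAI-format tool schema, reused for every model behind the gateway.
# "web_search" and its parameters are hypothetical illustrations.
WEB_SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web and return the top results",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}

def call_with_tools(client, model, messages):
    # Same request shape whether routing resolves to Claude, Gemini, or DeepSeek
    return client.chat.completions.create(
        model=model,
        messages=messages,
        tools=[WEB_SEARCH_TOOL],
        tool_choice="auto",
    )
```

No model-specific branches: the schema and the call site stay identical across providers.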

LangGraph: one line to swap model

Point ChatOpenAI at AIPower. The graph, the tools, the state — everything else stays.

from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END
from langgraph.prebuilt import ToolNode

# Only change: base_url + AIPower key
llm = ChatOpenAI(
    model="auto-code",              # Claude Sonnet for agents
    base_url="https://api.aipower.me/v1",
    api_key="sk-aipower-xxx",
)

# Rest of your LangGraph code — unchanged
workflow = StateGraph(AgentState)
workflow.add_node("agent", lambda s: call_model(llm, s))
workflow.add_node("tools", ToolNode(tools))
workflow.set_entry_point("agent")
workflow.add_edge("tools", "agent")
workflow.add_conditional_edges("agent", should_continue)
app = workflow.compile()
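The snippet assumes a few helpers. A minimal sketch of what they might look like — the state shape and routing logic are illustrative, not prescribed; in your real graph keep the END imported from langgraph.graph (it is redefined here only so the sketch is self-contained):

```python
import operator
from typing import Annotated, TypedDict

END = "__end__"  # same sentinel value as langgraph.graph.END

class AgentState(TypedDict):
    # Messages accumulate across turns via the operator.add reducer
    messages: Annotated[list, operator.add]

def call_model(llm, state):
    # One LLM turn: send the accumulated messages, append the reply
    response = llm.invoke(state["messages"])
    return {"messages": [response]}

def should_continue(state):
    # Route to the tools node if the last reply requested a tool call
    last = state["messages"][-1]
    return "tools" if getattr(last, "tool_calls", None) else END
```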

CrewAI: different models per agent role

Researcher uses cheap DeepSeek. Analyst uses Claude Opus. Writer uses Qwen. One API, three models.

from crewai import Agent, Crew, Task
from langchain_openai import ChatOpenAI

def llm(model):
    return ChatOpenAI(
        model=model,
        base_url="https://api.aipower.me/v1",
        api_key="sk-aipower-xxx",
    )

researcher = Agent(
    role="Senior researcher",
    goal="Gather sources on the topic",
    backstory="Methodical web researcher",
    llm=llm("auto-cheap"),           # DeepSeek V3 — cheap, fast for research
    tools=[web_search_tool],
)

analyst = Agent(
    role="Strategy analyst",
    goal="Turn research into recommendations",
    backstory="Former strategy consultant",
    llm=llm("auto-best"),            # Claude Opus — deep reasoning
)

writer = Agent(
    role="Content writer",
    goal="Write the final deliverable",
    backstory="Senior technical writer",
    llm=llm("qwen/qwen-plus"),       # Qwen — strong writing, cheap
)

# Normal CrewAI orchestration
crew = Crew(agents=[researcher, analyst, writer], tasks=[...])
result = crew.kickoff()

AutoGen / OpenAI Swarm: config-only switch

# AutoGen
from autogen import ConversableAgent

config_list = [{
    "model": "auto-code",
    "base_url": "https://api.aipower.me/v1",
    "api_key": "sk-aipower-xxx",
}]

coder = ConversableAgent("coder", llm_config={"config_list": config_list})

# OpenAI Swarm / Agents SDK
from openai import OpenAI
client = OpenAI(
    base_url="https://api.aipower.me/v1",
    api_key="sk-aipower-xxx",
)
# All function calls, tool-use, handoffs work identically
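Because the response shape is OpenAI-standard everywhere, the tool-call bookkeeping is framework-agnostic. A sketch of one round trip, assuming tool calls arrive as plain dicts in the OpenAI format:

```python
import json

def apply_tool_calls(messages, assistant_message, tools_by_name):
    # Append the assistant turn, run each requested tool, and append the
    # results in the role="tool" format every provider is normalized to.
    out = messages + [assistant_message]
    for call in assistant_message.get("tool_calls") or []:
        fn = tools_by_name[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        out.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return out
```

Feed the returned list back into the next completions call; the loop is the same no matter which model the gateway routed to.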

Tool use works across all providers

Agents depend on tool_choice and function-calling. We normalize Claude's native format, Gemini's format, and Chinese models' variations into OpenAI-standard tool calls.

Model                          Tool use   Parallel tools   JSON mode   Streaming
Claude Opus / Sonnet           ✓          ✓                ✓           ✓
GPT-5.4 / GPT-4o               ✓          ✓                ✓           ✓
Gemini 2.5 Pro / Flash         ✓          ✓                ✓           ✓
DeepSeek V3 / R1               ✓          partial          ✓           ✓
Qwen Plus / Turbo              ✓          ✓                ✓           ✓
GLM / Kimi / Doubao / MiniMax  ✓          varies           ✓           ✓

Per-agent cost attribution

Tag each request with an agent ID. Query which agents are eating your budget. Optimize the top 20% that cost 80%.

# Tag requests with the agent role
res = client.chat.completions.create(
    model="auto-code",
    user="agent:researcher",          # or "agent:analyst", "agent:writer"
    messages=[...],
)

# Query cost by agent
# GET https://api.aipower.me/api/usage/logs?user=agent:researcher
# Returns CSV: tokens, cost, model, timestamp, agent_id
# Plug into your observability stack (Langfuse, Helicone, Datadog)
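Since the usage endpoint returns CSV, aggregating spend per agent is a few lines. A sketch, assuming the column names listed above (cost, agent_id):

```python
import csv
import io
from collections import defaultdict

def cost_by_agent(csv_text):
    # Sum gateway-reported cost per agent tag; column names assume the
    # fields listed above (tokens, cost, model, timestamp, agent_id).
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["agent_id"]] += float(row["cost"])
    return dict(totals)
```

Sort the result descending and you have the 20% of agents driving 80% of spend.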

When OpenAI goes down, your agent doesn't

Scenario: a long-running CrewAI crew makes 200 LLM calls over 20 minutes. OpenAI returns a transient 5xx at minute 12.

✗
Direct OpenAI: your crew fails mid-execution. You lose 12 minutes of work. State is inconsistent. Users see an error.
✓
AIPower: the failing request is transparently re-routed to Claude or DeepSeek. Same OpenAI-compatible response shape. Your crew keeps running. State stays consistent.

This is why production agent teams route through a gateway, not direct-to-vendor.

What a multi-agent SaaS pays

100 users/day × 50 agent LLM calls each = 5,000 calls/day. Mixed model routing.

Approach                       Cost / month   Notes
Direct Claude Opus (premium)   ~$4,500        Best quality. No failover.
Direct OpenAI GPT-5.4          ~$3,100        Good baseline. Single vendor.
OpenRouter (15% markup)        ~$280          No Chinese models. No per-agent tagging.
AIPower smart routing          ~$180          16 models · failover · per-agent tags

Agent-specific FAQ

Do you support OpenAI's Agents SDK (handoffs, sessions, guardrails)?

Yes — the Agents SDK speaks OpenAI's Responses API. AIPower provides /v1/responses as well, so handoffs, agent-to-agent routing, and guardrails work identically. Drop in the base_url; everything else is stock.

Can I run parallel agents at scale?

Yes. Our gateway handles 200+ concurrent streaming sessions per key. For large crews (50+ concurrent agents), shard across multiple keys to isolate rate-limit blast radius.
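One way to shard: hash each agent ID onto a fixed key list so an agent always uses the same key. A sketch with placeholder keys:

```python
import hashlib

API_KEYS = ["sk-aipower-aaa", "sk-aipower-bbb", "sk-aipower-ccc"]  # placeholders

def key_for_agent(agent_id):
    # Stable hash: the same agent always lands on the same key, so a
    # runaway agent only burns through its own shard's rate limit.
    digest = hashlib.sha256(agent_id.encode()).hexdigest()
    return API_KEYS[int(digest, 16) % len(API_KEYS)]
```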

What happens during a LangGraph failover mid-turn?

Transparent re-route at the gateway. The graph sees a normal OpenAI response (just from a different underlying model). State in your graph is unaffected. You may see 200-500ms latency bump on failover.

Can I pin a specific provider for a critical agent?

Yes. Use the explicit provider prefix: `anthropic/claude-opus-4-6` pins to Anthropic. No auto-routing happens. You sacrifice failover but gain determinism. Most teams use `auto-*` aliases for 90% of agents and pin only 1-2 critical ones.

How does tool_choice work across vendors?

Standardized. Set tool_choice="auto", tool_choice="required", or the object form {"type": "function", "function": {"name": "search_tool"}} to force one specific tool. We translate into each provider's native format. Parallel tool calls (multiple tools in one response) work on Claude, GPT, Gemini, and Qwen. Partial on DeepSeek — use sequential calls if your agent relies on parallel.

Does streaming work with tool calls?

Yes. Streamed tool_calls arrive incrementally, in the same delta.tool_calls chunk format as OpenAI streaming. LangGraph, CrewAI, and AutoGen all handle it correctly since they use the OpenAI SDK under the hood.
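If you consume the raw stream yourself, merge fragments by index — each fragment carries an index, the id and name arrive once, and the JSON arguments arrive as string pieces. A sketch over plain-dict deltas:

```python
def accumulate_tool_calls(deltas):
    # Merge streamed delta.tool_calls fragments into complete calls.
    calls = {}
    for delta in deltas:
        for frag in delta.get("tool_calls") or []:
            call = calls.setdefault(frag["index"], {"id": "", "name": "", "arguments": ""})
            if frag.get("id"):
                call["id"] = frag["id"]
            fn = frag.get("function") or {}
            if fn.get("name"):
                call["name"] = fn["name"]
            call["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]
```

Once the stream ends, json.loads each call's arguments string and dispatch as usual.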

Can I track token usage per agent for billing users?

Yes. Pass user="user_id:agent_role" in each call. Query usage logs by user prefix to bill your end-customers. Our /api/usage/logs returns per-request metadata (tokens in/out, cost, model, timestamp, user tag).

Building a different kind of AI product?

Your next agent ships this week.

2 free trial calls. +100 bonus on first $5 top-up. Works with your current framework — just change base_url.