WhatsApp, Telegram, Discord, Slack, or embedded web chat — stream responses from 16 AI models through one OpenAI-compatible endpoint. Per-user caps, auto-failover, zero prompt retention.
Start free — 2 calls, no card

- WhatsApp: via Meta Cloud API webhook
- Telegram: via Bot API, 5-minute setup
- Discord: via discord.js / interactions
- Slack: Events API + Bolt SDK
- Web chat: any frontend, streaming SSE
- Mobile: SDKs call the API directly
- Voice: pipe STT → AIPower → TTS
- Anything else: any HTTP webhook works
Any platform that supports HTTP webhooks works — the AIPower API is the LLM layer, not the messaging layer.
Stream tokens to the user as they arrive. Works identically across all 16 models.
```ts
// Node.js — works inside your webhook handler (WhatsApp / Telegram / etc.)
import OpenAI from "openai";

const aipower = new OpenAI({
  baseURL: "https://api.aipower.me/v1",
  apiKey: process.env.AIPOWER_API_KEY,
});

// Your session store — swap for Redis/DynamoDB in production
const conversationHistory: Record<
  string,
  { role: "user" | "assistant"; content: string }[]
> = {};

async function handleUserMessage(userId: string, text: string) {
  const history = (conversationHistory[userId] ??= []); // init on first message

  const stream = await aipower.chat.completions.create({
    model: "auto", // DeepSeek V3 by default — 91% cheaper than GPT-5.4
    stream: true,
    user: userId, // Tag requests per-user for billing + analytics
    messages: [
      { role: "system", content: "You are a friendly assistant." },
      ...history,
      { role: "user", content: text },
    ],
  });

  let full = "";
  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta?.content || "";
    full += delta;
    await sendToPlatform(userId, delta); // WhatsApp / Telegram / etc.
  }

  history.push(
    { role: "user", content: text },
    { role: "assistant", content: full },
  );
}
```

| Chatbot type | Model | Cost/M | Why |
|---|---|---|---|
| General Q&A / support | auto → DeepSeek V3 | $0.34 | Cheap, fast, knows English + Chinese |
| High-quality assistant | auto-best → Claude Opus | $30 | Best reasoning & writing |
| Real-time typing (<500ms) | auto-fast → Qwen Turbo | $0.12 | Lowest first-token latency |
| Coding bot (cursor-style) | auto-code → Claude Sonnet | $3.45 | 78% SWE-bench |
| WeChat / Chinese-market | qwen/qwen-plus | $0.13 | Best CN, cheaper than GPT |
| Free-tier / demo bot | zhipu/glm-4-flash | $0.01 | Nearly free |
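The table above collapses into a small lookup you can drop into your handler. This is a sketch, not part of the AIPower SDK: the `UseCase` names and the `pickModel` helper are illustrative; the model slugs are the ones from the table.

```ts
// Hypothetical helper: map a chatbot use case to a model slug from the table above.
type UseCase = "support" | "premium" | "realtime" | "coding" | "cn" | "demo";

const MODEL_BY_USE_CASE: Record<UseCase, string> = {
  support: "auto",            // → DeepSeek V3
  premium: "auto-best",       // → Claude Opus
  realtime: "auto-fast",      // → Qwen Turbo
  coding: "auto-code",        // → Claude Sonnet
  cn: "qwen/qwen-plus",
  demo: "zhipu/glm-4-flash",
};

function pickModel(useCase: UseCase): string {
  return MODEL_BY_USE_CASE[useCase];
}
```

Pass the result as the `model` field of `chat.completions.create` — everything else in the request stays the same.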
Tag each request with a user ID. Set daily spending caps. Bail when users hit their quota.
```ts
// Option 1: Account-wide cap
// Set daily_cap_cents=500 in your dashboard → auto-halt at $5/day

// Option 2: Per-user cap (your app-side)
const userDailySpend = await db.getSpendToday(userId);
if (userDailySpend > 50) { // 50 cents = ~150k free-tier tokens
  return "You've hit your daily limit. Upgrade to continue.";
}

// Option 3: Per-user analytics
const res = await aipower.chat.completions.create({
  model: "auto",
  user: userId, // Tag it — query /api/usage/logs?user=...
  messages: [...],
});
```

When OpenAI returns a 5xx or Anthropic rate-limits, requests transparently re-route to a backup provider. Your bot keeps responding.
Gateway runs on Cloudflare Workers — 99.99% uptime, <50ms global latency to the gateway.
We don't store chat content, only billing metadata (user tag, token counts, model), so you can run your bot in regulated industries.
Yes. Every model supports `stream: true`. The response is OpenAI-compatible — any library that works with OpenAI streaming works here.
Yes. We've had users push 200+ concurrent streaming sessions through a single API key without issue. For 10k+ users, shard across multiple keys for rate-limit isolation.
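Sharding can be as simple as hashing the user ID onto a fixed key list, so each user always lands on the same key and one noisy tenant can only exhaust its own shard's rate limit. A sketch — the key list and `keyForUser` helper are illustrative, not an AIPower feature:

```ts
// Hypothetical sketch: deterministic per-user API-key sharding.
const apiKeys = ["aipower-key-a", "aipower-key-b", "aipower-key-c"]; // your real keys

function keyForUser(userId: string): string {
  // Simple 32-bit string hash; any stable hash works.
  let hash = 0;
  for (const ch of userId) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  return apiKeys[hash % apiKeys.length];
}
```

Instantiate one OpenAI client per key (or pass `apiKey` per request) and route with `keyForUser(userId)`.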
Use system prompts with clear constraints, and use a router: start with cheaper models (DeepSeek) for common questions, escalate to Claude Opus only when confidence is low. We have a `/routing` page showing the pattern.
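One cheap way to implement that escalation without a separate classifier is a heuristic on the first answer: serve everything with the default model, and re-ask the expensive one only when the reply looks unsure. A sketch under that assumption — the marker list is illustrative and deliberately small:

```ts
// Hypothetical escalation heuristic: cheap model first, Claude Opus on low confidence.
const UNSURE_MARKERS = ["i'm not sure", "i don't know", "i cannot help"];

function shouldEscalate(cheapAnswer: string): boolean {
  const lower = cheapAnswer.toLowerCase();
  return UNSURE_MARKERS.some((m) => lower.includes(m));
}

// In your handler (sketch):
//   let answer = await ask("auto", text);                             // DeepSeek V3 first
//   if (shouldEscalate(answer)) answer = await ask("auto-best", text); // escalate to Opus
```

Production routers usually combine this with intent classification, but even a marker list keeps most traffic on the $0.34/M tier.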
AIPower doesn't store it — you do. Pass the full message array on each request. We recommend Redis/DynamoDB for session state; the gateway is stateless.
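The store only needs two operations: read a user's history and append a turn, trimming so the prompt doesn't grow unbounded. Here is a minimal in-memory version with the shape you would back by Redis or DynamoDB — the class and its `maxTurns` cap are illustrative, not an AIPower API:

```ts
// Minimal in-memory session store; swap the Map for Redis/DynamoDB in production.
type Msg = { role: "user" | "assistant"; content: string };

class SessionStore {
  private sessions = new Map<string, Msg[]>();
  constructor(private maxTurns = 20) {}

  get(userId: string): Msg[] {
    return this.sessions.get(userId) ?? [];
  }

  append(userId: string, ...msgs: Msg[]): void {
    const history = [...this.get(userId), ...msgs];
    // Keep only the most recent messages so the prompt stays bounded.
    this.sessions.set(userId, history.slice(-this.maxTurns));
  }
}
```

Call `store.get(userId)` to build the `messages` array, then `store.append(userId, userMsg, assistantMsg)` after each completed stream.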
Route them to Chinese models (`qwen-plus`, `deepseek-chat`, `kimi`) — faster in-region, and they'll handle Chinese input better than GPT. We serve both CN and global traffic from the same API.
Treat user input as untrusted content. Don't let users override your system prompt by embedding `<|system|>` markers. Our docs have an input-sanitization pattern. Use structured outputs (JSON schema) where possible to limit response shapes.
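A minimal version of that sanitization is stripping role-marker tokens before the text ever reaches the `messages` array. This is a sketch, not the pattern from our docs verbatim, and the marker list is illustrative rather than exhaustive:

```ts
// Hypothetical input-sanitization sketch: remove role-marker tokens so user
// text can't masquerade as a system or assistant message.
const ROLE_MARKERS = /<\|(system|assistant|user|im_start|im_end)\|>/gi;

function sanitizeUserInput(text: string): string {
  return text.replace(ROLE_MARKERS, "").trim();
}
```

Run it on `text` before building the request; combine with structured outputs for defense in depth.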
2 free trial calls. +100 bonus on first $5 top-up. OpenAI SDK drop-in — no rewrite.