AI API Monitoring Dashboard: Track Usage, Costs, and Performance
April 17, 2026 · 7 min read
You're running AI in production. How much did you spend last week? Which model is slowest? What's your error rate? If you can't answer these instantly, you're flying blind. This guide covers building a monitoring layer for your AI API usage — from simple logging to full dashboards.
What to Monitor
| Metric | Why It Matters | Alert Threshold |
|---|---|---|
| Total spend ($/day) | Budget control | > 2x daily average |
| Requests per minute | Rate limit proximity | > 80% of limit |
| Error rate (%) | Service reliability | > 5% |
| P95 latency (ms) | User experience | > 5000ms |
| Tokens per request | Cost efficiency | > 2x expected |
| Model distribution | Routing validation | Fallback > 20% |
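These thresholds are easier to maintain in one place than scattered through the alerting code in Step 3. A minimal sketch, assuming a plain dict with illustrative names; the static values come from the table above, while relative thresholds (like 2x daily average) need a baseline computed from your own history:
# Illustrative threshold config; the names and structure are assumptions,
# not a fixed schema. Relative thresholds need a computed baseline.
ALERT_THRESHOLDS = {
    "error_rate_pct": 5,          # error rate above 5%
    "p95_latency_ms": 5000,       # P95 latency above 5 seconds
    "rate_limit_fraction": 0.8,   # above 80% of the provider rate limit
    "daily_spend_multiplier": 2,  # spend above 2x the daily average
}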
Step 1: Request Logger Middleware
import time
from datetime import datetime
from openai import OpenAI

# Pricing per million tokens (input, output)
MODEL_PRICING = {
    "deepseek/deepseek-chat": (0.34, 0.50),
    "anthropic/claude-sonnet": (4.50, 22.50),
    "openai/gpt-5.4": (3.75, 22.50),
    "zhipu/glm-4-flash": (0.01, 0.01),
}

class MonitoredClient:
    def __init__(self, api_key):
        self.client = OpenAI(base_url="https://api.aipower.me/v1", api_key=api_key)
        self.logs = []

    def chat(self, model, messages, **kwargs):
        start = time.time()
        response = None
        error = None
        try:
            response = self.client.chat.completions.create(
                model=model, messages=messages, **kwargs
            )
            return response
        except Exception as e:
            error = str(e)
            raise
        finally:
            # Runs on both success and failure, so every call gets logged
            latency = (time.time() - start) * 1000
            usage = response.usage if response else None
            input_cost = 0
            output_cost = 0
            if usage and model in MODEL_PRICING:
                ip, op = MODEL_PRICING[model]
                input_cost = usage.prompt_tokens * ip / 1_000_000
                output_cost = usage.completion_tokens * op / 1_000_000
            self.logs.append({
                "timestamp": datetime.now().isoformat(),
                "model": model,
                "latency_ms": round(latency),
                "input_tokens": usage.prompt_tokens if usage else 0,
                "output_tokens": usage.completion_tokens if usage else 0,
                "cost_usd": round(input_cost + output_cost, 6),
                "error": error,
            })
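A quick sketch of the logger in use; the API key, model, and prompt are placeholders:
client = MonitoredClient(api_key="YOUR_API_KEY")

response = client.chat(
    model="deepseek/deepseek-chat",
    messages=[{"role": "user", "content": "Say hello in five words."}],
)
print(response.choices[0].message.content)
print(client.logs[-1])  # the entry just recorded for this call
Because logging happens in the finally block, failed calls are captured too, with error set and zero token counts.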
Step 2: Dashboard Metrics
from collections import defaultdict

def generate_report(logs):
    """Generate a monitoring report from logs."""
    metrics = {
        "total_requests": len(logs),
        "total_cost": sum(l["cost_usd"] for l in logs),
        "total_errors": sum(1 for l in logs if l["error"]),
        "error_rate": sum(1 for l in logs if l["error"]) / max(len(logs), 1) * 100,
        "avg_latency": sum(l["latency_ms"] for l in logs) / max(len(logs), 1),
    }
    # Per-model breakdown
    by_model = defaultdict(lambda: {"requests": 0, "cost": 0, "tokens": 0})
    for log in logs:
        m = by_model[log["model"]]
        m["requests"] += 1
        m["cost"] += log["cost_usd"]
        m["tokens"] += log["input_tokens"] + log["output_tokens"]
    # Latency percentiles (nearest-rank over the sorted list)
    latencies = sorted(l["latency_ms"] for l in logs)
    if latencies:
        metrics["p50_latency"] = latencies[len(latencies) // 2]
        metrics["p95_latency"] = latencies[int(len(latencies) * 0.95)]
        metrics["p99_latency"] = latencies[int(len(latencies) * 0.99)]
    return {"summary": metrics, "by_model": dict(by_model)}
Step 3: Alerting
def check_alerts(metrics):
    """Check metrics against thresholds and return alerts."""
    alerts = []
    if metrics["error_rate"] > 5:
        alerts.append(f"HIGH ERROR RATE: {metrics['error_rate']:.1f}%")
    if metrics.get("p95_latency", 0) > 5000:
        alerts.append(f"HIGH LATENCY: P95 = {metrics['p95_latency']}ms")
    if metrics["total_cost"] > 50:  # daily budget
        alerts.append(f"BUDGET ALERT: ${metrics['total_cost']:.2f} today")
    return alerts
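Wiring the three steps together, a periodic check might look like the sketch below; the 60-second interval and print-based notification are placeholders for a real scheduler and alert channel (Slack, email, PagerDuty):
import time

def monitor_loop(client, interval_s=60):
    """Regenerate the report and surface any alerts on a fixed interval."""
    while True:
        report = generate_report(client.logs)
        for alert in check_alerts(report["summary"]):
            print(f"[ALERT] {alert}")  # swap in your notification channel
        time.sleep(interval_s)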
AIPower Built-in Monitoring
AIPower provides a monitoring dashboard out of the box — no code needed:
- Real-time spend tracking — see costs as they accrue, not hours later
- Per-model breakdown — which models cost the most
- Request history — searchable log of all API calls
- Usage graphs — daily/weekly/monthly trends
- Budget alerts — get notified before you overspend
Monitoring Best Practices
- Log every request — you can't optimize what you don't measure; persist logs so a restart doesn't erase them (see the sketch after this list)
- Set daily budget alerts — catch runaway costs early
- Track latency percentiles — average latency hides tail latency problems
- Monitor model distribution — unexpected fallback patterns indicate provider issues
- Review weekly — identify trends before they become problems
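The in-memory logs list from Step 1 disappears on restart. A minimal persistence sketch, assuming an append-only JSONL file; the filename and helper name are illustrative:
import json

def persist_logs(logs, path="api_logs.jsonl"):
    """Append each log entry as one JSON line, then clear the buffer."""
    with open(path, "a") as f:
        for entry in logs:
            f.write(json.dumps(entry) + "\n")
    logs.clear()  # avoid re-writing the same entries on the next flush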
Monitor your AI spend in real time at aipower.me — built-in dashboard tracks every request, every token, every dollar. Start with 50 free API calls.