Guide

AI API Monitoring Dashboard: Track Usage, Costs, and Performance

April 17, 2026 · 7 min read

You're running AI in production. How much did you spend last week? Which model is slowest? What's your error rate? If you can't answer these instantly, you're flying blind. This guide covers building a monitoring layer for your AI API usage — from simple logging to full dashboards.

What to Monitor

| Metric | Why It Matters | Alert Threshold |
| --- | --- | --- |
| Total spend ($/day) | Budget control | > 2x daily average |
| Requests per minute | Rate limit proximity | > 80% of limit |
| Error rate (%) | Service reliability | > 5% |
| P95 latency (ms) | User experience | > 5000 ms |
| Tokens per request | Cost efficiency | > 2x expected |
| Model distribution | Routing validation | Fallback > 20% |
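
One way to keep these thresholds out of the alerting logic is a plain config dict. A minimal sketch, with names and values simply mirroring the table above:

ALERT_THRESHOLDS = {
    "spend_multiplier": 2.0,      # daily spend vs. trailing daily average
    "rate_limit_fraction": 0.80,  # requests/min as a fraction of your limit
    "error_rate_pct": 5.0,        # errors as % of requests
    "p95_latency_ms": 5000,       # tail latency budget
    "tokens_multiplier": 2.0,     # tokens/request vs. expected
    "fallback_fraction": 0.20,    # share of traffic served by fallback models
}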

Step 1: Request Logger Middleware

import time
from datetime import datetime

from openai import OpenAI

# Pricing per million tokens (input/output)
MODEL_PRICING = {
    "deepseek/deepseek-chat": (0.34, 0.50),
    "anthropic/claude-sonnet": (4.50, 22.50),
    "openai/gpt-5.4": (3.75, 22.50),
    "zhipu/glm-4-flash": (0.01, 0.01),
}

class MonitoredClient:
    def __init__(self, api_key):
        self.client = OpenAI(base_url="https://api.aipower.me/v1", api_key=api_key)
        self.logs = []

    def chat(self, model, messages, **kwargs):
        start = time.time()
        response = None
        error = None

        try:
            response = self.client.chat.completions.create(
                model=model, messages=messages, **kwargs
            )
            return response
        except Exception as e:
            error = str(e)
            raise
        finally:
            # Record the request even when the call fails
            latency = (time.time() - start) * 1000
            usage = response.usage if response else None

            input_cost = 0
            output_cost = 0
            if usage and model in MODEL_PRICING:
                ip, op = MODEL_PRICING[model]
                input_cost = usage.prompt_tokens * ip / 1_000_000
                output_cost = usage.completion_tokens * op / 1_000_000

            self.logs.append({
                "timestamp": datetime.now().isoformat(),
                "model": model,
                "latency_ms": round(latency),
                "input_tokens": usage.prompt_tokens if usage else 0,
                "output_tokens": usage.completion_tokens if usage else 0,
                "cost_usd": round(input_cost + output_cost, 6),
                "error": error,
            })
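
Routing every call through chat means nothing escapes the log. A quick usage sketch (the env var name and model choice are illustrative):

import os

client = MonitoredClient(api_key=os.environ["AIPOWER_API_KEY"])  # env var name is illustrative
client.chat(
    "deepseek/deepseek-chat",
    [{"role": "user", "content": "Say hello in five words."}],
)
print(client.logs[-1])  # timestamp, model, latency, tokens, cost, error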

Step 2: Dashboard Metrics

from collections import defaultdict

def generate_report(logs):
    """Generate a monitoring report from a list of request log entries."""
    n = max(len(logs), 1)  # avoid division by zero on an empty log
    errors = sum(1 for l in logs if l["error"])
    metrics = {
        "total_requests": len(logs),
        "total_cost": sum(l["cost_usd"] for l in logs),
        "total_errors": errors,
        "error_rate": errors / n * 100,
        "avg_latency": sum(l["latency_ms"] for l in logs) / n,
    }

    # Per-model breakdown
    by_model = defaultdict(lambda: {"requests": 0, "cost": 0, "tokens": 0})
    for log in logs:
        m = by_model[log["model"]]
        m["requests"] += 1
        m["cost"] += log["cost_usd"]
        m["tokens"] += log["input_tokens"] + log["output_tokens"]

    # Latency percentiles (nearest-rank approximation; adequate for dashboards)
    latencies = sorted(l["latency_ms"] for l in logs)
    if latencies:
        metrics["p50_latency"] = latencies[len(latencies) // 2]
        metrics["p95_latency"] = latencies[int(len(latencies) * 0.95)]
        metrics["p99_latency"] = latencies[int(len(latencies) * 0.99)]

    return {"summary": metrics, "by_model": dict(by_model)}

Step 3: Alerting

def check_alerts(metrics):
    """Check metrics against thresholds and return alerts."""
    alerts = []

    if metrics["error_rate"] > 5:
        alerts.append(f"HIGH ERROR RATE: {metrics['error_rate']:.1f}%")

    if metrics.get("p95_latency", 0) > 5000:
        alerts.append(f"HIGH LATENCY: P95 = {metrics['p95_latency']}ms")

    if metrics["total_cost"] > 50:  # Fixed daily budget; the table's 2x-average rule needs spend history
        alerts.append(f"BUDGET ALERT: ${metrics['total_cost']:.2f} today")

    return alerts
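
Run the check on a schedule (a cron job or a background thread) and push anything it returns somewhere a human will see it. A minimal sketch using a Slack-style incoming webhook; the URL is a placeholder and requests is assumed to be installed:

import requests

WEBHOOK_URL = "https://hooks.slack.com/services/..."  # placeholder

def notify(alerts):
    """POST each alert to a chat webhook, falling back to stdout."""
    for alert in alerts:
        try:
            requests.post(WEBHOOK_URL, json={"text": alert}, timeout=5)
        except requests.RequestException:
            print(alert)

report = generate_report(client.logs)
notify(check_alerts(report["summary"]))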

AIPower Built-in Monitoring

AIPower provides a monitoring dashboard out of the box — no code needed:

  • Real-time spend tracking — see costs as they accrue, not hours later
  • Per-model breakdown — which models cost the most
  • Request history — searchable log of all API calls
  • Usage graphs — daily/weekly/monthly trends
  • Budget alerts — get notified before you overspend

Monitoring Best Practices

  1. Log every request — you can't optimize what you don't measure (see the persistence sketch after this list)
  2. Set daily budget alerts — catch runaway costs early
  3. Track latency percentiles — average latency hides tail latency problems
  4. Monitor model distribution — unexpected fallback patterns indicate provider issues
  5. Review weekly — identify trends before they become problems
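
On practice #1: the in-memory self.logs list from Step 1 vanishes on restart. Appending each entry to a JSONL file is the simplest durable option; a minimal sketch (the file path is arbitrary):

import json

def persist_log(entry, path="api_requests.jsonl"):
    """Append one request record as a JSON line for later analysis."""
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")

# Call persist_log(self.logs[-1]) at the end of MonitoredClient.chat's finally block.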

Monitor your AI spend in real time at aipower.me — built-in dashboard tracks every request, every token, every dollar. Start with 50 free API calls.

Ready to try?

50 free API calls. 16 models. One API key.

Create free account