16 top AI models compared on four standard benchmarks: MMLU, HumanEval, MATH-500, and GPQA Diamond. All are accessible through one AIPower API.
MMLU leader: 🇺🇸 GPT-5.4, 94.2% (general knowledge)
Coding leader: 🇺🇸 Claude Sonnet 4, 93.7% (HumanEval)
Math champion: 🇨🇳 DeepSeek R1, 97.3% (MATH-500)
Reasoning king: 🇨🇳 DeepSeek R1, 71.5% (GPQA Diamond)
| Model | MMLU | HumanEval | MATH-500 | GPQA Diamond | Context | $/M In | $/M Out |
|---|---|---|---|---|---|---|---|
| 🇺🇸 GPT-5.4 (`openai/gpt-5.4`) | 94.2% | 91% | 94.5% | 58.2% | 272K | $3.75 | $22.50 |
| 🇺🇸 Claude Opus 4.6 (`anthropic/claude-opus`) | 92.8% | 93.4% | 91.2% | 68.4% | 200K | $7.50 | $37.50 |
| 🇺🇸 Claude Sonnet 4 (`anthropic/claude-sonnet`) | 90.1% | 93.7% | 87.3% | 62.1% | 200K | $4.50 | $22.50 |
| 🇺🇸 Gemini 2.5 Pro (`google/gemini-2.5-pro`) | 91.8% | 88.9% | 89.5% | 60.3% | 1M | $1.88 | $15.00 |
| 🇨🇳 DeepSeek R1 (`deepseek/deepseek-reasoner`) | 90.8% | 89.5% | 97.3% | 71.5% | 64K | $0.34 | $0.50 |
| 🇨🇳 DeepSeek V3 (`deepseek/deepseek-chat`) | 88.5% | 92.7% | 85.3% | 59.1% | 64K | $0.34 | $0.50 |
| 🇨🇳 GLM-5.1 (`zhipu/glm-5.1`) | 87.3% | 92.1% | 82.8% | 54.8% | 128K | $1.20 | $3.84 |
| 🇨🇳 Qwen Plus (`qwen/qwen-plus`) | 89.2% | 86.1% | 79.8% | 56.3% | 128K | $0.13 | $1.87 |
| 🇨🇳 Kimi K2.5 (`moonshot/kimi-k2.5`) | 85.7% | 89.5% | 80.1% | 52.4% | 256K | $0.24 | $1.20 |
| 🇺🇸 Gemini 2.5 Flash (`google/gemini-2.5-flash`) | 83.5% | 82.3% | 78.2% | 48.6% | 1M | $0.15 | $0.60 |
| 🇺🇸 GPT-4o Mini (`openai/gpt-4o-mini`) | 82.1% | 87.2% | 75.4% | 46.8% | 128K | $0.23 | $0.90 |
| 🇨🇳 Qwen Turbo (`qwen/qwen-turbo`) | 81.3% | 80.2% | 67.4% | 42.1% | 128K | $0.08 | $0.31 |
| 🇨🇳 MiniMax Text 01 (`minimax/minimax-text-01`) | 80.5% | 78.3% | 72.6% | 45.2% | 1M | $0.36 | $1.44 |
| 🇨🇳 Doubao Pro (`doubao/doubao-pro-256k`) | 79.8% | 76.1% | 70.5% | 43.8% | 256K | $0.06 | $0.11 |
| 🇨🇳 Moonshot v1 8K (`moonshot/moonshot-v1-8k`) | 73.2% | 72.5% | 61.3% | 38.4% | 8K | $0.14 | $0.14 |
| 🇨🇳 GLM-4 Flash (`zhipu/glm-4-flash`) | 68.5% | 65.8% | 55.2% | 32.1% | 128K | $0.01 | $0.01 |
Sources: OpenAI, Anthropic, Google, DeepSeek, Alibaba, Zhipu public papers & model cards. Updated April 2026.
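To make the per-million-token prices concrete, here is a minimal Python sketch that estimates the cost of a single request. The prices are copied from the table; the `request_cost` helper and the 2,000-in / 500-out workload are hypothetical and only illustrate the arithmetic, not any AIPower billing logic.

```python
# USD per million tokens (input, output), copied from the table above.
PRICES = {
    "openai/gpt-5.4": (3.75, 22.50),
    "deepseek/deepseek-reasoner": (0.34, 0.50),
    "deepseek/deepseek-chat": (0.34, 0.50),
    "openai/gpt-4o-mini": (0.23, 0.90),
    "zhipu/glm-4-flash": (0.01, 0.01),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-million-token prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical workload: 2,000 input tokens and 500 output tokens per request.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 2_000, 500):.6f}")
```

Because output tokens are often priced several times higher than input tokens, the ranking can shift with the input/output mix of your workload.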
DeepSeek V3
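92.7% HumanEval at $0.34/M input. Beats GPT-4o Mini on all four benchmarks, with output tokens about 1.8x cheaper ($0.50/M vs $0.90/M), though input tokens cost more ($0.34/M vs $0.23/M).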
DeepSeek R1
97.3% MATH-500 and 71.5% GPQA Diamond at $0.34/M input. The top reasoning scores in this table at 91% lower input cost than GPT-5.4 ($0.34/M vs $3.75/M).
GLM-4 Flash
$0.01/M for both input and output, practically free. Its 68.5% MMLU is still competitive with many small open-source models. A good fit for simple tasks such as classification.
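The price comparisons in these highlights can be checked directly against the table. The sketch below is one way to do that arithmetic; `percent_cheaper` is a hypothetical helper name, and the prices are the table's $/M values.

```python
def percent_cheaper(price: float, baseline: float) -> float:
    """Percentage by which `price` is below `baseline` (negative if it is higher)."""
    return (1 - price / baseline) * 100


# DeepSeek R1 vs GPT-5.4, input then output ($/M tokens from the table).
print(round(percent_cheaper(0.34, 3.75), 1))   # 90.9
print(round(percent_cheaper(0.50, 22.50), 1))  # 97.8

# DeepSeek V3 vs GPT-4o Mini, input then output.
print(round(percent_cheaper(0.34, 0.23), 1))   # -47.8 (input is pricier)
print(round(percent_cheaper(0.50, 0.90), 1))   # 44.4
```

Comparing input and output prices separately matters here: a model can be cheaper on one and more expensive on the other.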
One API. Benchmark them yourself. 50 free calls to start.