Run #663
success · fetched 2026-06-19 01:00:55 · 9.81 MB raw HTML · 540 models
Top quality: Claude Opus 4.8 (Adaptive Reasoning, Max Effort) (63.4 pts)
| # | Pareto | Model | Released | Cost ($) | $/Q | Qual | ΔTop | Intel | Code | Agent | Pen | Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ✓ | Claude Opus 4.8 (Adaptive Reasoning, Max Effort) Anthropic | 2026-05-28 | $4,012 | 63.27 | 63.4 | 0.0 | 55.7 | 56.7 | 77.8 | 36.0 | 63.4 |
| 2 | | Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback) Anthropic | 2026-06-09 | $6,228 | 98.77 | 63.1 | -0.3 | 59.9 | 76.5 | 52.8 | 37.9 | 63.1 |
| 3 | ✓ | GPT-5.5 (high) OpenAI | 2026-04-23 | $1,775 | 29.01 | 61.2 | -2.2 | 53.1 | 58.5 | 72.0 | 32.5 | 61.2 |
| 4 | | Claude Opus 4.7 (Adaptive Reasoning, Max Effort) Anthropic | 2026-04-16 | $3,738 | 63.23 | 59.1 | -4.3 | 53.5 | 52.5 | 71.3 | 35.7 | 59.1 |
| 5 | | GPT-5.4 (xhigh) OpenAI | 2026-03-05 | $2,261 | 38.41 | 58.9 | -4.5 | 51.4 | 57.2 | 68.0 | 33.5 | 58.9 |
| 6 | | GPT-5.5 (xhigh) OpenAI | 2026-04-23 | $2,588 | 44.47 | 58.2 | -5.2 | 54.8 | 74.9 | 44.9 | 34.1 | 58.2 |
| 7 | ✓ | Gemini 3.5 Flash (high) | 2026-05-19 | $1,142 | 20.70 | 55.2 | -8.2 | 50.2 | 45.0 | 70.3 | 30.6 | 55.2 |
| 8 | ✓ | GLM-5.2 (max) Z AI | 2026-06-16 | $921 | 16.96 | 54.3 | -9.1 | 51.1 | 68.8 | 43.1 | 29.6 | 54.3 |
| 9 | | Qwen3.7 Max Alibaba | 2026-05-19 | $1,159 | 21.37 | 54.2 | -9.2 | 46.0 | 50.1 | 66.6 | 30.6 | 54.2 |
| 10 | | Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) Anthropic | 2026-02-17 | $3,356 | 62.47 | 53.7 | -9.7 | 47.2 | 50.9 | 63.0 | 35.3 | 53.7 |
| 11 | ✓ | DeepSeek V4 Pro (Reasoning, Max Effort) DeepSeek | 2026-04-24 | $180 | 3.39 | 53.0 | -10.4 | 44.3 | 47.5 | 67.2 | 22.5 | 53.0 |
| 12 | | MiniMax-M3 MiniMax | 2026-06-01 | $235 | 4.51 | 52.2 | -11.2 | 44.4 | 43.4 | 68.6 | 23.7 | 52.2 |
| 13 | | Kimi K2.6 Kimi | 2026-04-20 | $839 | 16.13 | 52.0 | -11.4 | 42.8 | 47.1 | 66.0 | 29.2 | 52.0 |
| 14 | ✓ | MiMo-V2.5-Pro Xiaomi | 2026-04-22 | $99.1 | 1.92 | 51.7 | -11.7 | 42.2 | 45.5 | 67.4 | 20.0 | 51.7 |
| 15 | | GLM-5.1 (Reasoning) Z AI | 2026-04-07 | $674 | 13.43 | 50.2 | -13.2 | 40.2 | 43.4 | 67.1 | 28.3 | 50.2 |
| 16 | | Qwen3.7 Plus Alibaba | 2026-06-01 | $152 | 3.03 | 50.2 | -13.2 | 39.0 | 46.5 | 65.1 | 21.8 | 50.2 |
| 17 | | GPT-5.4 mini (xhigh) OpenAI | 2026-03-17 | $1,158 | 23.10 | 50.1 | -13.3 | 40.0 | 51.5 | 58.9 | 30.6 | 50.1 |
| 18 | | Kimi K2.7 Code Kimi | 2026-06-12 | $530 | 10.64 | 49.8 | -13.6 | 41.9 | 45.6 | 61.9 | 27.2 | 49.8 |
| 19 | | Qwen3.6 Plus Alibaba | 2026-04-02 | $484 | 10.08 | 48.0 | -15.4 | 39.6 | 42.9 | 61.7 | 26.9 | 48.0 |
| 20 | | MiniMax-M2.7 MiniMax | 2026-03-18 | $144 | 3.05 | 47.2 | -16.2 | 38.1 | 41.9 | 61.5 | 21.6 | 47.2 |
| 21 | ✓ | DeepSeek V4 Flash (Reasoning, Max Effort) DeepSeek | 2026-04-24 | $78.4 | 1.68 | 46.8 | -16.6 | 40.3 | 38.7 | 61.3 | 18.9 | 46.8 |
| 22 | | Gemini 3.1 Pro Preview | 2026-02-19 | $860 | 18.87 | 45.6 | -17.8 | 46.5 | 68.8 | 21.4 | 29.3 | 45.6 |
| 23 | | Qwen3.6 27B (Reasoning) Alibaba | 2026-04-22 | $668 | 14.69 | 45.5 | -17.9 | 37.1 | 36.5 | 62.9 | 28.2 | 45.5 |
| 24 | | Nemotron 3 Ultra 550B A55B (Reasoning) NVIDIA | 2026-06-04 | $444 | 10.05 | 44.1 | -19.3 | 37.8 | 37.6 | 57.1 | 26.5 | 44.1 |
| 25 | | Qwen3.5 397B A17B (Reasoning) Alibaba | 2026-02-16 | $528 | 12.11 | 43.6 | -19.8 | 33.7 | 41.3 | 55.8 | 27.2 | 43.6 |
| 26 | | GPT-5.4 nano (xhigh) OpenAI | 2026-03-17 | $289 | 6.67 | 43.3 | -20.2 | 38.2 | 43.9 | 47.6 | 24.6 | 43.3 |
| 27 | | Step 3.7 Flash StepFun | 2026-05-29 | $320 | 7.61 | 42.1 | -21.3 | 29.7 | 37.1 | 59.5 | 25.1 | 42.1 |
| 28 | | Qwen3.6 35B A3B (Reasoning) Alibaba | 2026-04-16 | $333 | 7.98 | 41.7 | -21.7 | 31.6 | 35.2 | 58.3 | 25.2 | 41.7 |
| 29 | | Qwen3.5 122B A10B (Reasoning) Alibaba | 2026-02-24 | $447 | 11.18 | 40.0 | -23.4 | 32.3 | 34.7 | 53.0 | 26.5 | 40.0 |
| 30 | | Mistral Medium 3.5 Mistral | 2026-04-29 | $1,014 | 25.67 | 39.5 | -23.9 | 29.9 | 35.4 | 53.2 | 30.1 | 39.5 |
| 31 | | Ring-2.6-1T InclusionAI | 2026-05-08 | $459 | 11.94 | 38.5 | -24.9 | 30.6 | 33.3 | 51.5 | 26.6 | 38.5 |
| 32 | | Grok 4.3 (high) xAI | 2026-04-30 | $319 | 9.20 | 34.6 | -28.8 | 37.6 | 42.2 | 24.1 | 25.0 | 34.6 |
| 33 | | Claude 4.5 Haiku (Reasoning) Anthropic | 2025-10-15 | $539 | 15.80 | 34.1 | -29.3 | 29.6 | 32.6 | 40.2 | 27.3 | 34.1 |
| 34 | | Nova 2.0 Pro Preview (medium) Amazon | 2025-11-27 | $407 | 12.31 | 33.0 | -30.4 | 21.8 | 30.4 | 47.0 | 26.1 | 33.0 |
| 35 | | Grok 4.3 (Non-reasoning) xAI | 2026-04-30 | $297 | 9.04 | 32.9 | -30.5 | 24.8 | 25.1 | 48.8 | 24.7 | 32.9 |
| 36 | | NVIDIA Nemotron 3 Super 120B A12B (Reasoning) NVIDIA | 2026-03-11 | $287 | 8.89 | 32.3 | -31.1 | 25.4 | 31.2 | 40.2 | 24.6 | 32.3 |
| 37 | | gpt-oss-120b (high) OpenAI | 2025-08-05 | $96.3 | 3.20 | 30.1 | -33.3 | 23.8 | 28.6 | 37.9 | 19.8 | 30.1 |
| 38 | | Gemini 3.1 Flash-Lite | 2026-03-03 | $94.8 | 3.52 | 26.9 | -36.5 | 25.0 | 30.1 | 25.7 | 19.8 | 26.9 |
| 39 | ✓ | Gemma 4 26B A4B (Reasoning) | 2026-04-02 | $54.5 | 2.04 | 26.8 | -36.6 | 25.7 | 22.4 | 32.1 | 17.4 | 26.8 |
| 40 | ✓ | gpt-oss-20B (high) OpenAI | 2025-08-05 | $29.9 | 1.47 | 20.3 | -43.1 | 14.9 | 18.5 | 27.6 | 14.8 | 20.3 |
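
The derived columns in the table above can be sanity-checked with a short script. The sketch below is an assumption about how they are computed, not the leaderboard's own code: it treats $/Q as cost divided by Qual, ΔTop as Qual minus the top Qual, and the Pareto flag as membership in the cost-versus-quality frontier (no other model is both at least as cheap and at least as good). The names `Row`, `ROWS`, `cost_per_quality`, `delta_top`, and `pareto_front` are hypothetical, and only a handful of rows are copied in. Because the table shows rounded values, recomputed $/Q figures agree with the published ones only approximately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    model: str
    cost: float     # cost column, in dollars
    quality: float  # Qual column


# A few rows copied from the table (rounded values as displayed).
ROWS = [
    Row("Claude Opus 4.8 (Adaptive Reasoning, Max Effort)", 4012, 63.4),
    Row("Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback)", 6228, 63.1),
    Row("GPT-5.5 (high)", 1775, 61.2),
    Row("GPT-5.4 (xhigh)", 2261, 58.9),
    Row("Gemini 3.5 Flash (high)", 1142, 55.2),
    Row("DeepSeek V4 Pro (Reasoning, Max Effort)", 180, 53.0),
    Row("MiniMax-M3", 235, 52.2),
    Row("gpt-oss-20B (high)", 29.9, 20.3),
]


def cost_per_quality(row: Row) -> float:
    """Assumed definition of $/Q: dollars spent per quality point."""
    return row.cost / row.quality


def delta_top(row: Row, rows: list[Row]) -> float:
    """Assumed definition of ΔTop: gap to the best quality score in `rows`."""
    return row.quality - max(r.quality for r in rows)


def pareto_front(rows: list[Row]) -> set[str]:
    """Return the models that no cheaper-or-equal model matches or beats on quality."""
    front: set[str] = set()
    best_quality = float("-inf")
    # Walk from cheapest to most expensive; ties on cost put higher quality first.
    for row in sorted(rows, key=lambda r: (r.cost, -r.quality)):
        if row.quality > best_quality:
            front.add(row.model)
            best_quality = row.quality
    return front


if __name__ == "__main__":
    front = pareto_front(ROWS)
    for row in ROWS:
        flag = "✓" if row.model in front else " "
        print(f"{flag} {row.model}: $/Q={cost_per_quality(row):.2f}, "
              f"ΔTop={delta_top(row, ROWS):.1f}")
```

On this subset the frontier contains the same models that carry a ✓ in the table (ranks 1, 3, 7, 11, and 40), which supports the cost-versus-quality reading of the Pareto column without confirming it.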