Run #637

success · fetched 2026-06-16 23:00:59 · 9.78 MB raw HTML · 539 models

Top quality: Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback) (67.5 pts)

# | Pareto | Model | Provider | Released | Cost | $/Q | Qual | ΔTop | Intel | Code | Agent | Pen | Score
1 | | Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback) | Anthropic | 2026-06-09 | $6,228 | 92.29 | 67.5 | 0.0 | 59.9 | 62.0 | 80.6 | 37.9 | 67.5
2 | | Claude Opus 4.8 (Adaptive Reasoning, Max Effort) | Anthropic | 2026-05-28 | $3,736 | 58.93 | 63.4 | -4.1 | 55.7 | 56.7 | 77.8 | 35.7 | 63.4
3 | | GPT-5.5 (xhigh) | OpenAI | 2026-04-23 | $2,865 | 45.70 | 62.7 | -4.8 | 54.8 | 59.1 | 74.1 | 34.6 | 62.7
4 | | GPT-5.5 (high) | OpenAI | 2026-04-23 | $1,775 | 29.01 | 61.2 | -6.3 | 53.1 | 58.5 | 72.0 | 32.5 | 61.2
5 | | Claude Opus 4.7 (Adaptive Reasoning, Max Effort) | Anthropic | 2026-04-16 | $3,738 | 63.23 | 59.1 | -8.4 | 53.5 | 52.5 | 71.3 | 35.7 | 59.1
6 | | GPT-5.4 (xhigh) | OpenAI | 2026-03-05 | $2,357 | 40.05 | 58.9 | -8.6 | 51.4 | 57.2 | 68.0 | 33.7 | 58.9
7 | | Gemini 3.5 Flash (high) | Google | 2026-05-19 | $1,071 | 19.41 | 55.2 | -12.3 | 50.2 | 45.0 | 70.3 | 30.3 | 55.2
8 | | Qwen3.7 Max | Alibaba | 2026-05-19 | $1,643 | 30.30 | 54.2 | -13.3 | 46.0 | 50.1 | 66.6 | 32.2 | 54.2
9 | | Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | 2026-02-17 | $3,356 | 62.47 | 53.7 | -13.8 | 47.2 | 50.9 | 63.0 | 35.3 | 53.7
10 | | Gemini 3.1 Pro Preview | Google | 2026-02-19 | $829 | 15.44 | 53.7 | -13.8 | 46.5 | 55.5 | 59.1 | 29.2 | 53.7
11 | | DeepSeek V4 Pro (Reasoning, Max Effort) | DeepSeek | 2026-04-24 | $180 | 3.39 | 53.0 | -14.5 | 44.3 | 47.5 | 67.2 | 22.5 | 53.0
12 | | MiniMax-M3 | MiniMax | 2026-06-01 | $260 | 4.98 | 52.2 | -15.3 | 44.4 | 43.4 | 68.6 | 24.1 | 52.2
13 | | Kimi K2.6 | Kimi | 2026-04-20 | $839 | 16.13 | 52.0 | -15.5 | 42.8 | 47.1 | 66.0 | 29.2 | 52.0
14 | | MiMo-V2.5-Pro | Xiaomi | 2026-04-22 | $99.1 | 1.92 | 51.7 | -15.7 | 42.2 | 45.5 | 67.4 | 20.0 | 51.7
15 | | GLM-5.1 (Reasoning) | Z AI | 2026-04-07 | $689 | 13.72 | 50.2 | -17.3 | 40.2 | 43.4 | 67.1 | 28.4 | 50.2
16 | | Qwen3.7 Plus | Alibaba | 2026-06-01 | $149 | 2.96 | 50.2 | -17.3 | 39.0 | 46.5 | 65.1 | 21.7 | 50.2
17 | | GPT-5.4 mini (xhigh) | OpenAI | 2026-03-17 | $1,158 | 23.10 | 50.1 | -17.4 | 40.0 | 51.5 | 58.9 | 30.6 | 50.1
18 | | Kimi K2.7 Code | Kimi | 2026-06-12 | $530 | 10.64 | 49.8 | -17.6 | 41.9 | 45.6 | 61.9 | 27.2 | 49.8
19 | | Grok 4.3 (high) | xAI | 2026-04-30 | $332 | 6.89 | 48.2 | -19.3 | 37.6 | 41.0 | 65.9 | 25.2 | 48.2
20 | | Qwen3.6 Plus | Alibaba | 2026-04-02 | $534 | 11.12 | 48.0 | -19.4 | 39.6 | 42.9 | 61.7 | 27.3 | 48.0
21 | | MiniMax-M2.7 | MiniMax | 2026-03-18 | $144 | 3.05 | 47.2 | -20.3 | 38.1 | 41.9 | 61.5 | 21.6 | 47.2
22 | | DeepSeek V4 Flash (Reasoning, Max Effort) | DeepSeek | 2026-04-24 | $89.9 | 1.92 | 46.8 | -20.7 | 40.3 | 38.7 | 61.3 | 19.5 | 46.8
23 | | Qwen3.6 27B (Reasoning) | Alibaba | 2026-04-22 | $657 | 14.44 | 45.5 | -22.0 | 37.1 | 36.5 | 62.9 | 28.2 | 45.5
24 | | Nemotron 3 Ultra 550B A55B (Reasoning) | NVIDIA | 2026-06-04 | $443 | 10.04 | 44.1 | -23.4 | 37.8 | 37.6 | 57.1 | 26.5 | 44.1
25 | | Qwen3.5 397B A17B (Reasoning) | Alibaba | 2026-02-16 | $528 | 12.11 | 43.6 | -23.9 | 33.7 | 41.3 | 55.8 | 27.2 | 43.6
26 | | GPT-5.4 nano (xhigh) | OpenAI | 2026-03-17 | $287 | 6.63 | 43.3 | -24.2 | 38.2 | 43.9 | 47.6 | 24.6 | 43.3
27 | | Step 3.7 Flash | StepFun | 2026-05-29 | $320 | 7.61 | 42.1 | -25.4 | 29.7 | 37.1 | 59.5 | 25.1 | 42.1
28 | | Qwen3.5 122B A10B (Reasoning) | Alibaba | 2026-02-24 | $446 | 11.14 | 40.0 | -27.5 | 32.3 | 34.7 | 53.0 | 26.5 | 40.0
29 | | Mistral Medium 3.5 | Mistral | 2026-04-29 | $1,478 | 37.42 | 39.5 | -28.0 | 29.9 | 35.4 | 53.2 | 31.7 | 39.5
30 | | Ring-2.6-1T | InclusionAI | 2026-05-08 | $458 | 11.91 | 38.5 | -29.0 | 30.6 | 33.3 | 51.5 | 26.6 | 38.5
31 | | Claude 4.5 Haiku (Reasoning) | Anthropic | 2025-10-15 | $539 | 15.80 | 34.1 | -33.4 | 29.6 | 32.6 | 40.2 | 27.3 | 34.1
32 | | Nova 2.0 Pro Preview (medium) | Amazon | 2025-11-27 | $407 | 12.31 | 33.0 | -34.4 | 21.8 | 30.4 | 47.0 | 26.1 | 33.0
33 | | Grok 4.3 (Non-reasoning) | xAI | 2026-04-30 | $344 | 10.46 | 32.9 | -34.6 | 24.8 | 25.1 | 48.8 | 25.4 | 32.9
34 | | NVIDIA Nemotron 3 Super 120B A12B (Reasoning) | NVIDIA | 2026-03-11 | $295 | 9.13 | 32.3 | -35.2 | 25.4 | 31.2 | 40.2 | 24.7 | 32.3
35 | | gpt-oss-120b (high) | OpenAI | 2025-08-05 | $96.3 | 3.20 | 30.1 | -37.4 | 23.8 | 28.6 | 37.9 | 19.8 | 30.1
36 | | Gemini 3.1 Flash-Lite | Google | 2026-03-03 | $94.8 | 3.52 | 26.9 | -40.5 | 25.0 | 30.1 | 25.7 | 19.8 | 26.9
37 | | Gemma 4 26B A4B (Reasoning) | Google | 2026-04-02 | $54.5 | 2.04 | 26.8 | -40.7 | 25.7 | 22.4 | 32.1 | 17.4 | 26.8
38 | | gpt-oss-20B (high) | OpenAI | 2025-08-05 | $29.9 | 1.47 | 20.3 | -47.1 | 14.9 | 18.5 | 27.6 | 14.8 | 20.3
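
The Pareto markers did not survive in the rows above, and the page does not define how they or ΔTop are computed. As a rough sketch only, the Python below assumes that $/Q means cost per question, that a model is on the Pareto front when no other model is both at least as cheap per question and at least as good on Qual, and that ΔTop is Qual minus the top Qual. The function names `pareto_front` and `delta_top` are hypothetical, and the input is a hand-copied subset of eight rows, so the result describes that subset and not the full 539-model run.

```python
# Subset of rows copied from the table: (model, $/Q, Qual).
# Assumption: $/Q is cost per question in dollars.
rows = [
    ("Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback)", 92.29, 67.5),
    ("Claude Opus 4.8 (Adaptive Reasoning, Max Effort)", 58.93, 63.4),
    ("GPT-5.5 (xhigh)", 45.70, 62.7),
    ("GPT-5.5 (high)", 29.01, 61.2),
    ("Claude Opus 4.7 (Adaptive Reasoning, Max Effort)", 63.23, 59.1),
    ("Gemini 3.5 Flash (high)", 19.41, 55.2),
    ("DeepSeek V4 Pro (Reasoning, Max Effort)", 3.39, 53.0),
    ("MiMo-V2.5-Pro", 1.92, 51.7),
]


def pareto_front(rows):
    """Return rows that no other row dominates.

    Assumed rule (not confirmed by the page): row B dominates row A when B
    costs no more, scores no less, and is strictly better on at least one.
    """
    front = []
    for name, cost, qual in rows:
        dominated = any(
            c <= cost and q >= qual and (c < cost or q > qual)
            for _, c, q in rows
        )
        if not dominated:
            front.append((name, cost, qual))
    return front


def delta_top(rows):
    """Assumed definition: each model's Qual minus the best Qual, to 1 decimal."""
    top = max(q for _, _, q in rows)
    return {name: round(q - top, 1) for name, _, q in rows}


front_names = [name for name, _, _ in pareto_front(rows)]
deltas = delta_top(rows)
```

Within this subset, Claude Opus 4.7 is the only model left off the front, because Claude Opus 4.8 costs less per question and scores higher. Recomputing ΔTop from the rounded Qual values can differ from the listed figure by 0.1 (for example MiMo-V2.5-Pro), presumably because the page works from unrounded scores.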