Run #655

success · fetched 2026-06-18 12:00:57 · 9.40 MB raw HTML · 540 models

Top quality: Claude Opus 4.8 (Adaptive Reasoning, Max Effort) (63.4 pts)

| # | Model | Vendor | Released | Cost | $/Q | Qual | ΔTop | Intel | Code | Agent | Pen | Score |
|---|-------|--------|----------|------|-----|------|------|-------|------|-------|-----|-------|
| 1 | Claude Opus 4.8 (Adaptive Reasoning, Max Effort) | Anthropic | 2026-05-28 | $4,012 | 63.27 | 63.4 | 0.0 | 55.7 | 56.7 | 77.8 | 36.0 | 63.4 |
| 2 | Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback) | Anthropic | 2026-06-09 | $6,228 | 98.77 | 63.1 | -0.3 | 59.9 | 76.5 | 52.8 | 37.9 | 63.1 |
| 3 | GPT-5.5 (high) | OpenAI | 2026-04-23 | $1,775 | 29.01 | 61.2 | -2.2 | 53.1 | 58.5 | 72.0 | 32.5 | 61.2 |
| 4 | Claude Opus 4.7 (Adaptive Reasoning, Max Effort) | Anthropic | 2026-04-16 | $3,738 | 63.23 | 59.1 | -4.3 | 53.5 | 52.5 | 71.3 | 35.7 | 59.1 |
| 5 | GPT-5.4 (xhigh) | OpenAI | 2026-03-05 | $2,261 | 38.41 | 58.9 | -4.5 | 51.4 | 57.2 | 68.0 | 33.5 | 58.9 |
| 6 | GPT-5.5 (xhigh) | OpenAI | 2026-04-23 | $2,588 | 44.47 | 58.2 | -5.2 | 54.8 | 74.9 | 44.9 | 34.1 | 58.2 |
| 7 | Gemini 3.5 Flash (high) | Google | 2026-05-19 | $1,142 | 20.70 | 55.2 | -8.2 | 50.2 | 45.0 | 70.3 | 30.6 | 55.2 |
| 8 | Qwen3.7 Max | Alibaba | 2026-05-19 | $1,432 | 26.42 | 54.2 | -9.2 | 46.0 | 50.1 | 66.6 | 31.6 | 54.2 |
| 9 | Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | 2026-02-17 | $3,356 | 62.47 | 53.7 | -9.7 | 47.2 | 50.9 | 63.0 | 35.3 | 53.7 |
| 10 | GLM-5.2 (max) | Z AI | 2026-06-16 | $920 | 17.18 | 53.6 | -9.8 | 50.7 | 67.0 | 43.1 | 29.6 | 53.6 |
| 11 | DeepSeek V4 Pro (Reasoning, Max Effort) | DeepSeek | 2026-04-24 | $180 | 3.39 | 53.0 | -10.4 | 44.3 | 47.5 | 67.2 | 22.5 | 53.0 |
| 12 | MiniMax-M3 | MiniMax | 2026-06-01 | $235 | 4.51 | 52.2 | -11.2 | 44.4 | 43.4 | 68.6 | 23.7 | 52.2 |
| 13 | Kimi K2.6 | Kimi | 2026-04-20 | $839 | 16.13 | 52.0 | -11.4 | 42.8 | 47.1 | 66.0 | 29.2 | 52.0 |
| 14 | MiMo-V2.5-Pro | Xiaomi | 2026-04-22 | $99.1 | 1.92 | 51.7 | -11.7 | 42.2 | 45.5 | 67.4 | 20.0 | 51.7 |
| 15 | GLM-5.1 (Reasoning) | Z AI | 2026-04-07 | $674 | 13.43 | 50.2 | -13.2 | 40.2 | 43.4 | 67.1 | 28.3 | 50.2 |
| 16 | Qwen3.7 Plus | Alibaba | 2026-06-01 | $152 | 3.03 | 50.2 | -13.2 | 39.0 | 46.5 | 65.1 | 21.8 | 50.2 |
| 17 | GPT-5.4 mini (xhigh) | OpenAI | 2026-03-17 | $1,158 | 23.10 | 50.1 | -13.3 | 40.0 | 51.5 | 58.9 | 30.6 | 50.1 |
| 18 | Kimi K2.7 Code | Kimi | 2026-06-12 | $530 | 10.64 | 49.8 | -13.6 | 41.9 | 45.6 | 61.9 | 27.2 | 49.8 |
| 19 | Qwen3.6 Plus | Alibaba | 2026-04-02 | $484 | 10.08 | 48.0 | -15.4 | 39.6 | 42.9 | 61.7 | 26.9 | 48.0 |
| 20 | MiniMax-M2.7 | MiniMax | 2026-03-18 | $144 | 3.05 | 47.2 | -16.2 | 38.1 | 41.9 | 61.5 | 21.6 | 47.2 |
| 21 | DeepSeek V4 Flash (Reasoning, Max Effort) | DeepSeek | 2026-04-24 | $78.4 | 1.68 | 46.8 | -16.6 | 40.3 | 38.7 | 61.3 | 18.9 | 46.8 |
| 22 | Gemini 3.1 Pro Preview | Google | 2026-02-19 | $860 | 18.87 | 45.6 | -17.8 | 46.5 | 68.8 | 21.4 | 29.3 | 45.6 |
| 23 | Qwen3.6 27B (Reasoning) | Alibaba | 2026-04-22 | $668 | 14.69 | 45.5 | -17.9 | 37.1 | 36.5 | 62.9 | 28.2 | 45.5 |
| 24 | Nemotron 3 Ultra 550B A55B (Reasoning) | NVIDIA | 2026-06-04 | $444 | 10.05 | 44.1 | -19.3 | 37.8 | 37.6 | 57.1 | 26.5 | 44.1 |
| 25 | Qwen3.5 397B A17B (Reasoning) | Alibaba | 2026-02-16 | $528 | 12.11 | 43.6 | -19.8 | 33.7 | 41.3 | 55.8 | 27.2 | 43.6 |
| 26 | GPT-5.4 nano (xhigh) | OpenAI | 2026-03-17 | $289 | 6.67 | 43.3 | -20.2 | 38.2 | 43.9 | 47.6 | 24.6 | 43.3 |
| 27 | Step 3.7 Flash | StepFun | 2026-05-29 | $320 | 7.61 | 42.1 | -21.3 | 29.7 | 37.1 | 59.5 | 25.1 | 42.1 |
| 28 | Qwen3.6 35B A3B (Reasoning) | Alibaba | 2026-04-16 | $333 | 7.98 | 41.7 | -21.7 | 31.6 | 35.2 | 58.3 | 25.2 | 41.7 |
| 29 | Qwen3.5 122B A10B (Reasoning) | Alibaba | 2026-02-24 | $447 | 11.18 | 40.0 | -23.4 | 32.3 | 34.7 | 53.0 | 26.5 | 40.0 |
| 30 | Mistral Medium 3.5 | Mistral | 2026-04-29 | $1,325 | 33.53 | 39.5 | -23.9 | 29.9 | 35.4 | 53.2 | 31.2 | 39.5 |
| 31 | Ring-2.6-1T | InclusionAI | 2026-05-08 | $459 | 11.94 | 38.5 | -24.9 | 30.6 | 33.3 | 51.5 | 26.6 | 38.5 |
| 32 | Grok 4.3 (high) | xAI | 2026-04-30 | $319 | 9.20 | 34.6 | -28.8 | 37.6 | 42.2 | 24.1 | 25.0 | 34.6 |
| 33 | Claude 4.5 Haiku (Reasoning) | Anthropic | 2025-10-15 | $539 | 15.80 | 34.1 | -29.3 | 29.6 | 32.6 | 40.2 | 27.3 | 34.1 |
| 34 | Nova 2.0 Pro Preview (medium) | Amazon | 2025-11-27 | $407 | 12.31 | 33.0 | -30.4 | 21.8 | 30.4 | 47.0 | 26.1 | 33.0 |
| 35 | Grok 4.3 (Non-reasoning) | xAI | 2026-04-30 | $297 | 9.04 | 32.9 | -30.5 | 24.8 | 25.1 | 48.8 | 24.7 | 32.9 |
| 36 | NVIDIA Nemotron 3 Super 120B A12B (Reasoning) | NVIDIA | 2026-03-11 | $287 | 8.89 | 32.3 | -31.1 | 25.4 | 31.2 | 40.2 | 24.6 | 32.3 |
| 37 | gpt-oss-120b (high) | OpenAI | 2025-08-05 | $96.3 | 3.20 | 30.1 | -33.3 | 23.8 | 28.6 | 37.9 | 19.8 | 30.1 |
| 38 | Gemini 3.1 Flash-Lite | Google | 2026-03-03 | $94.8 | 3.52 | 26.9 | -36.5 | 25.0 | 30.1 | 25.7 | 19.8 | 26.9 |
| 39 | Gemma 4 26B A4B (Reasoning) | Google | 2026-04-02 | $54.5 | 2.04 | 26.8 | -36.6 | 25.7 | 22.4 | 32.1 | 17.4 | 26.8 |
| 40 | gpt-oss-20B (high) | OpenAI | 2025-08-05 | $29.9 | 1.47 | 20.3 | -43.1 | 14.9 | 18.5 | 27.6 | 14.8 | 20.3 |
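
The page does not define its derived columns. The figures above appear consistent with ΔTop being each model's Qual minus the top Qual (63.4), and with $/Q being Cost divided by Qual. The sketch below checks that reading on three rows copied from the table. Both formulas are assumptions inferred from the numbers, and `derive_columns` is a hypothetical helper, not part of the leaderboard's own tooling. The published values are presumably computed from unrounded scores, so the last digit can differ.

```python
# Three rows copied from the table above: (model, cost in USD, Qual score).
ROWS = [
    ("Claude Opus 4.8 (Adaptive Reasoning, Max Effort)", 4012.0, 63.4),
    ("GPT-5.5 (high)", 1775.0, 61.2),
    ("DeepSeek V4 Pro (Reasoning, Max Effort)", 180.0, 53.0),
]


def derive_columns(rows):
    """Recompute the assumed derived columns for each row.

    Assumption: delta_top = Qual - max(Qual), cost_per_point = Cost / Qual.
    """
    top_quality = max(quality for _, _, quality in rows)
    derived = []
    for model, cost, quality in rows:
        derived.append(
            {
                "model": model,
                "cost_per_point": round(cost / quality, 2),
                "delta_top": round(quality - top_quality, 1),
            }
        )
    return derived


if __name__ == "__main__":
    for row in derive_columns(ROWS):
        print(f'{row["model"]}: $/Q={row["cost_per_point"]}, ΔTop={row["delta_top"]}')
```

If the recomputed values track the $/Q and ΔTop columns to within rounding, the assumed definitions are at least plausible. A mismatch would suggest the leaderboard uses a different denominator or baseline.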