Run #451

success · fetched 2026-06-03 10:00:28 · 7.92 MB raw HTML · 529 models


Top quality: Claude Opus 4.8 (Adaptive Reasoning, Max Effort) (65.3 pts)
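The page does not define the ΔTop or $/Q columns. Checked against the rows, ΔTop appears to be a model's Qual minus the top Qual (65.3), and $/Q appears to be Cost$ divided by Qual. Both readings are assumptions inferred from the numbers, not documented definitions. The minimal sketch below parses one three-line record (rank and model, provider, then date and scores) and recomputes the two values. The `Row` and `parse_record` names are hypothetical. Agreement is only to within rounding, roughly 0.1 for ΔTop, because the displayed Qual values are themselves rounded.

```python
from dataclasses import dataclass

TOP_QUAL = 65.3  # "Top quality" value reported for this run


@dataclass
class Row:
    # Hypothetical record type covering the first columns of one entry.
    rank: int
    model: str
    provider: str
    released: str
    cost: float       # Cost$ column with "$" and "," stripped
    per_q: float      # $/Q column
    qual: float
    delta_top: float  # ΔTop column


def parse_record(lines: list[str]) -> Row:
    """Parse one record: 'rank model', 'provider', 'date $cost $/Q Qual ΔTop ...'."""
    rank_str, model = lines[0].split(" ", 1)
    fields = lines[2].split()
    return Row(
        rank=int(rank_str),
        model=model,
        provider=lines[1],
        released=fields[0],
        cost=float(fields[1].lstrip("$").replace(",", "")),
        per_q=float(fields[2]),
        qual=float(fields[3]),
        delta_top=float(fields[4]),
    )


record = [
    "2 GPT-5.5 (xhigh)",
    "OpenAI",
    "2026-04-23 $3,357 52.05 64.5 -0.8 60.2 59.1 74.1 35.3 64.5",
]
row = parse_record(record)

# Assumed definitions: ΔTop = Qual - top Qual, $/Q = Cost$ / Qual.
print(round(row.qual - TOP_QUAL, 1))  # -0.8, matching the ΔTop column
print(round(row.cost / row.qual, 2))  # 52.05, matching the $/Q column
```

The same check can be run over every record. Small mismatches, such as row 4 showing -4.6 where 60.8 - 65.3 gives -4.5, are consistent with the page computing from unrounded scores.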

# · Model · Provider · Released · Cost ($) · $/Q · Qual · ΔTop · Intel · Code · Agent · Pen · Score
1 Claude Opus 4.8 (Adaptive Reasoning, Max Effort)
Anthropic
2026-05-28 $4,686 71.74 65.3 0.0 61.4 56.7 77.8 36.7 65.3
2 GPT-5.5 (xhigh)
OpenAI
2026-04-23 $3,357 52.05 64.5 -0.8 60.2 59.1 74.1 35.3 64.5
3 GPT-5.5 (high)
OpenAI
2026-04-23 $2,159 34.21 63.1 -2.2 58.9 58.5 72.0 33.3 63.1
4 GPT-5.5 (medium)
OpenAI
2026-04-23 $1,199 19.73 60.8 -4.6 56.7 56.2 69.4 30.8 60.8
5 GPT-5.4 (xhigh)
OpenAI
2026-03-05 $2,851 46.99 60.7 -4.7 56.8 57.2 68.0 34.5 60.7
6 Claude Opus 4.7 (Adaptive Reasoning, Max Effort)
Anthropic
2026-04-16 $5,117 84.78 60.4 -5.0 57.3 52.5 71.3 37.1 60.4
7 Qwen3.7 Max
Alibaba
2026-05-19 $1,202 20.82 57.8 -7.6 56.6 50.1 66.6 30.8 57.8
8 Gemini 3.1 Pro Preview
Google
2026-02-19 $892 15.59 57.3 -8.1 57.2 55.5 59.1 29.5 57.3
9 Gemini 3.5 Flash (high)
Google
2026-05-19 $1,552 27.28 56.9 -8.5 55.3 45.0 70.3 31.9 56.9
10 Claude Opus 4.7 (Non-reasoning, High Effort)
Anthropic
2026-04-16 $1,217 21.54 56.5 -8.8 51.8 53.1 64.6 30.9 56.5
11 Gemini 3.5 Flash (medium)
Google
2026-05-19 $1,417 25.14 56.4 -8.9 54.8 43.9 70.4 31.5 56.4
12 Claude Opus 4.6 (Adaptive Reasoning, Max Effort)
Anthropic
2026-02-05 $5,231 93.07 56.2 -9.1 52.9 48.1 67.6 37.2 56.2
13 GPT-5.3 Codex (xhigh)
OpenAI
2026-02-05 $1,572 28.20 55.7 -9.6 53.6 53.1 60.5 32.0 55.7
14 Kimi K2.6
Kimi
2026-04-20 $948 17.03 55.7 -9.7 53.9 47.1 66.0 29.8 55.7
15 MiMo-V2.5-Pro
Xiaomi
2026-04-22 $161 2.89 55.6 -9.7 53.8 45.5 67.4 22.1 55.6
16 DeepSeek V4 Pro (Reasoning, Max Effort)
DeepSeek
2026-04-24 $268 4.83 55.4 -9.9 51.5 47.5 67.2 24.3 55.4
17 Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort)
Anthropic
2026-02-17 $4,206 76.17 55.2 -10.1 51.7 50.9 63.0 36.2 55.2
18 Qwen3.7 Plus
Alibaba
2026-06-01 $209 3.80 55.0 -10.4 53.3 46.5 65.1 23.2 55.0
19 GPT-5.5 (low)
OpenAI
2026-04-23 $501 9.24 54.2 -11.1 50.8 52.1 59.7 27.0 54.2
20 Qwen3.6 Max Preview
Alibaba
2026-04-20 $861 15.98 53.9 -11.5 51.8 44.9 64.8 29.3 53.9
21 GPT-5.2 (xhigh)
OpenAI
2025-12-11 $2,304 43.16 53.4 -11.9 51.3 48.7 60.2 33.6 53.4
22 Grok 4.3 (high)
xAI
2026-04-30 $395 7.40 53.4 -11.9 53.2 41.0 65.9 26.0 53.4
23 DeepSeek V4 Pro (Reasoning, High Effort)
DeepSeek
2026-04-24 $173 3.24 53.2 -12.1 49.8 43.2 66.7 22.4 53.2
24 GPT-5.4 mini (xhigh)
OpenAI
2026-03-17 $1,354 25.50 53.1 -12.2 48.9 51.5 58.9 31.3 53.1
25 Claude Opus 4.6 (Non-reasoning, High Effort)
Anthropic
2026-02-05 $1,746 33.10 52.7 -12.6 46.5 47.6 64.2 32.4 52.7
26 Claude Opus 4.5 (Reasoning)
Anthropic
2025-11-24 $2,969 56.66 52.4 -12.9 49.7 47.8 59.6 34.7 52.4
27 GLM-5 (Reasoning)
Z AI
2026-02-11 $547 10.45 52.4 -13.0 49.8 44.2 63.1 27.4 52.4
28 MiMo-V2.5
Xiaomi
2026-04-22 $49.3 0.94 52.2 -13.1 49.0 42.1 65.5 16.9 52.2
29 Qwen3.6 Plus
Alibaba
2026-04-02 $483 9.37 51.5 -13.8 50.0 42.9 61.7 26.8 51.5
30 MiMo-V2-Pro
Xiaomi
2026-03-18 $351 6.86 51.1 -14.2 49.2 41.4 62.8 25.5 51.1
31 MiniMax-M2.7
MiniMax
2026-03-18 $176 3.44 51.0 -14.3 49.6 41.9 61.5 22.4 51.0
32 Claude Sonnet 4.6 (Non-reasoning, High Effort)
Anthropic
2026-02-17 $1,694 33.34 50.8 -14.5 44.4 46.4 61.6 32.3 50.8
33 GPT-5.4 (low)
OpenAI
2026-03-05 $413 8.17 50.6 -14.7 47.9 45.6 58.2 26.2 50.6
34 GPT-5.2 Codex (xhigh)
OpenAI
2025-12-11 $3,244 65.54 49.5 -15.8 49.0 43.0 56.5 35.1 49.5
35 DeepSeek V4 Flash (Reasoning, High Effort)
DeepSeek
2026-04-24 $57.4 1.16 49.4 -16.0 46.0 39.8 62.3 17.6 49.4
36 Gemini 3 Pro Preview (high)
Google
2025-11-18 $820 16.75 49.0 -16.4 48.4 46.5 52.0 29.1 49.0
37 DeepSeek V4 Flash (Reasoning, Max Effort)
DeepSeek
2026-04-24 $113 2.31 48.8 -16.5 46.5 38.7 61.3 20.5 48.8
38 GPT-5.2 (medium)
OpenAI
2025-12-11 $700 14.40 48.6 -16.7 46.6 44.2 54.9 28.4 48.6
39 GLM-5.1 (Non-reasoning)
Z AI
2026-04-07 $618 12.72 48.5 -16.8 43.8 35.8 66.0 27.9 48.5
40 Kimi K2.5 (Reasoning)
Kimi
2026-01-27 $367 7.58 48.4 -16.9 46.8 39.6 58.9 25.6 48.4
41 Claude Opus 4.5 (Non-reasoning)
Anthropic
2025-11-24 $1,392 28.75 48.4 -16.9 43.1 42.9 59.2 31.4 48.4
42 Qwen3.6 27B (Reasoning)
Alibaba
2026-04-22 $659 13.61 48.4 -16.9 45.8 36.5 62.9 28.2 48.4
43 GPT-5.1 (high)
OpenAI
2025-11-13 $779 16.26 47.9 -17.4 47.7 44.7 51.3 28.9 47.9
44 Grok 4.20 0309 v2 (Reasoning)
xAI
2026-04-07 $514 10.74 47.9 -17.4 49.3 40.5 53.9 27.1 47.9
45 Claude Sonnet 4.6 (Non-reasoning, Low Effort)
Anthropic
2026-02-17 $666 13.97 47.7 -17.6 42.6 43.0 57.5 28.2 47.7
46 Qwen3.5 397B A17B (Reasoning)
Alibaba
2026-02-16 $418 8.81 47.4 -17.9 45.0 41.3 55.8 26.2 47.4
47 Grok 4.20 0309 (Reasoning)
xAI
2026-03-10 $484 10.27 47.2 -18.1 48.5 42.2 50.9 26.9 47.2
48 Gemini 3.5 Flash (minimal)
Google
2026-05-19 $750 15.91 47.1 -18.2 43.3 47.1 51.0 28.7 47.1
49 Grok 4.3 (medium)
xAI
2026-04-30 $161 3.43 47.1 -18.2 48.8 35.1 57.5 22.1 47.1
50 DeepSeek V4 Pro (Non-reasoning)
DeepSeek
2026-04-24 $154 3.28 47.0 -18.4 39.3 38.4 63.3 21.9 47.0
51 MiMo-V2-Omni-0327
Xiaomi
2026-03-27 $218 4.66 46.8 -18.5 44.9 36.9 58.6 23.4 46.8
52 KAT Coder Pro V2
KwaiKAT
2026-03-27 $73.5 1.57 46.7 -18.6 43.8 45.6 50.7 18.7 46.7
53 Kimi K2.6 (Non-reasoning)
Kimi
2026-04-20 $505 10.81 46.7 -18.6 42.9 38.4 58.7 27.0 46.7
54 GLM-5 (Non-reasoning)
Z AI
2026-02-11 $240 5.15 46.6 -18.7 40.6 39.0 60.3 23.8 46.6
55 GPT-5.5 (Non-reasoning)
OpenAI
2026-04-23 $361 7.74 46.6 -18.7 40.9 48.6 50.2 25.6 46.6
56 Step 3.7 Flash
StepFun
2026-05-29 $368 7.93 46.4 -18.9 42.6 37.1 59.5 25.7 46.4
57 Gemini 3 Flash Preview (Reasoning)
Google
2025-12-17 $278 6.02 46.2 -19.1 46.4 42.6 49.7 24.4 46.2
58 Qwen3.6 35B A3B (Reasoning)
Alibaba
2026-04-16 $280 6.13 45.7 -19.7 43.5 35.2 58.3 24.5 45.7
59 GPT-5 Codex (high)
OpenAI
2025-09-23 $995 21.91 45.4 -19.9 44.6 38.9 52.7 30.0 45.4
60 GPT-5.4 nano (xhigh)
OpenAI
2026-03-17 $363 8.04 45.2 -20.2 44.0 43.9 47.6 25.6 45.2
61 GPT-5 (high)
OpenAI
2025-08-07 $913 20.23 45.1 -20.2 44.6 36.0 54.7 29.6 45.1
62 MiniMax-M2.5
MiniMax
2026-02-12 $125 2.77 45.0 -20.3 41.9 37.4 55.6 21.0 45.0
63 Hy3-preview (Reasoning)
Tencent
2026-04-23 $84.4 1.89 44.7 -20.7 41.9 36.5 55.7 19.3 44.7
64 Claude 4.5 Sonnet (Reasoning)
Anthropic
2025-09-29 $1,585 35.65 44.5 -20.9 43.0 38.6 51.7 32.0 44.5
65 GLM-4.7 (Reasoning)
Z AI
2025-12-22 $478 10.75 44.5 -20.9 42.1 36.3 55.0 26.8 44.5
66 DeepSeek V4 Flash (Non-reasoning)
DeepSeek
2026-04-24 $40.0 0.90 44.3 -21.0 36.5 35.2 61.3 16.0 44.3
67 Qwen3.5 27B (Reasoning)
Alibaba
2026-02-24 $299 6.82 43.8 -21.5 42.1 34.9 54.6 24.8 43.8
68 DeepSeek V3.2 (Reasoning)
DeepSeek
2025-12-01 $75.7 1.73 43.8 -21.5 41.7 36.7 52.9 18.8 43.8
69 Qwen3.5 397B A17B (Non-reasoning)
Alibaba
2026-02-16 $186 4.27 43.6 -21.7 40.1 37.4 53.3 22.7 43.6
70 GPT-5.1 Codex (high)
OpenAI
2025-11-13 $892 20.52 43.5 -21.8 43.1 36.6 50.7 29.5 43.5
71 Qwen3.5 122B A10B (Reasoning)
Alibaba
2026-02-24 $354 8.21 43.1 -22.2 41.6 34.7 53.0 25.5 43.1
72 Mistral Medium 3.5
Mistral
2026-04-29 $1,001 23.50 42.6 -22.7 39.2 35.4 53.2 30.0 42.6
73 GPT-5 (medium)
OpenAI
2025-08-07 $552 13.05 42.3 -23.1 42.0 38.9 45.8 27.4 42.3
74 Grok 4.3 (low)
xAI
2026-04-30 $98.7 2.35 42.0 -23.3 43.9 31.6 50.4 19.9 42.0
75 Gemini 3 Pro Preview (low)
Google
2025-11-18 $355 8.47 41.9 -23.4 41.3 39.4 45.0 25.5 41.9
76 GPT-5.5 Instant (May 2026)
OpenAI
2026-05-05 $368 8.84 41.6 -23.7 41.8 45.1 38.1 25.7 41.6
77 Qwen3.6 27B (Non-reasoning)
Alibaba
2026-04-22 $234 5.64 41.5 -23.8 37.1 26.6 60.9 23.7 41.5
78 MiMo-V2-Flash (Feb 2026)
Xiaomi
2025-12-16 $66.5 1.61 41.2 -24.1 41.5 33.5 48.8 18.2 41.2
79 Kimi K2 Thinking
Kimi
2025-11-06 $308 7.48 41.2 -24.1 40.9 34.8 47.9 24.9 41.2
80 Grok 4
xAI
2025-07-10 $2,881 69.98 41.2 -24.1 41.5 40.5 41.5 34.6 41.2
81 Ring-2.6-1T
InclusionAI
2026-05-08 $334 8.12 41.1 -24.2 38.5 33.3 51.5 25.2 41.1
82 MiMo-V2-Flash (Reasoning)
Xiaomi
2025-12-16 $47.5 1.16 41.0 -24.3 39.2 31.8 52.1 16.8 41.0
83 MiMo-V2.5-Pro (Non-reasoning)
Xiaomi
2026-04-22 $633 15.43 41.0 -24.3 35.6 36.8 50.8 28.0 41.0
84 Qwen3.5 27B (Non-reasoning)
Alibaba
2026-02-24 $128 3.15 40.7 -24.6 37.2 33.4 51.5 21.1 40.7
85 GPT-5 mini (high)
OpenAI
2025-08-07 $168 4.14 40.7 -24.7 41.2 35.3 45.5 22.3 40.7
86 Step 3.5 Flash
StepFun
2026-02-02 $74.8 1.85 40.5 -24.8 37.8 31.6 52.0 18.7 40.5
87 Step 3.5 Flash 2603
StepFun
2026-04-02 $95.3 2.36 40.4 -24.9 38.5 34.6 48.2 19.8 40.4
88 Claude 4.5 Sonnet (Non-reasoning)
Anthropic
2025-09-29 $827 20.46 40.4 -24.9 37.1 33.5 50.6 29.2 40.4
89 GLM-4.7 (Non-reasoning)
Z AI
2025-12-22 $147 3.66 40.2 -25.1 34.2 32.0 54.3 21.7 40.2
90 Qwen3 Max Thinking
Alibaba
2026-01-26 $669 16.65 40.2 -25.2 39.8 30.5 50.1 28.3 40.2
91 GPT-5 (low)
OpenAI
2025-08-07 $228 5.71 39.9 -25.5 39.2 30.7 49.7 23.6 39.9
92 MiniMax-M2.1
MiniMax
2025-12-23 $114 2.87 39.9 -25.5 39.4 32.8 47.4 20.6 39.9
93 Qwen3.5 Omni Plus
Alibaba
2026-03-30 $150 3.77 39.7 -25.6 38.6 27.6 52.8 21.8 39.7
94 Qwen3.5 122B A10B (Non-reasoning)
Alibaba
2026-02-24 $166 4.27 39.0 -26.3 35.9 31.6 49.5 22.2 39.0
95 Kimi K2.5 (Non-reasoning)
Kimi
2026-01-27 $141 3.65 38.6 -26.7 37.3 25.8 52.8 21.5 38.6
96 Claude 4 Sonnet (Reasoning)
Anthropic
2025-05-22 $1,349 34.97 38.6 -26.7 38.7 34.1 43.0 31.3 38.6
97 GPT-5.4 (Non-reasoning)
OpenAI
2026-03-05 $272 7.06 38.5 -26.8 35.4 41.0 39.1 24.3 38.5
98 GPT-5.4 mini (medium)
OpenAI
2026-03-17 $302 7.85 38.5 -26.8 37.7 37.5 40.3 24.8 38.5
99 Ling-2.6-1T
InclusionAI
2026-04-23 $95.0 2.48 38.3 -27.0 33.6 33.1 48.2 19.8 38.3
100 GPT-5.4 nano (medium)
OpenAI
2026-03-17 $90.6 2.37 38.3 -27.1 38.1 35.0 41.6 19.6 38.3