Run #236

success · fetched 2026-05-21 18:01:06 · 7.68 MB raw HTML · 522 models

Top quality: GPT-5.5 (xhigh) (64.5 pts)
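
The page does not define its columns, but the numbers in the table below suggest three relationships: Qual looks like the plain mean of Intel, Code and Agent; ΔTop looks like Qual minus the top quality value of 64.5; and $/Q looks like Cost divided by Qual. Score equals Qual in every row shown, and Pen is not modelled here. The sketch below checks these inferred relationships against four rows copied from the table. The function names, the tuple layout and the tolerance are hypothetical, and the formulas are guesses from the published figures, not the site's documented method.

```python
# Rows copied from the table:
# (model, cost_usd, usd_per_q, qual, delta_top, intel, code, agent)
ROWS = [
    ("GPT-5.5 (xhigh)", 3357.0, 52.05, 64.5, 0.0, 60.2, 59.1, 74.1),
    ("GPT-5.5 (high)", 2159.0, 34.21, 63.1, -1.4, 58.9, 58.5, 72.0),
    ("Gemini 3.1 Pro Preview", 892.0, 15.59, 57.3, -7.2, 57.2, 55.5, 59.1),
    ("DeepSeek V4 Flash (Non-reasoning)", 40.0, 0.90, 44.3, -20.2, 36.5, 35.2, 61.3),
]

TOP_QUAL = 64.5  # from the "Top quality" line above


def derived(cost, qual, intel, code, agent, top=TOP_QUAL):
    """Estimate the derived columns under the ASSUMED formulas."""
    return {
        "qual": (intel + code + agent) / 3,  # assumed: unweighted mean
        "delta_top": qual - top,             # assumed: gap to the top model
        "usd_per_q": cost / qual,            # assumed: cost per quality point
    }


def check_rows(rows, tol=0.1):
    """Return (model, column, published, estimate) for every mismatch above tol.

    The published figures are rounded, so small differences are expected.
    """
    mismatches = []
    for model, cost, usd_per_q, qual, delta_top, intel, code, agent in rows:
        est = derived(cost, qual, intel, code, agent)
        published = {"qual": qual, "delta_top": delta_top, "usd_per_q": usd_per_q}
        for column, value in published.items():
            if abs(est[column] - value) > tol:
                mismatches.append((model, column, value, est[column]))
    return mismatches


if __name__ == "__main__":
    problems = check_rows(ROWS)
    print("all sampled rows consistent" if not problems else problems)
```

Because the page rounds every figure, a tolerance is needed; a few rows elsewhere in the table differ from these estimates by 0.1 in ΔTop, which is consistent with the site computing from unrounded values.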

| # | Model | Provider | Released | Cost | $/Q | Qual | ΔTop | Intel | Code | Agent | Pen | Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | GPT-5.5 (xhigh) | OpenAI | 2026-04-23 | $3,357 | 52.05 | 64.5 | 0.0 | 60.2 | 59.1 | 74.1 | 35.3 | 64.5 |
| 2 | GPT-5.5 (high) | OpenAI | 2026-04-23 | $2,159 | 34.21 | 63.1 | -1.4 | 58.9 | 58.5 | 72.0 | 33.3 | 63.1 |
| 3 | GPT-5.5 (medium) | OpenAI | 2026-04-23 | $1,199 | 19.73 | 60.8 | -3.7 | 56.7 | 56.2 | 69.4 | 30.8 | 60.8 |
| 4 | GPT-5.4 (xhigh) | OpenAI | 2026-03-05 | $2,851 | 46.99 | 60.7 | -3.8 | 56.8 | 57.2 | 68.0 | 34.5 | 60.7 |
| 5 | Claude Opus 4.7 (Adaptive Reasoning, Max Effort) | Anthropic | 2026-04-16 | $5,117 | 84.78 | 60.4 | -4.1 | 57.3 | 52.5 | 71.3 | 37.1 | 60.4 |
| 6 | Gemini 3.1 Pro Preview | Google | 2026-02-19 | $892 | 15.59 | 57.3 | -7.2 | 57.2 | 55.5 | 59.1 | 29.5 | 57.3 |
| 7 | Gemini 3.5 Flash (high) | Google | 2026-05-19 | $1,552 | 27.28 | 56.9 | -7.6 | 55.3 | 45.0 | 70.3 | 31.9 | 56.9 |
| 8 | Claude Opus 4.7 (Non-reasoning, High Effort) | Anthropic | 2026-04-16 | $1,217 | 21.54 | 56.5 | -8.0 | 51.8 | 53.1 | 64.6 | 30.9 | 56.5 |
| 9 | Claude Opus 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | 2026-02-05 | $5,231 | 93.07 | 56.2 | -8.3 | 52.9 | 48.1 | 67.6 | 37.2 | 56.2 |
| 10 | GPT-5.3 Codex (xhigh) | OpenAI | 2026-02-05 | $1,572 | 28.20 | 55.7 | -8.8 | 53.6 | 53.1 | 60.5 | 32.0 | 55.7 |
| 11 | Kimi K2.6 | Kimi | 2026-04-20 | $948 | 17.03 | 55.7 | -8.8 | 53.9 | 47.1 | 66.0 | 29.8 | 55.7 |
| 12 | MiMo-V2.5-Pro | Xiaomi | 2026-04-22 | $462 | 8.30 | 55.6 | -8.9 | 53.8 | 45.5 | 67.4 | 26.6 | 55.6 |
| 13 | DeepSeek V4 Pro (Reasoning, Max Effort) | DeepSeek | 2026-04-24 | $1,071 | 19.34 | 55.4 | -9.1 | 51.5 | 47.5 | 67.2 | 30.3 | 55.4 |
| 14 | Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | 2026-02-17 | $4,206 | 76.17 | 55.2 | -9.3 | 51.7 | 50.9 | 63.0 | 36.2 | 55.2 |
| 15 | GPT-5.5 (low) | OpenAI | 2026-04-23 | $501 | 9.24 | 54.2 | -10.3 | 50.8 | 52.1 | 59.7 | 27.0 | 54.2 |
| 16 | Qwen3.6 Max Preview | Alibaba | 2026-04-20 | $861 | 15.98 | 53.9 | -10.6 | 51.8 | 44.9 | 64.8 | 29.3 | 53.9 |
| 17 | GPT-5.2 (xhigh) | OpenAI | 2025-12-11 | $2,304 | 43.16 | 53.4 | -11.1 | 51.3 | 48.7 | 60.2 | 33.6 | 53.4 |
| 18 | Grok 4.3 (high) | xAI | 2026-04-30 | $395 | 7.40 | 53.4 | -11.1 | 53.2 | 41.0 | 65.9 | 26.0 | 53.4 |
| 19 | DeepSeek V4 Pro (Reasoning, High Effort) | DeepSeek | 2026-04-24 | $690 | 12.97 | 53.2 | -11.3 | 49.8 | 43.2 | 66.7 | 28.4 | 53.2 |
| 20 | GPT-5.4 mini (xhigh) | OpenAI | 2026-03-17 | $1,354 | 25.50 | 53.1 | -11.4 | 48.9 | 51.5 | 58.9 | 31.3 | 53.1 |
| 21 | Claude Opus 4.6 (Non-reasoning, High Effort) | Anthropic | 2026-02-05 | $1,746 | 33.10 | 52.7 | -11.7 | 46.5 | 47.6 | 64.2 | 32.4 | 52.7 |
| 22 | Claude Opus 4.5 (Reasoning) | Anthropic | 2025-11-24 | $2,969 | 56.66 | 52.4 | -12.1 | 49.7 | 47.8 | 59.6 | 34.7 | 52.4 |
| 23 | GLM-5 (Reasoning) | Z AI | 2026-02-11 | $547 | 10.45 | 52.4 | -12.1 | 49.8 | 44.2 | 63.1 | 27.4 | 52.4 |
| 24 | MiMo-V2.5 | Xiaomi | 2026-04-22 | $207 | 3.97 | 52.2 | -12.3 | 49.0 | 42.1 | 65.5 | 23.2 | 52.2 |
| 25 | Qwen3.6 Plus | Alibaba | 2026-04-02 | $483 | 9.37 | 51.5 | -13.0 | 50.0 | 42.9 | 61.7 | 26.8 | 51.5 |
| 26 | MiMo-V2-Pro | Xiaomi | 2026-03-18 | $351 | 6.86 | 51.1 | -13.3 | 49.2 | 41.4 | 62.8 | 25.5 | 51.1 |
| 27 | MiniMax-M2.7 | MiniMax | 2026-03-18 | $176 | 3.44 | 51.0 | -13.5 | 49.6 | 41.9 | 61.5 | 22.4 | 51.0 |
| 28 | Claude Sonnet 4.6 (Non-reasoning, High Effort) | Anthropic | 2026-02-17 | $1,694 | 33.34 | 50.8 | -13.7 | 44.4 | 46.4 | 61.6 | 32.3 | 50.8 |
| 29 | GPT-5.4 (low) | OpenAI | 2026-03-05 | $413 | 8.17 | 50.6 | -13.9 | 47.9 | 45.6 | 58.2 | 26.2 | 50.6 |
| 30 | GPT-5.2 Codex (xhigh) | OpenAI | 2025-12-11 | $3,244 | 65.54 | 49.5 | -15.0 | 49.0 | 43.0 | 56.5 | 35.1 | 49.5 |
| 31 | DeepSeek V4 Flash (Reasoning, High Effort) | DeepSeek | 2026-04-24 | $57.4 | 1.16 | 49.4 | -15.1 | 46.0 | 39.8 | 62.3 | 17.6 | 49.4 |
| 32 | Gemini 3 Pro Preview (high) | Google | 2025-11-18 | $820 | 16.75 | 49.0 | -15.5 | 48.4 | 46.5 | 52.0 | 29.1 | 49.0 |
| 33 | DeepSeek V4 Flash (Reasoning, Max Effort) | DeepSeek | 2026-04-24 | $113 | 2.31 | 48.8 | -15.7 | 46.5 | 38.7 | 61.3 | 20.5 | 48.8 |
| 34 | GPT-5.2 (medium) | OpenAI | 2025-12-11 | $700 | 14.40 | 48.6 | -15.9 | 46.6 | 44.2 | 54.9 | 28.4 | 48.6 |
| 35 | GLM-5.1 (Non-reasoning) | Z AI | 2026-04-07 | $618 | 12.72 | 48.5 | -15.9 | 43.8 | 35.8 | 66.0 | 27.9 | 48.5 |
| 36 | Kimi K2.5 (Reasoning) | Kimi | 2026-01-27 | $367 | 7.58 | 48.4 | -16.1 | 46.8 | 39.6 | 58.9 | 25.6 | 48.4 |
| 37 | Claude Opus 4.5 (Non-reasoning) | Anthropic | 2025-11-24 | $1,392 | 28.75 | 48.4 | -16.1 | 43.1 | 42.9 | 59.2 | 31.4 | 48.4 |
| 38 | Qwen3.6 27B (Reasoning) | Alibaba | 2026-04-22 | $659 | 13.61 | 48.4 | -16.1 | 45.8 | 36.5 | 62.9 | 28.2 | 48.4 |
| 39 | GPT-5.1 (high) | OpenAI | 2025-11-13 | $779 | 16.26 | 47.9 | -16.6 | 47.7 | 44.7 | 51.3 | 28.9 | 47.9 |
| 40 | Grok 4.20 0309 v2 (Reasoning) | xAI | 2026-04-07 | $514 | 10.74 | 47.9 | -16.6 | 49.3 | 40.5 | 53.9 | 27.1 | 47.9 |
| 41 | Claude Sonnet 4.6 (Non-reasoning, Low Effort) | Anthropic | 2026-02-17 | $666 | 13.97 | 47.7 | -16.8 | 42.6 | 43.0 | 57.5 | 28.2 | 47.7 |
| 42 | Qwen3.5 397B A17B (Reasoning) | Alibaba | 2026-02-16 | $418 | 8.81 | 47.4 | -17.1 | 45.0 | 41.3 | 55.8 | 26.2 | 47.4 |
| 43 | Grok 4.20 0309 (Reasoning) | xAI | 2026-03-10 | $484 | 10.27 | 47.2 | -17.3 | 48.5 | 42.2 | 50.9 | 26.9 | 47.2 |
| 44 | DeepSeek V4 Pro (Non-reasoning) | DeepSeek | 2026-04-24 | $616 | 13.11 | 47.0 | -17.5 | 39.3 | 38.4 | 63.3 | 27.9 | 47.0 |
| 45 | MiMo-V2-Omni-0327 | Xiaomi | 2026-03-27 | $218 | 4.66 | 46.8 | -17.7 | 44.9 | 36.9 | 58.6 | 23.4 | 46.8 |
| 46 | KAT Coder Pro V2 | KwaiKAT | 2026-03-27 | $73.5 | 1.57 | 46.7 | -17.8 | 43.8 | 45.6 | 50.7 | 18.7 | 46.7 |
| 47 | Kimi K2.6 (Non-reasoning) | Kimi | 2026-04-20 | $505 | 10.81 | 46.7 | -17.8 | 42.9 | 38.4 | 58.7 | 27.0 | 46.7 |
| 48 | GLM-5 (Non-reasoning) | Z AI | 2026-02-11 | $240 | 5.15 | 46.6 | -17.9 | 40.6 | 39.0 | 60.3 | 23.8 | 46.6 |
| 49 | GPT-5.5 (Non-reasoning) | OpenAI | 2026-04-23 | $361 | 7.74 | 46.6 | -17.9 | 40.9 | 48.6 | 50.2 | 25.6 | 46.6 |
| 50 | Gemini 3 Flash Preview (Reasoning) | Google | 2025-12-17 | $278 | 6.02 | 46.2 | -18.3 | 46.4 | 42.6 | 49.7 | 24.4 | 46.2 |
| 51 | Qwen3.6 35B A3B (Reasoning) | Alibaba | 2026-04-16 | $280 | 6.13 | 45.7 | -18.8 | 43.5 | 35.2 | 58.3 | 24.5 | 45.7 |
| 52 | GPT-5 Codex (high) | OpenAI | 2025-09-23 | $995 | 21.91 | 45.4 | -19.1 | 44.6 | 38.9 | 52.7 | 30.0 | 45.4 |
| 53 | GPT-5.4 nano (xhigh) | OpenAI | 2026-03-17 | $363 | 8.04 | 45.2 | -19.3 | 44.0 | 43.9 | 47.6 | 25.6 | 45.2 |
| 54 | GPT-5 (high) | OpenAI | 2025-08-07 | $913 | 20.23 | 45.1 | -19.4 | 44.6 | 36.0 | 54.7 | 29.6 | 45.1 |
| 55 | MiniMax-M2.5 | MiniMax | 2026-02-12 | $125 | 2.77 | 45.0 | -19.5 | 41.9 | 37.4 | 55.6 | 21.0 | 45.0 |
| 56 | Hy3-preview (Reasoning) | Tencent | 2026-04-23 | $84.4 | 1.89 | 44.7 | -19.8 | 41.9 | 36.5 | 55.7 | 19.3 | 44.7 |
| 57 | Claude 4.5 Sonnet (Reasoning) | Anthropic | 2025-09-29 | $1,585 | 35.65 | 44.5 | -20.0 | 43.0 | 38.6 | 51.7 | 32.0 | 44.5 |
| 58 | GLM-4.7 (Reasoning) | Z AI | 2025-12-22 | $478 | 10.75 | 44.5 | -20.0 | 42.1 | 36.3 | 55.0 | 26.8 | 44.5 |
| 59 | DeepSeek V4 Flash (Non-reasoning) | DeepSeek | 2026-04-24 | $40.0 | 0.90 | 44.3 | -20.2 | 36.5 | 35.2 | 61.3 | 16.0 | 44.3 |
| 60 | Qwen3.5 27B (Reasoning) | Alibaba | 2026-02-24 | $299 | 6.82 | 43.8 | -20.6 | 42.1 | 34.9 | 54.6 | 24.8 | 43.8 |
| 61 | DeepSeek V3.2 (Reasoning) | DeepSeek | 2025-12-01 | $75.7 | 1.73 | 43.8 | -20.7 | 41.7 | 36.7 | 52.9 | 18.8 | 43.8 |
| 62 | Qwen3.5 397B A17B (Non-reasoning) | Alibaba | 2026-02-16 | $186 | 4.27 | 43.6 | -20.9 | 40.1 | 37.4 | 53.3 | 22.7 | 43.6 |
| 63 | GPT-5.1 Codex (high) | OpenAI | 2025-11-13 | $892 | 20.52 | 43.5 | -21.0 | 43.1 | 36.6 | 50.7 | 29.5 | 43.5 |
| 64 | Qwen3.5 122B A10B (Reasoning) | Alibaba | 2026-02-24 | $354 | 8.21 | 43.1 | -21.4 | 41.6 | 34.7 | 53.0 | 25.5 | 43.1 |
| 65 | Mistral Medium 3.5 | Mistral | 2026-04-29 | $1,001 | 23.50 | 42.6 | -21.9 | 39.2 | 35.4 | 53.2 | 30.0 | 42.6 |
| 66 | GPT-5 (medium) | OpenAI | 2025-08-07 | $552 | 13.05 | 42.3 | -22.2 | 42.0 | 38.9 | 45.8 | 27.4 | 42.3 |
| 67 | Grok 4.3 (low) | xAI | 2026-04-30 | $98.7 | 2.35 | 42.0 | -22.5 | 43.9 | 31.6 | 50.4 | 19.9 | 42.0 |
| 68 | Gemini 3 Pro Preview (low) | Google | 2025-11-18 | $355 | 8.47 | 41.9 | -22.6 | 41.3 | 39.4 | 45.0 | 25.5 | 41.9 |
| 69 | Qwen3.6 27B (Non-reasoning) | Alibaba | 2026-04-22 | $234 | 5.64 | 41.5 | -23.0 | 37.1 | 26.6 | 60.9 | 23.7 | 41.5 |
| 70 | MiMo-V2-Flash (Feb 2026) | Xiaomi | 2025-12-16 | $66.5 | 1.61 | 41.2 | -23.3 | 41.5 | 33.5 | 48.8 | 18.2 | 41.2 |
| 71 | Kimi K2 Thinking | Kimi | 2025-11-06 | $308 | 7.48 | 41.2 | -23.3 | 40.9 | 34.8 | 47.9 | 24.9 | 41.2 |
| 72 | Grok 4 | xAI | 2025-07-10 | $2,881 | 69.98 | 41.2 | -23.3 | 41.5 | 40.5 | 41.5 | 34.6 | 41.2 |
| 73 | MiMo-V2-Flash (Reasoning) | Xiaomi | 2025-12-16 | $47.5 | 1.16 | 41.0 | -23.4 | 39.2 | 31.8 | 52.1 | 16.8 | 41.0 |
| 74 | MiMo-V2.5-Pro (Non-reasoning) | Xiaomi | 2026-04-22 | $703 | 17.14 | 41.0 | -23.5 | 35.6 | 36.8 | 50.8 | 28.5 | 41.0 |
| 75 | Qwen3.5 27B (Non-reasoning) | Alibaba | 2026-02-24 | $122 | 2.99 | 40.7 | -23.8 | 37.2 | 33.4 | 51.5 | 20.9 | 40.7 |
| 76 | GPT-5 mini (high) | OpenAI | 2025-08-07 | $168 | 4.14 | 40.7 | -23.8 | 41.2 | 35.3 | 45.5 | 22.3 | 40.7 |
| 77 | Step 3.5 Flash | StepFun | 2026-02-02 | $74.8 | 1.85 | 40.5 | -24.0 | 37.8 | 31.6 | 52.0 | 18.7 | 40.5 |
| 78 | Claude 4.5 Sonnet (Non-reasoning) | Anthropic | 2025-09-29 | $827 | 20.46 | 40.4 | -24.1 | 37.1 | 33.5 | 50.6 | 29.2 | 40.4 |
| 79 | GLM-4.7 (Non-reasoning) | Z AI | 2025-12-22 | $147 | 3.66 | 40.2 | -24.3 | 34.2 | 32.0 | 54.3 | 21.7 | 40.2 |
| 80 | Qwen3 Max Thinking | Alibaba | 2026-01-26 | $669 | 16.65 | 40.2 | -24.3 | 39.8 | 30.5 | 50.1 | 28.3 | 40.2 |
| 81 | GPT-5 (low) | OpenAI | 2025-08-07 | $228 | 5.71 | 39.9 | -24.6 | 39.2 | 30.7 | 49.7 | 23.6 | 39.9 |
| 82 | MiniMax-M2.1 | MiniMax | 2025-12-23 | $114 | 2.87 | 39.9 | -24.6 | 39.4 | 32.8 | 47.4 | 20.6 | 39.9 |
| 83 | Qwen3.5 Omni Plus | Alibaba | 2026-03-30 | $150 | 3.77 | 39.7 | -24.8 | 38.6 | 27.6 | 52.8 | 21.8 | 39.7 |
| 84 | Qwen3.5 122B A10B (Non-reasoning) | Alibaba | 2026-02-24 | $166 | 4.27 | 39.0 | -25.5 | 35.9 | 31.6 | 49.5 | 22.2 | 39.0 |
| 85 | Kimi K2.5 (Non-reasoning) | Kimi | 2026-01-27 | $141 | 3.65 | 38.6 | -25.8 | 37.3 | 25.8 | 52.8 | 21.5 | 38.6 |
| 86 | Claude 4 Sonnet (Reasoning) | Anthropic | 2025-05-22 | $1,349 | 34.97 | 38.6 | -25.9 | 38.7 | 34.1 | 43.0 | 31.3 | 38.6 |
| 87 | GPT-5.4 (Non-reasoning) | OpenAI | 2026-03-05 | $272 | 7.06 | 38.5 | -26.0 | 35.4 | 41.0 | 39.1 | 24.3 | 38.5 |
| 88 | GPT-5.4 mini (medium) | OpenAI | 2026-03-17 | $302 | 7.85 | 38.5 | -26.0 | 37.7 | 37.5 | 40.3 | 24.8 | 38.5 |
| 89 | Ling-2.6-1T | InclusionAI | 2026-04-23 | $95.0 | 2.48 | 38.3 | -26.2 | 33.6 | 33.1 | 48.2 | 19.8 | 38.3 |
| 90 | GPT-5.4 nano (medium) | OpenAI | 2026-03-17 | $90.6 | 2.37 | 38.3 | -26.2 | 38.1 | 35.0 | 41.6 | 19.6 | 38.3 |
| 91 | Hy3-preview (Non-reasoning) | Tencent | 2026-04-23 | $36.1 | 0.94 | 38.2 | -26.3 | 33.7 | 34.3 | 46.7 | 15.6 | 38.2 |
| 92 | GPT-5.1 Codex mini (high) | OpenAI | 2025-11-13 | $202 | 5.32 | 37.9 | -26.6 | 38.6 | 36.4 | 38.7 | 23.0 | 37.9 |
| 93 | Nova 2.0 Pro Preview (medium) | Amazon | 2025-11-27 | $467 | 12.38 | 37.7 | -26.8 | 35.7 | 30.4 | 47.0 | 26.7 | 37.7 |
| 94 | o3 | OpenAI | 2025-04-16 | $1,025 | 27.25 | 37.6 | -26.9 | 38.4 | 38.4 | 36.1 | 30.1 | 37.6 |
| 95 | MiniMax-M2 | MiniMax | 2025-10-26 | $116 | 3.08 | 37.6 | -26.9 | 36.1 | 29.2 | 47.5 | 20.6 | 37.6 |
| 96 | GPT-5 mini (medium) | OpenAI | 2025-08-07 | $61.9 | 1.65 | 37.6 | -26.9 | 38.9 | 32.8 | 40.9 | 17.9 | 37.6 |
| 97 | Qwen3.5 35B A3B (Reasoning) | Alibaba | 2026-02-24 | $302 | 8.12 | 37.2 | -27.3 | 37.1 | 30.3 | 44.1 | 24.8 | 37.2 |
| 98 | Claude 4.5 Haiku (Reasoning) | Anthropic | 2025-10-15 | $620 | 16.92 | 36.6 | -27.9 | 37.1 | 32.6 | 40.2 | 27.9 | 36.6 |
| 99 | Gemini 3 Flash Preview (Non-reasoning) | Google | 2025-12-17 | $66.0 | 1.83 | 36.0 | -28.5 | 35.0 | 37.8 | 35.0 | 18.2 | 36.0 |
| 100 | GPT-5.2 (Non-reasoning) | OpenAI | 2025-12-11 | $225 | 6.28 | 35.9 | -28.6 | 33.6 | 34.7 | 39.5 | 23.5 | 35.9 |