Run #1

success · fetched 2026-05-11 23:17:33 · 7.54 MB raw HTML · 515 models


Top quality: GPT-5.5 (xhigh), 64.5 pts

| # | Pareto | Model | Provider | Released | Cost | $/Q | Qual | ΔTop | Intel | Code | Agent | Pen | Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 |  | GPT-5.5 (xhigh) | OpenAI | 2026-04-23 | $3,357 | 52.05 | 64.5 | 0.0 | 60.2 | 59.1 | 74.1 | 35.3 | 64.5 |
| 2 |  | GPT-5.5 (high) | OpenAI | 2026-04-23 | $2,159 | 34.21 | 63.1 | -1.4 | 58.9 | 58.5 | 72.0 | 33.3 | 63.1 |
| 3 |  | GPT-5.5 (medium) | OpenAI | 2026-04-23 | $1,199 | 19.73 | 60.8 | -3.7 | 56.7 | 56.2 | 69.4 | 30.8 | 60.8 |
| 4 |  | GPT-5.4 (xhigh) | OpenAI | 2026-03-05 | $2,851 | 46.99 | 60.7 | -3.8 | 56.8 | 57.3 | 68.0 | 34.5 | 60.7 |
| 5 |  | Claude Opus 4.7 (Adaptive Reasoning, Max Effort) | Anthropic | 2026-04-16 | $5,117 | 84.78 | 60.4 | -4.1 | 57.3 | 52.5 | 71.3 | 37.1 | 60.4 |
| 6 |  | Gemini 3.1 Pro Preview | Google | 2026-02-19 | $892 | 15.58 | 57.3 | -7.2 | 57.2 | 55.5 | 59.1 | 29.5 | 57.3 |
| 7 |  | Claude Opus 4.7 (Non-reasoning, High Effort) | Anthropic | 2026-04-16 | $1,215 | 21.50 | 56.5 | -8.0 | 51.8 | 53.1 | 64.6 | 30.8 | 56.5 |
| 8 |  | Claude Opus 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | 2026-02-05 | $5,231 | 93.07 | 56.2 | -8.3 | 53.0 | 48.1 | 67.6 | 37.2 | 56.2 |
| 9 |  | GPT-5.3 Codex (xhigh) | OpenAI | 2026-02-05 | $1,572 | 28.20 | 55.7 | -8.8 | 53.6 | 53.1 | 60.5 | 32.0 | 55.7 |
| 10 |  | Kimi K2.6 | Kimi | 2026-04-20 | $948 | 17.03 | 55.7 | -8.8 | 53.9 | 47.1 | 66.0 | 29.8 | 55.7 |
| 11 |  | MiMo-V2.5-Pro | Xiaomi | 2026-04-22 | $462 | 8.30 | 55.6 | -8.9 | 53.8 | 45.5 | 67.4 | 26.6 | 55.6 |
| 12 |  | DeepSeek V4 Pro (Reasoning, Max Effort) | DeepSeek | 2026-04-24 | $1,071 | 19.34 | 55.4 | -9.1 | 51.5 | 47.5 | 67.2 | 30.3 | 55.4 |
| 13 |  | Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | 2026-02-17 | $4,206 | 76.17 | 55.2 | -9.3 | 51.7 | 50.9 | 63.0 | 36.2 | 55.2 |
| 14 |  | GPT-5.5 (low) | OpenAI | 2026-04-23 | $501 | 9.24 | 54.2 | -10.3 | 50.8 | 52.1 | 59.7 | 27.0 | 54.2 |
| 15 |  | GLM-5.1 (Reasoning) | Z AI | 2026-04-07 | $567 | 10.50 | 53.9 | -10.6 | 51.4 | 43.4 | 67.0 | 27.5 | 53.9 |
| 16 |  | Qwen3.6 Max Preview | Alibaba | 2026-04-20 | $861 | 15.98 | 53.9 | -10.6 | 51.8 | 44.9 | 64.8 | 29.3 | 53.9 |
| 17 |  | GPT-5.2 (xhigh) | OpenAI | 2025-12-11 | $2,304 | 43.16 | 53.4 | -11.1 | 51.3 | 48.7 | 60.2 | 33.6 | 53.4 |
| 18 |  | Grok 4.3 | xAI | 2026-04-30 | $395 | 7.40 | 53.4 | -11.1 | 53.2 | 41.0 | 65.9 | 26.0 | 53.4 |
| 19 |  | DeepSeek V4 Pro (Reasoning, High Effort) | DeepSeek | 2026-04-24 | $690 | 12.97 | 53.2 | -11.3 | 49.8 | 43.3 | 66.7 | 28.4 | 53.2 |
| 20 |  | GPT-5.4 mini (xhigh) | OpenAI | 2026-03-17 | $1,354 | 25.50 | 53.1 | -11.4 | 48.9 | 51.5 | 58.9 | 31.3 | 53.1 |
| 21 |  | Claude Opus 4.6 (Non-reasoning, High Effort) | Anthropic | 2026-02-05 | $1,746 | 33.10 | 52.7 | -11.7 | 46.5 | 47.6 | 64.2 | 32.4 | 52.7 |
| 22 |  | Claude Opus 4.5 (Reasoning) | Anthropic | 2025-11-24 | $2,969 | 56.65 | 52.4 | -12.1 | 49.7 | 47.8 | 59.6 | 34.7 | 52.4 |
| 23 |  | GLM-5 (Reasoning) | Z AI | 2026-02-11 | $547 | 10.45 | 52.4 | -12.1 | 49.8 | 44.2 | 63.1 | 27.4 | 52.4 |
| 24 |  | MiMo-V2.5 | Xiaomi | 2026-04-22 | $207 | 3.97 | 52.2 | -12.3 | 49.0 | 42.1 | 65.5 | 23.2 | 52.2 |
| 25 |  | Qwen3.6 Plus | Alibaba | 2026-04-02 | $483 | 9.37 | 51.5 | -13.0 | 50.0 | 42.9 | 61.7 | 26.8 | 51.5 |
| 26 |  | MiMo-V2-Pro | Xiaomi | 2026-03-18 | $351 | 6.86 | 51.1 | -13.4 | 49.2 | 41.4 | 62.8 | 25.5 | 51.1 |
| 27 |  | MiniMax-M2.7 | MiniMax | 2026-03-18 | $176 | 3.44 | 51.0 | -13.5 | 49.6 | 41.9 | 61.5 | 22.4 | 51.0 |
| 28 |  | Claude Sonnet 4.6 (Non-reasoning, High Effort) | Anthropic | 2026-02-17 | $1,694 | 33.34 | 50.8 | -13.7 | 44.4 | 46.4 | 61.6 | 32.3 | 50.8 |
| 29 |  | GPT-5.4 (low) | OpenAI | 2026-03-05 | $386 | 7.63 | 50.6 | -13.9 | 47.9 | 45.6 | 58.2 | 25.9 | 50.6 |
| 30 |  | GPT-5.2 Codex (xhigh) | OpenAI | 2025-12-11 | $3,244 | 65.54 | 49.5 | -15.0 | 49.0 | 43.0 | 56.5 | 35.1 | 49.5 |
| 31 |  | Gemini 3 Pro Preview (high) | Google | 2025-11-18 | $820 | 16.75 | 49.0 | -15.5 | 48.4 | 46.5 | 52.0 | 29.1 | 49.0 |
| 32 |  | DeepSeek V4 Flash (Reasoning, Max Effort) | DeepSeek | 2026-04-24 | $113 | 2.31 | 48.8 | -15.7 | 46.5 | 38.7 | 61.3 | 20.5 | 48.8 |
| 33 |  | GPT-5.2 (medium) | OpenAI | 2025-12-11 | $700 | 14.40 | 48.6 | -15.9 | 46.6 | 44.2 | 54.9 | 28.4 | 48.6 |
| 34 |  | GLM-5.1 (Non-reasoning) | Z AI | 2026-04-07 | $618 | 12.72 | 48.5 | -16.0 | 43.8 | 35.8 | 66.0 | 27.9 | 48.5 |
| 35 |  | Kimi K2.5 (Reasoning) | Kimi | 2026-01-27 | $354 | 7.30 | 48.4 | -16.1 | 46.8 | 39.5 | 58.9 | 25.5 | 48.4 |
| 36 |  | Claude Opus 4.5 (Non-reasoning) | Anthropic | 2025-11-24 | $1,392 | 28.75 | 48.4 | -16.1 | 43.1 | 42.9 | 59.2 | 31.4 | 48.4 |
| 37 |  | Qwen3.6 27B (Reasoning) | Alibaba | 2026-04-22 | $659 | 13.61 | 48.4 | -16.1 | 45.8 | 36.5 | 62.9 | 28.2 | 48.4 |
| 38 |  | GPT-5.1 (high) | OpenAI | 2025-11-13 | $779 | 16.26 | 47.9 | -16.6 | 47.7 | 44.7 | 51.3 | 28.9 | 47.9 |
| 39 |  | Grok 4.20 0309 v2 (Reasoning) | xAI | 2026-04-07 | $514 | 10.74 | 47.9 | -16.6 | 49.3 | 40.5 | 53.9 | 27.1 | 47.9 |
| 40 |  | Claude Sonnet 4.6 (Non-reasoning, Low Effort) | Anthropic | 2026-02-17 | $666 | 13.97 | 47.7 | -16.8 | 42.6 | 43.0 | 57.5 | 28.2 | 47.7 |
| 41 |  | DeepSeek V4 Flash (Reasoning, High Effort) | DeepSeek | 2026-04-24 | $57.2 | 1.20 | 47.5 | -17.0 | 44.9 | 39.8 | 57.8 | 17.6 | 47.5 |
| 42 |  | Qwen3.5 397B A17B (Reasoning) | Alibaba | 2026-02-16 | $418 | 8.81 | 47.4 | -17.1 | 45.0 | 41.3 | 55.8 | 26.2 | 47.4 |
| 43 |  | Grok 4.20 0309 (Reasoning) | xAI | 2026-03-10 | $484 | 10.27 | 47.2 | -17.3 | 48.5 | 42.2 | 50.9 | 26.9 | 47.2 |
| 44 |  | DeepSeek V4 Pro (Non-reasoning) | DeepSeek | 2026-04-24 | $616 | 13.11 | 47.0 | -17.5 | 39.3 | 38.4 | 63.3 | 27.9 | 47.0 |
| 45 |  | MiMo-V2-Omni-0327 | Xiaomi | 2026-03-27 | $218 | 4.66 | 46.8 | -17.7 | 44.9 | 36.9 | 58.6 | 23.4 | 46.8 |
| 46 |  | KAT Coder Pro V2 | KwaiKAT | 2026-03-27 | $73.5 | 1.57 | 46.7 | -17.8 | 43.8 | 45.6 | 50.7 | 18.7 | 46.7 |
| 47 |  | Kimi K2.6 (Non-reasoning) | Kimi | 2026-04-20 | $505 | 10.81 | 46.7 | -17.8 | 43.0 | 38.4 | 58.7 | 27.0 | 46.7 |
| 48 |  | GLM-5 (Non-reasoning) | Z AI | 2026-02-11 | $240 | 5.15 | 46.6 | -17.9 | 40.6 | 39.0 | 60.3 | 23.8 | 46.6 |
| 49 |  | GPT-5.5 (Non-reasoning) | OpenAI | 2026-04-23 | $361 | 7.74 | 46.6 | -17.9 | 40.9 | 48.6 | 50.2 | 25.6 | 46.6 |
| 50 |  | Gemini 3 Flash Preview (Reasoning) | Google | 2025-12-17 | $278 | 6.02 | 46.2 | -18.3 | 46.4 | 42.6 | 49.7 | 24.4 | 46.2 |
| 51 |  | Qwen3.6 35B A3B (Reasoning) | Alibaba | 2026-04-16 | $280 | 6.13 | 45.7 | -18.8 | 43.5 | 35.1 | 58.3 | 24.5 | 45.7 |
| 52 |  | GPT-5 Codex (high) | OpenAI | 2025-09-23 | $995 | 21.91 | 45.4 | -19.1 | 44.6 | 38.9 | 52.7 | 30.0 | 45.4 |
| 53 |  | GPT-5.4 nano (xhigh) | OpenAI | 2026-03-17 | $364 | 8.06 | 45.2 | -19.3 | 44.0 | 43.9 | 47.6 | 25.6 | 45.2 |
| 54 |  | GPT-5 (high) | OpenAI | 2025-08-07 | $910 | 20.18 | 45.1 | -19.4 | 44.6 | 36.0 | 54.6 | 29.6 | 45.1 |
| 55 |  | MiniMax-M2.5 | MiniMax | 2026-02-12 | $125 | 2.77 | 45.0 | -19.5 | 41.9 | 37.4 | 55.6 | 21.0 | 45.0 |
| 56 |  | Claude 4.5 Sonnet (Reasoning) | Anthropic | 2025-09-29 | $1,585 | 35.64 | 44.5 | -20.0 | 43.0 | 38.6 | 51.7 | 32.0 | 44.5 |
| 57 |  | GLM-4.7 (Reasoning) | Z AI | 2025-12-22 | $478 | 10.75 | 44.5 | -20.0 | 42.1 | 36.3 | 55.0 | 26.8 | 44.5 |
| 58 |  | DeepSeek V4 Flash (Non-reasoning) | DeepSeek | 2026-04-24 | $40.0 | 0.90 | 44.3 | -20.2 | 36.5 | 35.1 | 61.3 | 16.0 | 44.3 |
| 59 |  | Qwen3.5 27B (Reasoning) | Alibaba | 2026-02-24 | $299 | 6.82 | 43.9 | -20.6 | 42.1 | 34.9 | 54.6 | 24.8 | 43.9 |
| 60 |  | DeepSeek V3.2 (Reasoning) | DeepSeek | 2025-12-01 | $75.7 | 1.73 | 43.8 | -20.7 | 41.7 | 36.7 | 52.9 | 18.8 | 43.8 |
| 61 |  | Qwen3.5 397B A17B (Non-reasoning) | Alibaba | 2026-02-16 | $186 | 4.27 | 43.6 | -20.9 | 40.1 | 37.4 | 53.3 | 22.7 | 43.6 |
| 62 |  | GPT-5.1 Codex (high) | OpenAI | 2025-11-13 | $892 | 20.52 | 43.5 | -21.0 | 43.1 | 36.6 | 50.7 | 29.5 | 43.5 |
| 63 |  | Qwen3.5 122B A10B (Reasoning) | Alibaba | 2026-02-24 | $354 | 8.21 | 43.1 | -21.4 | 41.6 | 34.7 | 53.0 | 25.5 | 43.1 |
| 64 |  | Mistral Medium 3.5 | Mistral | 2026-04-29 | $1,001 | 23.49 | 42.6 | -21.9 | 39.2 | 35.4 | 53.2 | 30.0 | 42.6 |
| 65 |  | GPT-5 (medium) | OpenAI | 2025-08-07 | $550 | 13.02 | 42.3 | -22.2 | 42.0 | 39.0 | 45.8 | 27.4 | 42.3 |
| 66 |  | Gemini 3 Pro Preview (low) | Google | 2025-11-18 | $355 | 8.47 | 41.9 | -22.6 | 41.3 | 39.4 | 45.0 | 25.5 | 41.9 |
| 67 |  | Qwen3.6 27B (Non-reasoning) | Alibaba | 2026-04-22 | $234 | 5.64 | 41.5 | -23.0 | 37.1 | 26.6 | 60.9 | 23.7 | 41.5 |
| 68 |  | MiMo-V2-Flash (Feb 2026) | Xiaomi | 2025-12-16 | $68.3 | 1.66 | 41.2 | -23.3 | 41.5 | 33.5 | 48.8 | 18.3 | 41.2 |
| 69 |  | Kimi K2 Thinking | Kimi | 2025-11-06 | $308 | 7.48 | 41.2 | -23.3 | 40.9 | 34.8 | 47.9 | 24.9 | 41.2 |
| 70 |  | Grok 4 | xAI | 2025-07-10 | $2,222 | 53.97 | 41.2 | -23.3 | 41.5 | 40.5 | 41.5 | 33.5 | 41.2 |
| 71 |  | MiMo-V2-Flash (Reasoning) | Xiaomi | 2025-12-16 | $47.5 | 1.16 | 41.0 | -23.4 | 39.2 | 31.8 | 52.1 | 16.8 | 41.0 |
| 72 |  | MiMo-V2.5-Pro (Non-reasoning) | Xiaomi | 2026-04-22 | $703 | 17.14 | 41.0 | -23.5 | 35.6 | 36.8 | 50.8 | 28.5 | 41.0 |
| 73 |  | Qwen3.5 27B (Non-reasoning) | Alibaba | 2026-02-24 | $122 | 2.99 | 40.7 | -23.8 | 37.2 | 33.4 | 51.5 | 20.9 | 40.7 |
| 74 |  | GPT-5 mini (high) | OpenAI | 2025-08-07 | $168 | 4.14 | 40.6 | -23.8 | 41.2 | 35.3 | 45.5 | 22.3 | 40.6 |
| 75 |  | Step 3.5 Flash | StepFun | 2026-02-02 | $74.8 | 1.85 | 40.5 | -24.0 | 37.8 | 31.6 | 52.0 | 18.7 | 40.5 |
| 76 |  | Claude 4.5 Sonnet (Non-reasoning) | Anthropic | 2025-09-29 | $827 | 20.46 | 40.4 | -24.1 | 37.1 | 33.5 | 50.6 | 29.2 | 40.4 |
| 77 |  | GLM-4.7 (Non-reasoning) | Z AI | 2025-12-22 | $147 | 3.66 | 40.2 | -24.3 | 34.2 | 32.0 | 54.3 | 21.7 | 40.2 |
| 78 |  | Qwen3 Max Thinking | Alibaba | 2026-01-26 | $669 | 16.65 | 40.2 | -24.3 | 39.9 | 30.5 | 50.1 | 28.3 | 40.2 |
| 79 |  | GPT-5 (low) | OpenAI | 2025-08-07 | $227 | 5.70 | 39.9 | -24.6 | 39.2 | 30.7 | 49.7 | 23.6 | 39.9 |
| 80 |  | MiniMax-M2.1 | MiniMax | 2025-12-23 | $114 | 2.87 | 39.9 | -24.6 | 39.4 | 32.8 | 47.4 | 20.6 | 39.9 |
| 81 |  | Qwen3.5 Omni Plus | Alibaba | 2026-03-30 | $153 | 3.86 | 39.7 | -24.8 | 38.6 | 27.6 | 52.8 | 21.9 | 39.7 |
| 82 |  | Grok 4.1 Fast (Reasoning) | xAI | 2025-11-19 | $39.6 | 1.00 | 39.6 | -24.9 | 38.6 | 30.9 | 49.3 | 16.0 | 39.6 |
| 83 |  | Qwen3.5 122B A10B (Non-reasoning) | Alibaba | 2026-02-24 | $166 | 4.27 | 39.0 | -25.5 | 35.9 | 31.6 | 49.5 | 22.2 | 39.0 |
| 84 |  | Kimi K2.5 (Non-reasoning) | Kimi | 2026-01-27 | $141 | 3.65 | 38.6 | -25.9 | 37.3 | 25.8 | 52.8 | 21.5 | 38.6 |
| 85 |  | Claude 4 Sonnet (Reasoning) | Anthropic | 2025-05-22 | $1,346 | 34.89 | 38.6 | -25.9 | 38.7 | 34.1 | 43.0 | 31.3 | 38.6 |
| 86 |  | GPT-5.4 (Non-reasoning) | OpenAI | 2026-03-05 | $272 | 7.06 | 38.5 | -26.0 | 35.4 | 41.0 | 39.1 | 24.3 | 38.5 |
| 87 |  | GPT-5.4 mini (medium) | OpenAI | 2026-03-17 | $302 | 7.85 | 38.5 | -26.0 | 37.7 | 37.5 | 40.3 | 24.8 | 38.5 |
| 88 |  | Ling-2.6-1T | InclusionAI | 2026-04-23 | $95.0 | 2.48 | 38.3 | -26.2 | 33.6 | 33.0 | 48.2 | 19.8 | 38.3 |
| 89 |  | GPT-5.4 nano (medium) | OpenAI | 2026-03-17 | $91.0 | 2.38 | 38.3 | -26.2 | 38.1 | 35.0 | 41.6 | 19.6 | 38.3 |
| 90 |  | GPT-5.1 Codex mini (high) | OpenAI | 2025-11-13 | $202 | 5.32 | 37.9 | -26.6 | 38.6 | 36.4 | 38.7 | 23.0 | 37.9 |
| 91 |  | Nova 2.0 Pro Preview (medium) | Amazon | 2025-11-27 | $467 | 12.38 | 37.7 | -26.8 | 35.7 | 30.4 | 47.0 | 26.7 | 37.7 |
| 92 |  | o3 | OpenAI | 2025-04-16 | $1,025 | 27.24 | 37.6 | -26.9 | 38.4 | 38.4 | 36.1 | 30.1 | 37.6 |
| 93 |  | MiniMax-M2 | MiniMax | 2025-10-26 | $118 | 3.14 | 37.6 | -26.9 | 36.1 | 29.2 | 47.5 | 20.7 | 37.6 |
| 94 |  | GPT-5 mini (medium) | OpenAI | 2025-08-07 | $61.9 | 1.65 | 37.6 | -26.9 | 38.9 | 32.9 | 40.9 | 17.9 | 37.6 |
| 95 |  | Qwen3.5 35B A3B (Reasoning) | Alibaba | 2026-02-24 | $302 | 8.12 | 37.2 | -27.3 | 37.1 | 30.3 | 44.1 | 24.8 | 37.2 |
| 96 |  | Claude 4.5 Haiku (Reasoning) | Anthropic | 2025-10-15 | $620 | 16.92 | 36.6 | -27.9 | 37.1 | 32.6 | 40.2 | 27.9 | 36.6 |
| 97 |  | Gemini 3 Flash Preview (Non-reasoning) | Google | 2025-12-17 | $66.0 | 1.83 | 36.0 | -28.5 | 35.0 | 37.8 | 35.0 | 18.2 | 36.0 |
| 98 |  | GPT-5.2 (Non-reasoning) | OpenAI | 2025-12-11 | $225 | 6.27 | 35.9 | -28.6 | 33.6 | 34.7 | 39.5 | 23.5 | 35.9 |
| 99 |  | NVIDIA Nemotron 3 Super 120B A12B (Reasoning) | NVIDIA | 2026-03-11 | $145 | 4.06 | 35.8 | -28.7 | 36.0 | 31.2 | 40.2 | 21.6 | 35.8 |
| 100 |  | DeepSeek V3.2 (Non-reasoning) | DeepSeek | 2025-12-01 | $197 | 5.55 | 35.5 | -29.0 | 32.1 | 34.6 | 39.8 | 22.9 | 35.5 |
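The page does not say how its derived columns are computed. Spot checks against the rows suggest that Qual is the unweighted mean of Intel, Code and Agent (for GPT-5.5 (xhigh), (60.2 + 59.1 + 74.1) / 3 ≈ 64.5), that ΔTop is Qual minus the top Qual of 64.5, that $/Q is Cost divided by Qual ($3,357 / 64.5 ≈ 52.05), and that Score equals Qual in every row shown. The Python sketch below encodes those inferred relations for five rows copied from the table and adds one hypothetical reading of the Pareto column: a model is on the front if no other model is both no more expensive and no lower in Qual. All of this is an assumption, not the site's documented method. Pen is left out because its role is not evident from the table, and because the displayed values are rounded, recomputed figures match only to within rounding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    model: str
    cost: float   # "Cost" column: total USD for the run
    qual: float   # "Qual" column as displayed (rounded to 0.1)
    intel: float
    code: float
    agent: float


def qual_from_parts(e: Entry) -> float:
    # Assumption inferred from the rows: Qual is roughly mean(Intel, Code, Agent).
    return (e.intel + e.code + e.agent) / 3


def delta_top(e: Entry, entries: list[Entry]) -> float:
    # ΔTop: this model's Qual minus the best Qual among the given entries.
    return round(e.qual - max(x.qual for x in entries), 1)


def cost_per_point(e: Entry) -> float:
    # Assumption: "$/Q" is total cost divided by the Qual score.
    return e.cost / e.qual


def pareto_front(entries: list[Entry]) -> list[str]:
    # Hypothetical reading of the "Pareto" column: keep a model unless some
    # other model has cost <= and Qual >=, with at least one strictly better.
    front = []
    for e in entries:
        dominated = any(
            o.cost <= e.cost
            and o.qual >= e.qual
            and (o.cost < e.cost or o.qual > e.qual)
            for o in entries
            if o is not e
        )
        if not dominated:
            front.append(e.model)
    return front


# Five rows copied from the table above (Cost, Qual, Intel, Code, Agent).
ROWS = [
    Entry("GPT-5.5 (xhigh)", 3357, 64.5, 60.2, 59.1, 74.1),
    Entry("GPT-5.5 (high)", 2159, 63.1, 58.9, 58.5, 72.0),
    Entry("Gemini 3.1 Pro Preview", 892, 57.3, 57.2, 55.5, 59.1),
    Entry("Claude Opus 4.6 (Adaptive Reasoning, Max Effort)", 5231, 56.2, 53.0, 48.1, 67.6),
    Entry("DeepSeek V4 Flash (Non-reasoning)", 40.0, 44.3, 36.5, 35.1, 61.3),
]

if __name__ == "__main__":
    for e in ROWS:
        print(
            f"{e.model}: mean3={qual_from_parts(e):.2f} "
            f"dTop={delta_top(e, ROWS):+.1f} $/Q={cost_per_point(e):.2f}"
        )
    print("Pareto:", pareto_front(ROWS))
```

In this five-row sample, Claude Opus 4.6 (Adaptive Reasoning, Max Effort) falls off the front because GPT-5.5 (xhigh) is both cheaper ($3,357 vs. $5,231) and higher in Qual (64.5 vs. 56.2); the other four rows remain.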