Run #193
success · fetched 2026-05-19 23:00:08 · 7.65 MB raw HTML · 518 models
Top quality: GPT-5.5 (xhigh) (64.5 pts)
| # | Pareto | Model | Released | Cost$ | $/Q | Qual | ΔTop | Intel | Code | Agent | Pen | Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ✓ | GPT-5.5 (xhigh) OpenAI | 2026-04-23 | $3,357 | 52.05 | 64.5 | 0.0 | 60.2 | 59.1 | 74.1 | 35.3 | 64.5 |
| 2 | ✓ | GPT-5.5 (high) OpenAI | 2026-04-23 | $2,159 | 34.21 | 63.1 | -1.4 | 58.9 | 58.5 | 72.0 | 33.3 | 63.1 |
| 3 | ✓ | GPT-5.5 (medium) OpenAI | 2026-04-23 | $1,199 | 19.73 | 60.8 | -3.7 | 56.7 | 56.2 | 69.4 | 30.8 | 60.8 |
| 4 | | GPT-5.4 (xhigh) OpenAI | 2026-03-05 | $2,851 | 46.99 | 60.7 | -3.8 | 56.8 | 57.2 | 68.0 | 34.5 | 60.7 |
| 5 | | Claude Opus 4.7 (Adaptive Reasoning, Max Effort) Anthropic | 2026-04-16 | $5,117 | 84.78 | 60.4 | -4.1 | 57.3 | 52.5 | 71.3 | 37.1 | 60.4 |
| 6 | ✓ | Gemini 3.1 Pro Preview | 2026-02-19 | $892 | 15.59 | 57.3 | -7.2 | 57.2 | 55.5 | 59.1 | 29.5 | 57.3 |
| 7 | | Gemini 3.5 Flash (high) | 2026-05-19 | $1,552 | 27.28 | 56.9 | -7.6 | 55.3 | 45.0 | 70.3 | 31.9 | 56.9 |
| 8 | | Claude Opus 4.7 (Non-reasoning, High Effort) Anthropic | 2026-04-16 | $1,217 | 21.54 | 56.5 | -8.0 | 51.8 | 53.1 | 64.6 | 30.9 | 56.5 |
| 9 | | Claude Opus 4.6 (Adaptive Reasoning, Max Effort) Anthropic | 2026-02-05 | $5,231 | 93.07 | 56.2 | -8.3 | 52.9 | 48.1 | 67.6 | 37.2 | 56.2 |
| 10 | | GPT-5.3 Codex (xhigh) OpenAI | 2026-02-05 | $1,572 | 28.20 | 55.7 | -8.8 | 53.6 | 53.1 | 60.5 | 32.0 | 55.7 |
| 11 | | Kimi K2.6 Kimi | 2026-04-20 | $948 | 17.03 | 55.7 | -8.8 | 53.9 | 47.1 | 66.0 | 29.8 | 55.7 |
| 12 | ✓ | MiMo-V2.5-Pro Xiaomi | 2026-04-22 | $462 | 8.30 | 55.6 | -8.9 | 53.8 | 45.5 | 67.4 | 26.6 | 55.6 |
| 13 | | DeepSeek V4 Pro (Reasoning, Max Effort) DeepSeek | 2026-04-24 | $1,071 | 19.34 | 55.4 | -9.1 | 51.5 | 47.5 | 67.2 | 30.3 | 55.4 |
| 14 | | Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) Anthropic | 2026-02-17 | $4,206 | 76.17 | 55.2 | -9.3 | 51.7 | 50.9 | 63.0 | 36.2 | 55.2 |
| 15 | | GPT-5.5 (low) OpenAI | 2026-04-23 | $501 | 9.24 | 54.2 | -10.3 | 50.8 | 52.1 | 59.7 | 27.0 | 54.2 |
| 16 | | Qwen3.6 Max Preview Alibaba | 2026-04-20 | $861 | 15.98 | 53.9 | -10.6 | 51.8 | 44.9 | 64.8 | 29.3 | 53.9 |
| 17 | | GPT-5.2 (xhigh) OpenAI | 2025-12-11 | $2,304 | 43.16 | 53.4 | -11.1 | 51.3 | 48.7 | 60.2 | 33.6 | 53.4 |
| 18 | ✓ | Grok 4.3 (high) xAI | 2026-04-30 | $395 | 7.40 | 53.4 | -11.1 | 53.2 | 41.0 | 65.9 | 26.0 | 53.4 |
| 19 | | DeepSeek V4 Pro (Reasoning, High Effort) DeepSeek | 2026-04-24 | $690 | 12.97 | 53.2 | -11.3 | 49.8 | 43.2 | 66.7 | 28.4 | 53.2 |
| 20 | | GPT-5.4 mini (xhigh) OpenAI | 2026-03-17 | $1,354 | 25.50 | 53.1 | -11.4 | 48.9 | 51.5 | 58.9 | 31.3 | 53.1 |
| 21 | | Claude Opus 4.6 (Non-reasoning, High Effort) Anthropic | 2026-02-05 | $1,746 | 33.10 | 52.7 | -11.7 | 46.5 | 47.6 | 64.2 | 32.4 | 52.7 |
| 22 | | Claude Opus 4.5 (Reasoning) Anthropic | 2025-11-24 | $2,969 | 56.66 | 52.4 | -12.1 | 49.7 | 47.8 | 59.6 | 34.7 | 52.4 |
| 23 | | GLM-5 (Reasoning) Z AI | 2026-02-11 | $547 | 10.45 | 52.4 | -12.1 | 49.8 | 44.2 | 63.1 | 27.4 | 52.4 |
| 24 | ✓ | MiMo-V2.5 Xiaomi | 2026-04-22 | $207 | 3.97 | 52.2 | -12.3 | 49.0 | 42.1 | 65.5 | 23.2 | 52.2 |
| 25 | | Qwen3.6 Plus Alibaba | 2026-04-02 | $483 | 9.37 | 51.5 | -13.0 | 50.0 | 42.9 | 61.7 | 26.8 | 51.5 |
| 26 | | MiMo-V2-Pro Xiaomi | 2026-03-18 | $351 | 6.86 | 51.1 | -13.3 | 49.2 | 41.4 | 62.8 | 25.5 | 51.1 |
| 27 | ✓ | MiniMax-M2.7 MiniMax | 2026-03-18 | $176 | 3.44 | 51.0 | -13.5 | 49.6 | 41.9 | 61.5 | 22.4 | 51.0 |
| 28 | | Claude Sonnet 4.6 (Non-reasoning, High Effort) Anthropic | 2026-02-17 | $1,694 | 33.34 | 50.8 | -13.7 | 44.4 | 46.4 | 61.6 | 32.3 | 50.8 |
| 29 | | GPT-5.4 (low) OpenAI | 2026-03-05 | $413 | 8.17 | 50.6 | -13.9 | 47.9 | 45.6 | 58.2 | 26.2 | 50.6 |
| 30 | | GPT-5.2 Codex (xhigh) OpenAI | 2025-12-11 | $3,244 | 65.54 | 49.5 | -15.0 | 49.0 | 43.0 | 56.5 | 35.1 | 49.5 |
| 31 | ✓ | DeepSeek V4 Flash (Reasoning, High Effort) DeepSeek | 2026-04-24 | $57.4 | 1.16 | 49.4 | -15.1 | 46.0 | 39.8 | 62.3 | 17.6 | 49.4 |
| 32 | | Gemini 3 Pro Preview (high) | 2025-11-18 | $820 | 16.75 | 49.0 | -15.5 | 48.4 | 46.5 | 52.0 | 29.1 | 49.0 |
| 33 | | DeepSeek V4 Flash (Reasoning, Max Effort) DeepSeek | 2026-04-24 | $113 | 2.31 | 48.8 | -15.7 | 46.5 | 38.7 | 61.3 | 20.5 | 48.8 |
| 34 | | GPT-5.2 (medium) OpenAI | 2025-12-11 | $700 | 14.40 | 48.6 | -15.9 | 46.6 | 44.2 | 54.9 | 28.4 | 48.6 |
| 35 | | GLM-5.1 (Non-reasoning) Z AI | 2026-04-07 | $618 | 12.72 | 48.5 | -15.9 | 43.8 | 35.8 | 66.0 | 27.9 | 48.5 |
| 36 | | Kimi K2.5 (Reasoning) Kimi | 2026-01-27 | $354 | 7.30 | 48.4 | -16.1 | 46.8 | 39.6 | 58.9 | 25.5 | 48.4 |
| 37 | | Claude Opus 4.5 (Non-reasoning) Anthropic | 2025-11-24 | $1,392 | 28.75 | 48.4 | -16.1 | 43.1 | 42.9 | 59.2 | 31.4 | 48.4 |
| 38 | | Qwen3.6 27B (Reasoning) Alibaba | 2026-04-22 | $659 | 13.61 | 48.4 | -16.1 | 45.8 | 36.5 | 62.9 | 28.2 | 48.4 |
| 39 | | GPT-5.1 (high) OpenAI | 2025-11-13 | $779 | 16.26 | 47.9 | -16.6 | 47.7 | 44.7 | 51.3 | 28.9 | 47.9 |
| 40 | | Grok 4.20 0309 v2 (Reasoning) xAI | 2026-04-07 | $514 | 10.74 | 47.9 | -16.6 | 49.3 | 40.5 | 53.9 | 27.1 | 47.9 |
| 41 | | Claude Sonnet 4.6 (Non-reasoning, Low Effort) Anthropic | 2026-02-17 | $666 | 13.97 | 47.7 | -16.8 | 42.6 | 43.0 | 57.5 | 28.2 | 47.7 |
| 42 | | Qwen3.5 397B A17B (Reasoning) Alibaba | 2026-02-16 | $418 | 8.81 | 47.4 | -17.1 | 45.0 | 41.3 | 55.8 | 26.2 | 47.4 |
| 43 | | Grok 4.20 0309 (Reasoning) xAI | 2026-03-10 | $484 | 10.27 | 47.2 | -17.3 | 48.5 | 42.2 | 50.9 | 26.9 | 47.2 |
| 44 | | DeepSeek V4 Pro (Non-reasoning) DeepSeek | 2026-04-24 | $616 | 13.11 | 47.0 | -17.5 | 39.3 | 38.4 | 63.3 | 27.9 | 47.0 |
| 45 | | MiMo-V2-Omni-0327 Xiaomi | 2026-03-27 | $218 | 4.66 | 46.8 | -17.7 | 44.9 | 36.9 | 58.6 | 23.4 | 46.8 |
| 46 | | KAT Coder Pro V2 KwaiKAT | 2026-03-27 | $73.5 | 1.57 | 46.7 | -17.8 | 43.8 | 45.6 | 50.7 | 18.7 | 46.7 |
| 47 | | Kimi K2.6 (Non-reasoning) Kimi | 2026-04-20 | $505 | 10.81 | 46.7 | -17.8 | 42.9 | 38.4 | 58.7 | 27.0 | 46.7 |
| 48 | | GLM-5 (Non-reasoning) Z AI | 2026-02-11 | $240 | 5.15 | 46.6 | -17.9 | 40.6 | 39.0 | 60.3 | 23.8 | 46.6 |
| 49 | | GPT-5.5 (Non-reasoning) OpenAI | 2026-04-23 | $361 | 7.74 | 46.6 | -17.9 | 40.9 | 48.6 | 50.2 | 25.6 | 46.6 |
| 50 | | Gemini 3 Flash Preview (Reasoning) | 2025-12-17 | $278 | 6.02 | 46.2 | -18.3 | 46.4 | 42.6 | 49.7 | 24.4 | 46.2 |
| 51 | | Qwen3.6 35B A3B (Reasoning) Alibaba | 2026-04-16 | $280 | 6.13 | 45.7 | -18.8 | 43.5 | 35.2 | 58.3 | 24.5 | 45.7 |
| 52 | | GPT-5 Codex (high) OpenAI | 2025-09-23 | $995 | 21.91 | 45.4 | -19.1 | 44.6 | 38.9 | 52.7 | 30.0 | 45.4 |
| 53 | | GPT-5.4 nano (xhigh) OpenAI | 2026-03-17 | $363 | 8.04 | 45.2 | -19.3 | 44.0 | 43.9 | 47.6 | 25.6 | 45.2 |
| 54 | | GPT-5 (high) OpenAI | 2025-08-07 | $913 | 20.23 | 45.1 | -19.4 | 44.6 | 36.0 | 54.7 | 29.6 | 45.1 |
| 55 | | MiniMax-M2.5 MiniMax | 2026-02-12 | $125 | 2.77 | 45.0 | -19.5 | 41.9 | 37.4 | 55.6 | 21.0 | 45.0 |
| 56 | | Hy3-preview (Reasoning) Tencent | 2026-04-23 | $84.4 | 1.89 | 44.7 | -19.8 | 41.9 | 36.5 | 55.7 | 19.3 | 44.7 |
| 57 | | Claude 4.5 Sonnet (Reasoning) Anthropic | 2025-09-29 | $1,585 | 35.65 | 44.5 | -20.0 | 43.0 | 38.6 | 51.7 | 32.0 | 44.5 |
| 58 | | GLM-4.7 (Reasoning) Z AI | 2025-12-22 | $478 | 10.75 | 44.5 | -20.0 | 42.1 | 36.3 | 55.0 | 26.8 | 44.5 |
| 59 | ✓ | DeepSeek V4 Flash (Non-reasoning) DeepSeek | 2026-04-24 | $40.0 | 0.90 | 44.3 | -20.2 | 36.5 | 35.2 | 61.3 | 16.0 | 44.3 |
| 60 | | Qwen3.5 27B (Reasoning) Alibaba | 2026-02-24 | $299 | 6.82 | 43.8 | -20.6 | 42.1 | 34.9 | 54.6 | 24.8 | 43.8 |
| 61 | | DeepSeek V3.2 (Reasoning) DeepSeek | 2025-12-01 | $75.7 | 1.73 | 43.8 | -20.7 | 41.7 | 36.7 | 52.9 | 18.8 | 43.8 |
| 62 | | Qwen3.5 397B A17B (Non-reasoning) Alibaba | 2026-02-16 | $186 | 4.27 | 43.6 | -20.9 | 40.1 | 37.4 | 53.3 | 22.7 | 43.6 |
| 63 | | GPT-5.1 Codex (high) OpenAI | 2025-11-13 | $892 | 20.52 | 43.5 | -21.0 | 43.1 | 36.6 | 50.7 | 29.5 | 43.5 |
| 64 | | Qwen3.5 122B A10B (Reasoning) Alibaba | 2026-02-24 | $354 | 8.21 | 43.1 | -21.4 | 41.6 | 34.7 | 53.0 | 25.5 | 43.1 |
| 65 | | Mistral Medium 3.5 Mistral | 2026-04-29 | $1,001 | 23.50 | 42.6 | -21.9 | 39.2 | 35.4 | 53.2 | 30.0 | 42.6 |
| 66 | | GPT-5 (medium) OpenAI | 2025-08-07 | $552 | 13.05 | 42.3 | -22.2 | 42.0 | 38.9 | 45.8 | 27.4 | 42.3 |
| 67 | | Gemini 3 Pro Preview (low) | 2025-11-18 | $355 | 8.47 | 41.9 | -22.6 | 41.3 | 39.4 | 45.0 | 25.5 | 41.9 |
| 68 | | Qwen3.6 27B (Non-reasoning) Alibaba | 2026-04-22 | $234 | 5.64 | 41.5 | -23.0 | 37.1 | 26.6 | 60.9 | 23.7 | 41.5 |
| 69 | | MiMo-V2-Flash (Feb 2026) Xiaomi | 2025-12-16 | $66.5 | 1.61 | 41.2 | -23.3 | 41.5 | 33.5 | 48.8 | 18.2 | 41.2 |
| 70 | | Kimi K2 Thinking Kimi | 2025-11-06 | $308 | 7.48 | 41.2 | -23.3 | 40.9 | 34.8 | 47.9 | 24.9 | 41.2 |
| 71 | | Grok 4 xAI | 2025-07-10 | $2,881 | 69.98 | 41.2 | -23.3 | 41.5 | 40.5 | 41.5 | 34.6 | 41.2 |
| 72 | | MiMo-V2-Flash (Reasoning) Xiaomi | 2025-12-16 | $47.5 | 1.16 | 41.0 | -23.4 | 39.2 | 31.8 | 52.1 | 16.8 | 41.0 |
| 73 | | MiMo-V2.5-Pro (Non-reasoning) Xiaomi | 2026-04-22 | $703 | 17.14 | 41.0 | -23.5 | 35.6 | 36.8 | 50.8 | 28.5 | 41.0 |
| 74 | | Qwen3.5 27B (Non-reasoning) Alibaba | 2026-02-24 | $122 | 2.99 | 40.7 | -23.8 | 37.2 | 33.4 | 51.5 | 20.9 | 40.7 |
| 75 | | GPT-5 mini (high) OpenAI | 2025-08-07 | $168 | 4.14 | 40.7 | -23.8 | 41.2 | 35.3 | 45.5 | 22.3 | 40.7 |
| 76 | | Step 3.5 Flash StepFun | 2026-02-02 | $74.8 | 1.85 | 40.5 | -24.0 | 37.8 | 31.6 | 52.0 | 18.7 | 40.5 |
| 77 | | Claude 4.5 Sonnet (Non-reasoning) Anthropic | 2025-09-29 | $827 | 20.46 | 40.4 | -24.1 | 37.1 | 33.5 | 50.6 | 29.2 | 40.4 |
| 78 | | GLM-4.7 (Non-reasoning) Z AI | 2025-12-22 | $147 | 3.66 | 40.2 | -24.3 | 34.2 | 32.0 | 54.3 | 21.7 | 40.2 |
| 79 | | Qwen3 Max Thinking Alibaba | 2026-01-26 | $669 | 16.65 | 40.2 | -24.3 | 39.8 | 30.5 | 50.1 | 28.3 | 40.2 |
| 80 | | GPT-5 (low) OpenAI | 2025-08-07 | $228 | 5.71 | 39.9 | -24.6 | 39.2 | 30.7 | 49.7 | 23.6 | 39.9 |
| 81 | | MiniMax-M2.1 MiniMax | 2025-12-23 | $114 | 2.87 | 39.9 | -24.6 | 39.4 | 32.8 | 47.4 | 20.6 | 39.9 |
| 82 | | Qwen3.5 Omni Plus Alibaba | 2026-03-30 | $150 | 3.77 | 39.7 | -24.8 | 38.6 | 27.6 | 52.8 | 21.8 | 39.7 |
| 83 | | Qwen3.5 122B A10B (Non-reasoning) Alibaba | 2026-02-24 | $166 | 4.27 | 39.0 | -25.5 | 35.9 | 31.6 | 49.5 | 22.2 | 39.0 |
| 84 | | Kimi K2.5 (Non-reasoning) Kimi | 2026-01-27 | $141 | 3.65 | 38.6 | -25.8 | 37.3 | 25.8 | 52.8 | 21.5 | 38.6 |
| 85 | | Claude 4 Sonnet (Reasoning) Anthropic | 2025-05-22 | $1,349 | 34.97 | 38.6 | -25.9 | 38.7 | 34.1 | 43.0 | 31.3 | 38.6 |
| 86 | | GPT-5.4 (Non-reasoning) OpenAI | 2026-03-05 | $272 | 7.06 | 38.5 | -26.0 | 35.4 | 41.0 | 39.1 | 24.3 | 38.5 |
| 87 | | GPT-5.4 mini (medium) OpenAI | 2026-03-17 | $302 | 7.85 | 38.5 | -26.0 | 37.7 | 37.5 | 40.3 | 24.8 | 38.5 |
| 88 | | Ling-2.6-1T InclusionAI | 2026-04-23 | $95.0 | 2.48 | 38.3 | -26.2 | 33.6 | 33.1 | 48.2 | 19.8 | 38.3 |
| 89 | | GPT-5.4 nano (medium) OpenAI | 2026-03-17 | $90.6 | 2.37 | 38.3 | -26.2 | 38.1 | 35.0 | 41.6 | 19.6 | 38.3 |
| 90 | ✓ | Hy3-preview (Non-reasoning) Tencent | 2026-04-23 | $36.1 | 0.94 | 38.2 | -26.3 | 33.7 | 34.3 | 46.7 | 15.6 | 38.2 |
| 91 | | GPT-5.1 Codex mini (high) OpenAI | 2025-11-13 | $202 | 5.32 | 37.9 | -26.6 | 38.6 | 36.4 | 38.7 | 23.0 | 37.9 |
| 92 | | Nova 2.0 Pro Preview (medium) Amazon | 2025-11-27 | $467 | 12.38 | 37.7 | -26.8 | 35.7 | 30.4 | 47.0 | 26.7 | 37.7 |
| 93 | | o3 OpenAI | 2025-04-16 | $1,025 | 27.25 | 37.6 | -26.9 | 38.4 | 38.4 | 36.1 | 30.1 | 37.6 |
| 94 | | MiniMax-M2 MiniMax | 2025-10-26 | $116 | 3.08 | 37.6 | -26.9 | 36.1 | 29.2 | 47.5 | 20.6 | 37.6 |
| 95 | | GPT-5 mini (medium) OpenAI | 2025-08-07 | $61.9 | 1.65 | 37.6 | -26.9 | 38.9 | 32.8 | 40.9 | 17.9 | 37.6 |
| 96 | | Qwen3.5 35B A3B (Reasoning) Alibaba | 2026-02-24 | $302 | 8.12 | 37.2 | -27.3 | 37.1 | 30.3 | 44.1 | 24.8 | 37.2 |
| 97 | | Claude 4.5 Haiku (Reasoning) Anthropic | 2025-10-15 | $620 | 16.92 | 36.6 | -27.9 | 37.1 | 32.6 | 40.2 | 27.9 | 36.6 |
| 98 | | Gemini 3 Flash Preview (Non-reasoning) | 2025-12-17 | $66.0 | 1.83 | 36.0 | -28.5 | 35.0 | 37.8 | 35.0 | 18.2 | 36.0 |
| 99 | | GPT-5.2 (Non-reasoning) OpenAI | 2025-12-11 | $225 | 6.28 | 35.9 | -28.6 | 33.6 | 34.7 | 39.5 | 23.5 | 35.9 |
| 100 | | NVIDIA Nemotron 3 Super 120B A12B (Reasoning) NVIDIA | 2026-03-11 | $140 | 3.92 | 35.8 | -28.7 | 36.0 | 31.2 | 40.2 | 21.5 | 35.8 |
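
The Pareto marks in the table can be approximately reproduced with a short sketch. It rests on an assumption the run output does not state: that a model is on the frontier when no other model is both cheaper (Cost$) and higher in Qual. The `Row` type and `pareto_front` helper are hypothetical names for illustration; the six sample rows are copied from the table above. The last lines also check that, for the top row, $/Q matches Cost$ divided by Qual and ΔTop matches Qual minus the top Qual, which is an inference from the numbers rather than a documented formula.

```python
from typing import NamedTuple


class Row(NamedTuple):
    model: str
    cost: float  # "Cost$" column, in dollars
    qual: float  # "Qual" column, in points


def pareto_front(rows):
    """Return rows not dominated on (lower cost, higher quality).

    Assumed rule: walking from cheapest to most expensive, a row is kept
    only if its quality beats every cheaper row seen so far.
    """
    front = []
    best_qual = float("-inf")
    # Sort by cost ascending; on equal cost, higher quality comes first.
    for row in sorted(rows, key=lambda r: (r.cost, -r.qual)):
        if row.qual > best_qual:
            front.append(row)
            best_qual = row.qual
    return front


# Sample rows copied from the table (ranks 1, 3, 4, 6, 11, 12).
rows = [
    Row("GPT-5.5 (xhigh)", 3357, 64.5),
    Row("GPT-5.5 (medium)", 1199, 60.8),
    Row("GPT-5.4 (xhigh)", 2851, 60.7),
    Row("Gemini 3.1 Pro Preview", 892, 57.3),
    Row("Kimi K2.6", 948, 55.7),
    Row("MiMo-V2.5-Pro", 462, 55.6),
]

front_models = [r.model for r in pareto_front(rows)]

# Derived columns for the top row, under the inferred formulas.
top = max(rows, key=lambda r: r.qual)
dollars_per_point = round(top.cost / top.qual, 2)
delta_top = round(top.qual - top.qual, 1)
```

On these six rows the sketch keeps MiMo-V2.5-Pro, Gemini 3.1 Pro Preview, GPT-5.5 (medium) and GPT-5.5 (xhigh), which are the same four that carry a ✓ in the table. Because the cost-sorted sweep runs in O(n log n), it would also scale to the full 518-model list.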