Run #1
success · fetched 2026-05-11 23:17:33 · 7.54 MB raw HTML · 515 models
Top quality: GPT-5.5 (xhigh) (64.5 pts)
| # | Pareto | Model | Provider | Released | Cost | $/Q | Quality | ΔTop | Intel | Code | Agent | Pen | Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ✓ | GPT-5.5 (xhigh) | OpenAI | 2026-04-23 | $3,357 | 52.05 | 64.5 | 0.0 | 60.2 | 59.1 | 74.1 | 35.3 | 64.5 |
| 2 | ✓ | GPT-5.5 (high) | OpenAI | 2026-04-23 | $2,159 | 34.21 | 63.1 | -1.4 | 58.9 | 58.5 | 72.0 | 33.3 | 63.1 |
| 3 | ✓ | GPT-5.5 (medium) | OpenAI | 2026-04-23 | $1,199 | 19.73 | 60.8 | -3.7 | 56.7 | 56.2 | 69.4 | 30.8 | 60.8 |
| 4 | ✓ | Gemini 3.1 Pro Preview | | 2026-02-19 | $892 | 15.58 | 57.3 | -7.2 | 57.2 | 55.5 | 59.1 | 29.5 | 57.3 |
| 5 | ✓ | MiMo-V2.5-Pro | Xiaomi | 2026-04-22 | $462 | 8.30 | 55.6 | -8.9 | 53.8 | 45.5 | 67.4 | 26.6 | 55.6 |
| 6 | ✓ | Grok 4.3 | xAI | 2026-04-30 | $395 | 7.40 | 53.4 | -11.1 | 53.2 | 41.0 | 65.9 | 26.0 | 53.4 |
| 7 | ✓ | MiMo-V2.5 | Xiaomi | 2026-04-22 | $207 | 3.97 | 52.2 | -12.3 | 49.0 | 42.1 | 65.5 | 23.2 | 52.2 |
| 8 | ✓ | MiniMax-M2.7 | MiniMax | 2026-03-18 | $176 | 3.44 | 51.0 | -13.5 | 49.6 | 41.9 | 61.5 | 22.4 | 51.0 |
| 9 | ✓ | DeepSeek V4 Flash (Reasoning, Max Effort) | DeepSeek | 2026-04-24 | $113 | 2.31 | 48.8 | -15.7 | 46.5 | 38.7 | 61.3 | 20.5 | 48.8 |
| 10 | ✓ | DeepSeek V4 Flash (Reasoning, High Effort) | DeepSeek | 2026-04-24 | $57.2 | 1.20 | 47.5 | -17.0 | 44.9 | 39.8 | 57.8 | 17.6 | 47.5 |
| 11 | ✓ | DeepSeek V4 Flash (Non-reasoning) | DeepSeek | 2026-04-24 | $40.0 | 0.90 | 44.3 | -20.2 | 36.5 | 35.1 | 61.3 | 16.0 | 44.3 |
| 12 | ✓ | Grok 4.1 Fast (Reasoning) | xAI | 2025-11-19 | $39.6 | 1.00 | 39.6 | -24.9 | 38.6 | 30.9 | 49.3 | 16.0 | 39.6 |
| 13 | ✓ | MiMo-V2-Flash (Non-reasoning) | Xiaomi | 2025-12-16 | $21.4 | 0.62 | 34.5 | -30.0 | 30.4 | 25.8 | 47.3 | 13.3 | 34.5 |
| 14 | ✓ | Grok 4.1 Fast (Non-reasoning) | xAI | 2025-11-19 | $21.4 | 0.84 | 25.3 | -39.2 | 23.6 | 19.5 | 33.0 | 13.3 | 25.3 |
| 15 | ✓ | Grok 4 Fast (Non-reasoning) | xAI | 2025-09-19 | $17.4 | 0.71 | 24.7 | -39.8 | 23.1 | 19.0 | 31.9 | 12.4 | 24.7 |
| 16 | ✓ | gpt-oss-120B (low) | OpenAI | 2025-08-05 | $15.9 | 0.70 | 22.7 | -41.8 | 24.5 | 15.5 | 28.0 | 12.0 | 22.7 |
| 17 | ✓ | gpt-oss-20B (low) | OpenAI | 2025-08-05 | $7.68 | 0.40 | 19.0 | -45.5 | 20.8 | 14.4 | 21.9 | 8.9 | 19.0 |
| 18 | ✓ | Qwen3.5 0.8B (Non-reasoning) | Alibaba | 2026-03-02 | $6.67 | 0.61 | 10.9 | -53.6 | 9.9 | 1.0 | 21.7 | 8.2 | 10.9 |
| 19 | ✓ | Granite 4.0 H Small | IBM | 2025-09-22 | $4.48 | 0.54 | 8.4 | -56.1 | 10.8 | 8.5 | 5.8 | 6.5 | 8.4 |
| 20 | ✓ | Phi-4 | Microsoft | 2024-12-12 | $4.27 | 0.59 | 7.2 | -57.3 | 10.4 | 11.2 | 0.0 | 6.3 | 7.2 |
| 21 | ✓ | Apertus 70B Instruct | Swiss AI Initiative | 2025-09-02 | $3.78 | 0.82 | 4.6 | -59.9 | 7.7 | 1.9 | 4.3 | 5.8 | 4.6 |
| 22 | ✓ | Apertus 8B Instruct | Swiss AI Initiative | 2025-09-02 | $0.10 | 0.03 | 3.7 | -60.8 | 5.9 | 1.4 | 3.8 | 0.0 | 3.7 |
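Two of the columns appear to be derived from the others: in every row shown, $/Q matches Cost divided by Quality, and ΔTop matches Quality minus the top quality of 64.5. The Python sketch below re-checks that reading against the transcribed rows. The formulas are inferred from the displayed numbers, not taken from any documentation on this page, and the names `ROWS`, `check_row`, and the 0.015 tolerance are illustrative choices. Because Cost and Quality are shown rounded, a recomputed $/Q can drift from the displayed value by about a cent.

```python
import math

# Top quality stated on this page: GPT-5.5 (xhigh), 64.5 pts.
TOP_QUALITY = 64.5

# Rows transcribed from the table: (model, cost_usd, usd_per_q, quality, delta_top).
# All values are the rounded figures as displayed.
ROWS = [
    ("GPT-5.5 (xhigh)", 3357, 52.05, 64.5, 0.0),
    ("GPT-5.5 (high)", 2159, 34.21, 63.1, -1.4),
    ("GPT-5.5 (medium)", 1199, 19.73, 60.8, -3.7),
    ("Gemini 3.1 Pro Preview", 892, 15.58, 57.3, -7.2),
    ("MiMo-V2.5-Pro", 462, 8.30, 55.6, -8.9),
    ("Grok 4.3", 395, 7.40, 53.4, -11.1),
    ("MiMo-V2.5", 207, 3.97, 52.2, -12.3),
    ("MiniMax-M2.7", 176, 3.44, 51.0, -13.5),
    ("DeepSeek V4 Flash (Reasoning, Max Effort)", 113, 2.31, 48.8, -15.7),
    ("DeepSeek V4 Flash (Reasoning, High Effort)", 57.2, 1.20, 47.5, -17.0),
    ("DeepSeek V4 Flash (Non-reasoning)", 40.0, 0.90, 44.3, -20.2),
    ("Grok 4.1 Fast (Reasoning)", 39.6, 1.00, 39.6, -24.9),
    ("MiMo-V2-Flash (Non-reasoning)", 21.4, 0.62, 34.5, -30.0),
    ("Grok 4.1 Fast (Non-reasoning)", 21.4, 0.84, 25.3, -39.2),
    ("Grok 4 Fast (Non-reasoning)", 17.4, 0.71, 24.7, -39.8),
    ("gpt-oss-120B (low)", 15.9, 0.70, 22.7, -41.8),
    ("gpt-oss-20B (low)", 7.68, 0.40, 19.0, -45.5),
    ("Qwen3.5 0.8B (Non-reasoning)", 6.67, 0.61, 10.9, -53.6),
    ("Granite 4.0 H Small", 4.48, 0.54, 8.4, -56.1),
    ("Phi-4", 4.27, 0.59, 7.2, -57.3),
    ("Apertus 70B Instruct", 3.78, 0.82, 4.6, -59.9),
    ("Apertus 8B Instruct", 0.10, 0.03, 3.7, -60.8),
]


def check_row(cost, usd_per_q, quality, delta_top, top=TOP_QUALITY):
    """Return True if a row's derived columns match the assumed formulas.

    Assumption (inferred from the data, not documented on the page):
      $/Q  = Cost / Quality
      ΔTop = Quality - top quality
    Cost and Quality are rounded for display, so $/Q is compared with a
    loose absolute tolerance; ΔTop only needs float-noise tolerance.
    """
    per_q_ok = math.isclose(cost / quality, usd_per_q, abs_tol=0.015)
    delta_ok = math.isclose(quality - top, delta_top, abs_tol=1e-6)
    return per_q_ok and delta_ok


mismatches = [name for name, c, pq, q, d in ROWS if not check_row(c, pq, q, d)]
print(mismatches)  # → []
```

An empty list means no row contradicts the assumed formulas. A tolerance is used instead of exact equality because, for example, $892 / 57.3 gives about 15.567 while the table shows 15.58, a difference that comes from rounding the displayed Cost and Quality.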