Run #650
error · fetched 2026-06-17 21:00:53 · 9.81 MB raw HTML · 0 models
Worker was canceled or timed out before it could record a failure. raw_chunks=7, model_rows=240
Top quality: Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback) (67.5 pts)
| # | Pareto | Model | Released | Cost | $/Q | Qual | ΔTop | Intel | Code | Agent | Pen | Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ✓ | Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback) Anthropic | 2026-06-09 | $6,228 | 92.29 | 67.5 | 0.0 | 59.9 | 62.0 | 80.6 | 37.9 | 67.5 |
| 2 | ✓ | Claude Opus 4.8 (Adaptive Reasoning, Max Effort) Anthropic | 2026-05-28 | $4,012 | 63.27 | 63.4 | -4.1 | 55.7 | 56.7 | 77.8 | 36.0 | 63.4 |
| 3 | ✓ | GPT-5.5 (xhigh) OpenAI | 2026-04-23 | $2,865 | 45.70 | 62.7 | -4.8 | 54.8 | 59.1 | 74.1 | 34.6 | 62.7 |
| 4 | ✓ | GPT-5.5 (high) OpenAI | 2026-04-23 | $1,775 | 29.01 | 61.2 | -6.3 | 53.1 | 58.5 | 72.0 | 32.5 | 61.2 |
| 5 | | Claude Opus 4.7 (Adaptive Reasoning, Max Effort) Anthropic | 2026-04-16 | $3,738 | 63.23 | 59.1 | -8.4 | 53.5 | 52.5 | 71.3 | 35.7 | 59.1 |
| 6 | ✓ | GLM-5.2 (max) Z AI | 2026-06-16 | $868 | 14.69 | 59.1 | -8.4 | 50.7 | 50.7 | 75.9 | 29.4 | 59.1 |
| 7 | | Gemini 3.5 Flash (high) | 2026-05-19 | $1,142 | 20.70 | 55.2 | -12.3 | 50.2 | 45.0 | 70.3 | 30.6 | 55.2 |
| 8 | | Qwen3.7 Max Alibaba | 2026-05-19 | $1,432 | 26.42 | 54.2 | -13.3 | 46.0 | 50.1 | 66.6 | 31.6 | 54.2 |
| 9 | | Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) Anthropic | 2026-02-17 | $3,356 | 62.47 | 53.7 | -13.8 | 47.2 | 50.9 | 63.0 | 35.3 | 53.7 |
| 10 | ✓ | Gemini 3.1 Pro Preview | 2026-02-19 | $860 | 16.02 | 53.7 | -13.8 | 46.5 | 55.5 | 59.1 | 29.3 | 53.7 |
| 11 | ✓ | DeepSeek V4 Pro (Reasoning, Max Effort) DeepSeek | 2026-04-24 | $180 | 3.39 | 53.0 | -14.5 | 44.3 | 47.5 | 67.2 | 22.5 | 53.0 |
| 12 | | MiniMax-M3 MiniMax | 2026-06-01 | $235 | 4.51 | 52.2 | -15.3 | 44.4 | 43.4 | 68.6 | 23.7 | 52.2 |
| 13 | | Kimi K2.6 Kimi | 2026-04-20 | $839 | 16.13 | 52.0 | -15.5 | 42.8 | 47.1 | 66.0 | 29.2 | 52.0 |
| 14 | ✓ | MiMo-V2.5-Pro Xiaomi | 2026-04-22 | $99.1 | 1.92 | 51.7 | -15.7 | 42.2 | 45.5 | 67.4 | 20.0 | 51.7 |
| 15 | | GLM-5.1 (Reasoning) Z AI | 2026-04-07 | $674 | 13.43 | 50.2 | -17.3 | 40.2 | 43.4 | 67.1 | 28.3 | 50.2 |
| 16 | | Qwen3.7 Plus Alibaba | 2026-06-01 | $152 | 3.03 | 50.2 | -17.3 | 39.0 | 46.5 | 65.1 | 21.8 | 50.2 |
| 17 | | GPT-5.4 mini (xhigh) OpenAI | 2026-03-17 | $1,158 | 23.10 | 50.1 | -17.4 | 40.0 | 51.5 | 58.9 | 30.6 | 50.1 |
| 18 | | Kimi K2.7 Code Kimi | 2026-06-12 | $530 | 10.64 | 49.8 | -17.6 | 41.9 | 45.6 | 61.9 | 27.2 | 49.8 |
| 19 | | Grok 4.3 (high) xAI | 2026-04-30 | $319 | 6.61 | 48.2 | -19.3 | 37.6 | 41.0 | 65.9 | 25.0 | 48.2 |
| 20 | | Qwen3.6 Plus Alibaba | 2026-04-02 | $484 | 10.08 | 48.0 | -19.4 | 39.6 | 42.9 | 61.7 | 26.9 | 48.0 |
| 21 | | MiniMax-M2.7 MiniMax | 2026-03-18 | $144 | 3.05 | 47.2 | -20.3 | 38.1 | 41.9 | 61.5 | 21.6 | 47.2 |
| 22 | ✓ | DeepSeek V4 Flash (Reasoning, Max Effort) DeepSeek | 2026-04-24 | $78.4 | 1.68 | 46.8 | -20.7 | 40.3 | 38.7 | 61.3 | 18.9 | 46.8 |
| 23 | | Qwen3.6 27B (Reasoning) Alibaba | 2026-04-22 | $665 | 14.62 | 45.5 | -22.0 | 37.1 | 36.5 | 62.9 | 28.2 | 45.5 |
| 24 | | Nemotron 3 Ultra 550B A55B (Reasoning) NVIDIA | 2026-06-04 | $444 | 10.05 | 44.1 | -23.4 | 37.8 | 37.6 | 57.1 | 26.5 | 44.1 |
| 25 | | Qwen3.5 397B A17B (Reasoning) Alibaba | 2026-02-16 | $528 | 12.11 | 43.6 | -23.9 | 33.7 | 41.3 | 55.8 | 27.2 | 43.6 |
| 26 | | GPT-5.4 nano (xhigh) OpenAI | 2026-03-17 | $289 | 6.67 | 43.3 | -24.2 | 38.2 | 43.9 | 47.6 | 24.6 | 43.3 |
| 27 | | Step 3.7 Flash StepFun | 2026-05-29 | $320 | 7.61 | 42.1 | -25.4 | 29.7 | 37.1 | 59.5 | 25.1 | 42.1 |
| 28 | | Qwen3.6 35B A3B (Reasoning) Alibaba | 2026-04-16 | $333 | 7.97 | 41.7 | -25.8 | 31.6 | 35.2 | 58.3 | 25.2 | 41.7 |
| 29 | | Qwen3.5 122B A10B (Reasoning) Alibaba | 2026-02-24 | $446 | 11.14 | 40.0 | -27.5 | 32.3 | 34.7 | 53.0 | 26.5 | 40.0 |
| 30 | | Mistral Medium 3.5 Mistral | 2026-04-29 | $1,325 | 33.53 | 39.5 | -28.0 | 29.9 | 35.4 | 53.2 | 31.2 | 39.5 |
| 31 | | Claude 4.5 Haiku (Reasoning) Anthropic | 2025-10-15 | $539 | 15.80 | 34.1 | -33.4 | 29.6 | 32.6 | 40.2 | 27.3 | 34.1 |
| 32 | | Nova 2.0 Pro Preview (medium) Amazon | 2025-11-27 | $407 | 12.31 | 33.0 | -34.4 | 21.8 | 30.4 | 47.0 | 26.1 | 33.0 |
| 33 | | Grok 4.3 (Non-reasoning) xAI | 2026-04-30 | $297 | 9.04 | 32.9 | -34.6 | 24.8 | 25.1 | 48.8 | 24.7 | 32.9 |
| 34 | | NVIDIA Nemotron 3 Super 120B A12B (Reasoning) NVIDIA | 2026-03-11 | $284 | 8.81 | 32.3 | -35.2 | 25.4 | 31.2 | 40.2 | 24.5 | 32.3 |
| 35 | | gpt-oss-120b (high) OpenAI | 2025-08-05 | $96.3 | 3.20 | 30.1 | -37.4 | 23.8 | 28.6 | 37.9 | 19.8 | 30.1 |
| 36 | | Gemini 3.1 Flash-Lite | 2026-03-03 | $94.8 | 3.52 | 26.9 | -40.5 | 25.0 | 30.1 | 25.7 | 19.8 | 26.9 |
| 37 | ✓ | Gemma 4 26B A4B (Reasoning) | 2026-04-02 | $54.5 | 2.04 | 26.8 | -40.7 | 25.7 | 22.4 | 32.1 | 17.4 | 26.8 |
| 38 | ✓ | gpt-oss-20B (high) OpenAI | 2025-08-05 | $29.9 | 1.47 | 20.3 | -47.1 | 14.9 | 18.5 | 27.6 | 14.8 | 20.3 |
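
The page does not say how the Pareto column is derived. The sketch below rests on one assumption: a row is marked when no other row has both a lower-or-equal total cost and a higher-or-equal Qual score, with at least one of the two strictly better. The `Row` type and the `pareto_front` function are hypothetical names introduced for illustration; the six sample rows are copied from the table above.

```python
from typing import NamedTuple


class Row(NamedTuple):
    model: str
    cost_usd: float  # "Cost" column, total run cost in dollars
    quality: float   # "Qual" column


def pareto_front(rows: list[Row]) -> list[str]:
    """Return the models that no other row beats on both cost and quality.

    Assumption: this mirrors the page's Pareto flag; the page itself
    does not document the rule.
    """
    front = []
    for r in rows:
        dominated = any(
            o.cost_usd <= r.cost_usd
            and o.quality >= r.quality
            and (o.cost_usd < r.cost_usd or o.quality > r.quality)
            for o in rows
        )
        if not dominated:
            front.append(r.model)
    return front


# A subset of rows 6, 7, 10, 11, 12 and 14 from the table.
ROWS = [
    Row("GLM-5.2 (max)", 868, 59.1),
    Row("Gemini 3.5 Flash (high)", 1142, 55.2),
    Row("Gemini 3.1 Pro Preview", 860, 53.7),
    Row("DeepSeek V4 Pro (Reasoning, Max Effort)", 180, 53.0),
    Row("MiniMax-M3", 235, 52.2),
    Row("MiMo-V2.5-Pro", 99.1, 51.7),
]

if __name__ == "__main__":
    for name in pareto_front(ROWS):
        print(name)
```

On this subset the function keeps the four models that carry a ✓ in the table and drops Gemini 3.5 Flash (high) and MiniMax-M3. Using $/Q instead of total cost would not reproduce the table: Gemini 3.1 Pro Preview has a higher $/Q and a lower Qual than GLM-5.2 (max), yet it is marked.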