[Chart: Model Accuracy vs Pass@3. Y-axis: 0% to 100%. Series: Accuracy, Pass@3. Models: Gemini-3-Pro-Preview, Solar Pro 3 (Round 2), GPT-5.2 (high), K-EXAONE-236B-A23B, Kanana-2-30B-Thinking-2601, lgai/k-exaone-236b-a23b.]

[Chart: Avg Token Usage (Per Problem). Y-axis: Avg Tokens / Problem, 0 to 20.8K. Models: K-EXAONE-236B-A23B, lgai/k-exaone-236b-a23b, Solar Pro 3 (Round 2), Gemini-3-Pro-Preview, Kanana-2-30B-Thinking-2601, GPT-5.2 (high).]

EntropyMath is an evolutionary multi-agent system and benchmark that generates high-entropy math problems designed to systematically break current LLMs.
Results are reported using Pass@3 metrics to account for generation variance, alongside detailed execution traces for transparency.
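
To make the two reported metrics concrete, here is a minimal Python sketch. It assumes the common reading of these terms, which this page does not spell out: Accuracy is the fraction of individual attempts that are correct, and Pass@3 is the fraction of problems solved in at least one of three attempts. The function names and the sample data are hypothetical and are not taken from EntropyMath itself.

```python
from typing import Dict, List


def accuracy(results: Dict[str, List[bool]]) -> float:
    """Fraction of all individual attempts that were correct (assumed definition)."""
    total = sum(len(attempts) for attempts in results.values())
    correct = sum(sum(attempts) for attempts in results.values())
    return correct / total if total else 0.0


def pass_at_3(results: Dict[str, List[bool]]) -> float:
    """Fraction of problems with at least one correct answer in the first 3 attempts (assumed definition)."""
    if not results:
        return 0.0
    solved = sum(1 for attempts in results.values() if any(attempts[:3]))
    return solved / len(results)


# Hypothetical outcomes: problem id -> correctness of each of 3 attempts.
sample = {
    "p1": [True, True, True],
    "p2": [False, True, False],
    "p3": [False, False, False],
    "p4": [True, False, True],
}

print(f"Accuracy: {accuracy(sample):.2%}")
print(f"Pass@3:   {pass_at_3(sample):.2%}")
```

Under these assumptions Pass@3 can never be lower than Accuracy, since a single correct attempt is enough to count a problem as solved.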
Performance Legend
Mastery (100%): 3/3
Strong (66%): 2/3
Weak (33%): 1/3
Fail (0%): 0/3
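
The legend can be read as a lookup from the number of correct attempts out of three to a tier label. The sketch below is one way to express that in Python; the function name `performance_tier` is hypothetical, and the interpretation of "n/3" as "n correct attempts out of 3" is an assumption based on the legend and the Pass@3 reporting described on this page.

```python
# Tier labels copied from the legend; keys are correct attempts out of 3.
TIERS = {3: "Mastery", 2: "Strong", 1: "Weak", 0: "Fail"}


def performance_tier(correct: int) -> str:
    """Map a count of correct attempts (0..3) to its legend tier."""
    if correct not in TIERS:
        raise ValueError("correct must be an integer between 0 and 3")
    return TIERS[correct]


for n in range(3, -1, -1):
    print(f"{n}/3 -> {performance_tier(n)}")
```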


