
EntropyMath Leaderboard

A high-entropy mathematical reasoning benchmark for LLMs

Model Accuracy vs Pass@3

[Chart: Accuracy and Pass@3 per model on a 0% to 100% axis. Models shown: Gemini-3-Pro-Preview, Solar Pro 3 (Round 2), GPT-5.2 (high), K-EXAONE-236B-A23B, Kanana-2-30B-Thinking-2601.]

Avg Token Usage (Per Problem)

[Chart: Avg Tokens / Problem per model on a 0 to 20.8K axis. Models shown: K-EXAONE-236B-A23B, Solar Pro 3 (Round 2), Gemini-3-Pro-Preview, Kanana-2-30B-Thinking-2601, GPT-5.2 (high).]

EntropyMath is an evolutionary multi-agent system and benchmark that generates high-entropy math problems designed to systematically break current LLMs.

Results are reported using Pass@3 metrics to account for generation variance, alongside detailed execution traces for transparency.

Performance Legend

Mastery (100%): 3/3
Strong (66%): 2/3
Weak (33%): 1/3
Fail (0%): 0/3
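The legend can be read as a lookup from the number of correct attempts (out of 3) to a label. The sketch below is only an illustration of that mapping; the function name `legend_label` is hypothetical and not part of the leaderboard.

```python
# Hypothetical helper: maps "correct attempts out of 3" to the legend label.
# The labels come from the Performance Legend; the function itself is an assumption.
LEGEND = {3: "Mastery", 2: "Strong", 1: "Weak", 0: "Fail"}


def legend_label(correct: int) -> str:
    """Return the legend label for a problem solved `correct` times out of 3."""
    if correct not in LEGEND:
        raise ValueError("correct must be an integer between 0 and 3")
    return LEGEND[correct]


if __name__ == "__main__":
    for k in (3, 2, 1, 0):
        print(f"{k}/3 -> {legend_label(k)}")
```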
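| Group | Model | Acc (%) | Pass@3 (%) | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|---|---|---|
| API / Others | Gemini-3-Pro-Preview | 60.0 | 80.0 | 2/3 | 3/3 | 3/3 | 0/3 | 1/3 |
| API / Others | GPT-5.2 (high) | 33.3 | 60.0 | 3/3 | 1/3 | 1/3 | 0/3 | 0/3 |
| K-LLM Project Round 2 | Solar Pro 3 (Round 2) | 40.0 | 40.0 | 3/3 | 0/3 | 3/3 | 0/3 | 0/3 |
| K-LLM Project Round 2 | K-EXAONE-236B-A23B | 13.3 | 40.0 | 0/3 | 1/3 | 0/3 | 1/3 | 0/3 |
| Local - KR | Kanana-2-30B-Thinking-2601 | 13.3 | 40.0 | 1/3 | 0/3 | 1/3 | 0/3 | 0/3 |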
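The Acc and Pass@3 columns appear to follow from the per-problem cells. The sketch below assumes that Acc is the share of correct attempts across all problems and that Pass@3 is the share of problems with at least one correct attempt out of 3. Those definitions are inferred from the numbers on this page, not stated by the benchmark authors, and the function names are hypothetical.

```python
def parse_cell(cell: str) -> tuple[int, int]:
    """Split a cell such as "2/3" into (correct, attempts)."""
    correct, attempts = cell.split("/")
    return int(correct), int(attempts)


def summarize(cells: list[str]) -> tuple[float, float]:
    """Return (accuracy %, pass@3 %) for one model's per-problem cells.

    Assumed definitions (inferred, not confirmed by the source):
      accuracy = correct attempts / total attempts
      pass@3   = problems with at least one correct attempt / problems
    """
    parsed = [parse_cell(c) for c in cells]
    total_correct = sum(c for c, _ in parsed)
    total_attempts = sum(n for _, n in parsed)
    solved = sum(1 for c, _ in parsed if c > 0)
    accuracy = 100 * total_correct / total_attempts
    pass_at_3 = 100 * solved / len(parsed)
    return round(accuracy, 1), round(pass_at_3, 1)


if __name__ == "__main__":
    # Per-problem cells copied from the leaderboard rows above.
    rows = {
        "Gemini-3-Pro-Preview": ["2/3", "3/3", "3/3", "0/3", "1/3"],
        "GPT-5.2 (high)": ["3/3", "1/3", "1/3", "0/3", "0/3"],
        "K-EXAONE-236B-A23B": ["0/3", "1/3", "0/3", "1/3", "0/3"],
    }
    for model, cells in rows.items():
        print(model, summarize(cells))
```

Under these assumptions the computed values match the reported columns for the rows checked, for example 60.0 and 80.0 for Gemini-3-Pro-Preview.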