Seed Problems (EntropyMath Standard v2)
[Chart: Model Accuracy vs Pass@3. Y-axis: 0% to 100%. Series: Accuracy, Pass@3. Models shown, in chart order: GPT-5.2 (high), Gemini-3-Pro-Preview, Solar-Pro 2, Kanana-2-30B-Thinking, K-EXAONE-236B-A23B, Kanana-2-30B-Thinking-2601, GLM-4.5-Air, Solar-Open-100B, model_d_r1, naver-hyperclovax/HCX-007, EXAONE-4.0-32B, axk1.]

[Chart: Avg Token Usage (Per Problem). Y-axis: 0 to 25.2K tokens. Series: Avg Tokens / Problem. Models shown, in chart order: K-EXAONE-236B-A23B, Solar-Open-100B, Gemini-3-Pro-Preview, Kanana-2-30B-Thinking-2601, Kanana-2-30B-Thinking, Solar-Pro 2, GPT-5.2 (high), GLM-4.5-Air, naver-hyperclovax/HCX-007, EXAONE-4.0-32B, axk1, model_d_r1.]

EntropyMath is an evolutionary multi-agent system and benchmark that generates high-entropy math problems designed to systematically break current LLMs.
Results are reported using Pass@3 metrics to account for generation variance, alongside detailed execution traces for transparency.

Performance Legend: Mastery (100%) = 3/3 · Strong (66%) = 2/3 · Weak (33%) = 1/3 · Fail (0%) = 0/3

| Model | Acc (%) | Pass@3 (%) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| API / Others | | | | | | | | | | | | |
| GPT-5.2 (high) | 86.7 | 90.0 | 3/3 | 3/3 | 3/3 | 2/3 | 3/3 | 3/3 | 3/3 | 3/3 | 0/3 | 3/3 |
| Gemini-3-Pro-Preview | 86.7 | 90.0 | 3/3 | 3/3 | 3/3 | 3/3 | 3/3 | 3/3 | 2/3 | 3/3 | 0/3 | 3/3 |
| GLM-4.5-Air | 40.0 | 60.0 | 1/3 | 3/3 | 3/3 | 0/3 | 0/3 | 2/3 | 0/3 | 2/3 | 0/3 | 1/3 |
| K-LLM Project Round 2 | | | | | | | | | | | | |
| K-EXAONE-236B-A23B | 50.0 | 60.0 | 0/3 | 3/3 | 3/3 | 0/3 | 0/3 | 3/3 | 1/3 | 3/3 | 0/3 | 2/3 |
| Solar-Open-100B | 36.7 | 50.0 | 0/3 | 2/3 | 3/3 | 0/3 | 0/3 | 2/3 | 0/3 | 3/3 | 0/3 | 1/3 |
| K-LLM Project Round 1 | | | | | | | | | | | | |
| Solar-Pro 2 | 60.0 | 70.0 | 1/3 | 3/3 | 3/3 | 2/3 | 0/3 | 3/3 | 0/3 | 3/3 | 0/3 | 3/3 |
| EXAONE-4.0-32B | 26.7 | 40.0 | 0/3 | 2/3 | 3/3 | 0/3 | 0/3 | 2/3 | 0/3 | 1/3 | 0/3 | 0/3 |
| Local - KR | | | | | | | | | | | | |
| Kanana-2-30B-Thinking | 53.3 | 60.0 | 0/3 | 3/3 | 3/3 | 0/3 | 0/3 | 3/3 | 1/3 | 3/3 | 0/3 | 3/3 |
| Kanana-2-30B-Thinking-2601 | 50.0 | 60.0 | 0/3 | 3/3 | 3/3 | 1/3 | 0/3 | 3/3 | 0/3 | 2/3 | 0/3 | 3/3 |
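
The page does not spell out how the Acc and Pass@3 columns are derived from the per-problem cells. The sketch below assumes that Acc is the share of all attempts that were correct (3 attempts on each of the 10 problems) and that Pass@3 is the share of problems solved in at least one of the 3 attempts. Under that assumption it reproduces the reported values for the three rows copied in. The names `results`, `accuracy`, and `pass_at_3` are illustrative and not part of EntropyMath.

```python
# Correct-attempt counts per problem (columns 0-9), copied from the results table.
ATTEMPTS_PER_PROBLEM = 3

results = {
    "GPT-5.2 (high)": [3, 3, 3, 2, 3, 3, 3, 3, 0, 3],
    "Solar-Pro 2": [1, 3, 3, 2, 0, 3, 0, 3, 0, 3],
    "EXAONE-4.0-32B": [0, 2, 3, 0, 0, 2, 0, 1, 0, 0],
}


def accuracy(correct_counts, attempts=ATTEMPTS_PER_PROBLEM):
    """Percentage of all attempts that were correct (assumed definition of Acc)."""
    total_attempts = len(correct_counts) * attempts
    return 100 * sum(correct_counts) / total_attempts


def pass_at_3(correct_counts):
    """Percentage of problems with at least one correct attempt (assumed definition of Pass@3)."""
    solved = sum(1 for c in correct_counts if c > 0)
    return 100 * solved / len(correct_counts)


for model, counts in results.items():
    print(f"{model}: Acc {accuracy(counts):.1f}, Pass@3 {pass_at_3(counts):.1f}")
```

The gap between the two columns shows how consistent a model is: Solar-Pro 2 solves 7 of 10 problems at least once (Pass@3 70.0) but only 18 of its 30 attempts are correct (Acc 60.0), so several of its solves are not repeatable across attempts.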


