Awesome work! Do you want to try AMO-Bench, the most challenging MO-level benchmark?
#3, opened by ShengnanAn
Hi,
Thank you for releasing such a powerful reasoning model. I’m truly impressed by its capabilities, especially in mathematical reasoning, where it matches or even surpasses some closed-source models like GPT-5!
We recently released AMO-Bench, a highly challenging benchmark of Mathematical Olympiad (MO) level problems. GPT-5 currently holds the state-of-the-art result on this benchmark (52.4% accuracy), with several open-source models rapidly catching up.
Would you be interested in testing Kimi-K2-Thinking on AMO-Bench? It would be exciting to see if it could become the first open-source model to outperform GPT-5 on this task!
Here is our GitHub repo for evaluation. If you have any questions, feel free to contact us.
Best regards,
Shengnan An