Awesome work! Do you want to try AMO-Bench, the most challenging MO-level benchmark?

#3
by ShengnanAn - opened

Hi,

Thank you for releasing such a powerful reasoning model. I’m truly impressed by its capabilities, especially in mathematical reasoning—it even matches or surpasses some closed-source models like GPT-5!

We recently released AMO-Bench, a highly challenging MO-level benchmark. Currently, GPT-5 holds the state-of-the-art results on this benchmark (52.4% accuracy), with several open-source models rapidly catching up.

Would you be interested in testing Kimi-K2-Thinking on AMO-Bench? It would be exciting to see if it could become the first open-source model to outperform GPT-5 on this task!

Here is our GitHub repo for evaluation. If you have any questions, feel free to contact us.

Best regards,
Shengnan An
