š„ GRM2 - The small one that surpasses the big ones. What if a 3-parameter model can beat a 32-parameter model in every benchmark? We prove that it can. GRM2 is a 3b params model based on the llama architecture, trained for long reasoning and high performance in complex tasks - the first 3b params model to outperform qwen3-32b in ALL benchmarks, and outperform o3-mini in almost all benchmarks. š¤ Model: OrionLLM/GRM2-3b The first 3b params model to generate over 1000 lines of code and achieve a score of 39.0 in xBench-DeepSearch-2510.