BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution Paper • 2510.08697 • Published 10 days ago • 28
Article BigCodeArena: Judging code generations end to end with code executions By bigcode • 13 days ago • 16
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs Paper • 2509.09677 • Published Sep 11 • 33
NextCoder Collection NextCoder family of code-editing LMs developed with Selective Knowledge Transfer and its training data. • 6 items • Updated Jul 9 • 71
Open LLM Leaderboard 🏆 Track, rank and evaluate open LLMs and chatbots • 13.6k