BigCodeBench Leaderboard 🥇: Explore code-generation model leaderboards and task details
Leaderboards and benchmarks ✨ (Collection): A collection of leaderboard Spaces for models across modalities: text, vision, audio, ... • 88 items • Updated Mar 2
Article: ZebraLogic: Benchmarking the Logical Reasoning Ability of Language Models (Jul 27, 2024)
GAIA Leaderboard 🦾: Submit your model's answers to the GAIA benchmark and view the leaderboard
Vision Arena (Testing VLMs side-by-side) 🖼: Explore Vision Arena's computer-vision tools online
AI2 WildBench Leaderboard (V2) 🦁: Display and explore a leaderboard of language models