ResearchMath-14K: Scaling Research-Level Mathematics via Agents Paper • 2605.28003 • Published 1 day ago • 33
Chartographer: Counterfactual Chart Generation for Evaluating Vision-Language Models Paper • 2605.27311 • Published 2 days ago • 3
Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs Paper • 2605.09063 • Published 19 days ago • 79
BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data Paper • 2510.10159 • Published Oct 11, 2025 • 3
Measuring what Matters: Construct Validity in Large Language Model Benchmarks Paper • 2511.04703 • Published Nov 3, 2025 • 8
Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math Paper • 2602.06291 • Published Feb 6 • 24
Economies of Open Intelligence: Tracing Power & Participation in the Model Ecosystem Paper • 2512.03073 • Published Nov 27, 2025 • 7
The German Commons - 154 Billion Tokens of Openly Licensed Text for German Language Models Paper • 2510.13996 • Published Oct 15, 2025 • 9
EmbeddingGemma: Powerful and Lightweight Text Representations Paper • 2509.20354 • Published Sep 24, 2025 • 50
Adding LLMs to the psycholinguistic norming toolbox: A practical guide to getting the most out of human ratings Paper • 2509.14405 • Published Sep 17, 2025 • 2
Psycholinguistic Word Features: a New Approach for the Evaluation of LLMs Alignment with Humans Paper • 2506.22439 • Published May 29, 2025 • 3
Apertus: Democratizing Open and Compliant LLMs for Global Language Environments Paper • 2509.14233 • Published Sep 17, 2025 • 20
La Leaderboard: A Large Language Model Leaderboard for Spanish Varieties and Languages of Spain and Latin America Paper • 2507.00999 • Published Jul 1, 2025 • 1
view post Post 8945 We're kick-starting the process of Transformers v5, with @ArthurZ and @cyrilvallez !v5 should be significant: we're using it as a milestone for performance optimizations, saner defaults, and a much cleaner code base worthy of 2025.Fun fact: v4.0.0-rc-1 came out on Nov 19, 2020, nearly five years ago! See translation 6 replies · 🚀 20 20 👍 10 10 🔥 6 6 + Reply