- Compression Represents Intelligence Linearly
  Paper • 2404.09937 • Published • 27
- MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
  Paper • 2404.06395 • Published • 21
- Long-context LLMs Struggle with Long In-context Learning
  Paper • 2404.02060 • Published • 35
- Are large language models superhuman chemists?
  Paper • 2404.01475 • Published • 16
Collections
Collections including paper arxiv:2409.16191
- GAIA: a benchmark for General AI Assistants
  Paper • 2311.12983 • Published • 183
- MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
  Paper • 2311.16502 • Published • 35
- BLINK: Multimodal Large Language Models Can See but Not Perceive
  Paper • 2404.12390 • Published • 24
- RULER: What's the Real Context Size of Your Long-Context Language Models?
  Paper • 2404.06654 • Published • 34
- CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation
  Paper • 2401.01275 • Published • 1
- Introducing v0.5 of the AI Safety Benchmark from MLCommons
  Paper • 2404.12241 • Published • 10
- Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
  Paper • 2405.01535 • Published • 116
- Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
  Paper • 2406.12624 • Published • 36