Open LLM Leaderboard

Team

community

https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard

Activity Feed

AI & ML interests

Evaluating open LLMs

Recent Activity

AdinaY

posted an update 27 days ago

Post

3157

Kimi K2 Thinking is now live on the hub 🔥

moonshotai/Kimi-K2-Thinking

✨ 1T MoE for deep reasoning & tool use
✨ Native INT4 quantization = 2× faster inference
✨ 256K context window
✨ Modified MIT license

AdinaY

posted an update 28 days ago

Post

610

Chinese open source AI in October wasn’t about bigger models; it was about real-world impact 🔥

https://huggingface.co/collections/zh-ai-community/october-2025-china-open-source-highlights

✨ Vision-Language & OCR wave 🌊
- DeepSeek-OCR : 3B
- PaddleOCR-VL : 0.9B
- Qwen3-VL : 2B / 4B / 8B / 32B / 30B-A3B
- Open-Bee: Bee-8B-RL
- Z.ai Glyph : 10B

OCR is industrializing; the real game now is understanding the (long-context) document, not just reading it.

✨ Text generation: scale or innovation?
- MiniMax-M2: 229B
- Antgroup Ling-1T & Ring-1T
- Moonshot Kimi-Linear : linear-attention challenger
- Kwaipilot KAT-Dev

Efficiency is the key.

✨ Any-to-Any & World-Model : one step forward to the real world
- BAAI Emu 3.5
- Antgroup Ming-flash-omni
- HunyuanWorld-Mirror: 3D

Aligning with the “world model” globally

✨ Audio & Speech + Video & Visual: released from entertainment labs to delivery platforms
- SoulX-Podcast TTS
- LongCat-Audio-Codec & LongCat-Video by Meituan delivery platform
- xiabs DreamOmni 2

Looking forward to what's next 🚀

AdinaY

authored a paper 28 days ago

RoboChallenge: Large-scale Real-robot Evaluation of Embodied Policies

Paper • 2510.17950 • Published Oct 20 • 7

AdinaY

posted an update about 1 month ago

Post

497

Kimi Linear🚀 Hybrid linear attention model from Moonshot AI

https://huggingface.co/collections/moonshotai/kimi-linear-a3b

✨ 48B total / 3B active - MIT license
✨ Up to 1M context
✨ 84.3 on RULER (128k) with 3.98× speedup
✨ Hybrid KDA + MLA architecture for peak throughput & quality

meg

posted an update about 1 month ago

Post

3707

🤖 Did you know your voice might be cloned without your consent from just *one sentence* of audio?
That's not great. So with @frimelle , we brainstormed a new idea for developers who want to curb malicious use: ✨The Voice Consent Gate.✨
Details, code, here: https://huggingface.co/blog/voice-consent-gate

3 replies

AdinaY

posted an update about 1 month ago

Post

1746

Ming-flash-omni Preview 🚀 Multimodal foundation model from AntGroup

inclusionAI/Ming-flash-omni-Preview

✨ Built on Ling-Flash-2.0: 100B total / 6B active
✨ Generative segmentation-as-editing
✨ SOTA contextual & dialect ASR
✨ High-fidelity image generation

AdinaY

posted an update about 1 month ago

Post

1850

Glyph 🔥 a framework that scales context length by compressing text into images and processing them with vision–language models, released by Z.ai.

Paper: https://huggingface.co/papers/2510.17800
Model: https://huggingface.co/zai-org/Glyph

✨ Compresses long sequences visually to bypass token limits
✨ Reduces computational and memory costs
✨ Preserves meaning through multimodal encoding
✨ Built on GLM-4.1V-9B-Base

AdinaY

posted an update about 1 month ago

Post

2645

HunyuanWorld Mirror 🔥 a versatile feed-forward model for universal 3D world reconstruction by Tencent

tencent/HunyuanWorld-Mirror

✨ Any prior in → 3D world out
✨ Mix camera, intrinsics, depth as priors
✨ Predict point clouds, normals, Gaussians & more in one pass
✨ Unified architecture for all 3D tasks

AdinaY

posted an update about 2 months ago

Post

680

PaddleOCR VL🔥 0.9B Multilingual VLM by Baidu

PaddlePaddle/PaddleOCR-VL

✨ Ultra-efficient NaViT + ERNIE-4.5 architecture
✨ Supports 109 languages 🤯
✨ Accurately recognizes text, tables, formulas & charts
✨ Fast inference and lightweight for deployment

AdinaY

posted an update about 2 months ago

Post

1810

Bee-8B 🐝 open 8B Multimodal LLM built on high quality data, released by TencentHunyuan

Paper: Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs (2510.13795)
Model: https://huggingface.co/collections/Open-Bee/bee-8b-68ecbf10417810d90fbd9995

✨ Trained on Honey-Data-15M, a 15M-sample SFT corpus with dual-level CoT reasoning
✨ Backed by HoneyPipe, a transparent & reproducible open data curation suite

AdinaY

posted an update about 2 months ago

Post

335

Reflection ≠ self-correction

Interesting paper on long-chain reasoning 📑 from Miromind AI.
First Try Matters: Revisiting the Role of Reflection in Reasoning Models (2510.08308)

It dives into how LLMs think 🧠. Most reflections confirm rather than fix, and true improvement comes from stronger initial reasoning.

thomwolf

authored a paper about 2 months ago

Robot Learning: A Tutorial

Paper • 2510.12403 • Published Oct 14 • 113

lvwerra

authored a paper about 2 months ago

BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution

Paper • 2510.08697 • Published Oct 9 • 35

AdinaY

posted an update about 2 months ago

Post

497

Ring-1T 🔥 the trillion-parameter thinking model released by Ant Group, the company behind Alipay

inclusionAI/Ring-1T

✨ 1T params (50B active) - MIT license
✨ 128K context (YaRN)
✨ RLVR, Icepop, and ASystem make trillion-scale RL stable

AdinaY

posted an update about 2 months ago

Post

516

KAT-Dev-72B-Exp 🔥 Kuaishou's (the company behind Kling AI) new open model for software engineering

Kwaipilot/KAT-Dev-72B-Exp

✨ 72B - Apache2.0
✨ Redesigned attention kernel & training engine for efficient context-aware RL
✨ 74.6% accuracy on SWE-Bench Verified

clefourrier

in open-llm-leaderboard/open_llm_leaderboard about 2 months ago

How about increasing the model size of the list, for example, with the deepseek series?
#1141 opened 8 months ago by Nurburgring

Aligning MMLU questions across models
#1142 opened 8 months ago by jerome-white

How to get the responses from every model?
#1144 opened 7 months ago by dakexiaoying

Is it ok to base my own leaderboard on your new frontend code?
#1146 opened 5 months ago by gardari

AdinaY

posted an update about 2 months ago

Post

4420

At the close of the National Holiday🇨🇳, Antgroup drops a new SoTA model.

Ling-1T 🔥 the trillion-parameter flagship of the Ling 2.0 series.

inclusionAI/Ling-1T

✨1T total / 50B active params per token
✨20T+ reasoning-dense tokens (Evo-CoT)
✨128K context via YaRN
✨FP8 training: 15%+ faster, same precision as BF16
✨Hybrid Syntax-Function-Aesthetics reward for front-end & visual generation

1 reply

Team members 18