BigCodeBench: Benchmarking Large Language Models on Solving Practical and Challenging Programming Tasks Jun 18 • 41
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? Paper • 2407.10956 • Published Jul 15 • 6
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision Paper • 2411.07199 • Published 10 days ago • 42
OpenCoder Collection OpenCoder is an open and reproducible code LLM family which matches the performance of top-tier code LLMs. • 9 items • Updated 4 days ago • 70
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models Paper • 2411.04905 • Published 14 days ago • 108
How Far is Video Generation from World Model: A Physical Law Perspective Paper • 2411.02385 • Published 17 days ago • 32
🫐 ProX Projects Collection Collection for: "Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale" • 18 items • Updated 29 days ago • 2
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates Paper • 2410.07137 • Published Oct 9 • 7
ProX Dataset Collection a collection of pre-training corpora refined by ProX • 5 items • Updated Oct 18 • 5
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale Paper • 2409.17115 • Published Sep 25 • 59
MagpieLM Collection Aligning LMs with Fully Open Recipe (data+training configs+logs) • 9 items • Updated Sep 22 • 15
WildVis: Open Source Visualizer for Million-Scale Chat Logs in the Wild Paper • 2409.03753 • Published Sep 5 • 18
OLMoE Collection Artifacts for open mixture-of-experts language models. • 13 items • Updated 7 days ago • 25
LongRecipe: Recipe for Efficient Long Context Generalization in Large Languge Models Paper • 2409.00509 • Published Aug 31 • 38
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25 • 86
LongVILA: Scaling Long-Context Visual Language Models for Long Videos Paper • 2408.10188 • Published Aug 19 • 51