Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published 5 days ago • 42
EXAONE-3.5 Collection EXAONE 3.5 language model series, including instruction-tuned models at 2.4B, 7.8B, and 32B parameters. • 10 items • Updated 8 days ago • 75
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability Paper • 2411.19943 • Published 18 days ago • 53
Star Attention: Efficient LLM Inference over Long Sequences Paper • 2411.17116 • Published 22 days ago • 45
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions Paper • 2411.14405 • Published 26 days ago • 55
Loss-to-Loss Prediction: Scaling Laws for All Datasets Paper • 2411.12925 • Published 28 days ago • 5
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization Paper • 2411.10442 • Published Nov 15 • 61
RedPajama: an Open Dataset for Training Large Language Models Paper • 2411.12372 • Published 29 days ago • 47
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models Paper • 2411.04905 • Published Nov 7 • 110
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases Paper • 2402.14905 • Published Feb 22 • 126
Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse Paper • 2410.21333 • Published Oct 27 • 10
Counting Ability of Large Language Models and Impact of Tokenization Paper • 2410.19730 • Published Oct 25 • 10
NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples Paper • 2410.14669 • Published Oct 18 • 36
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss Paper • 2410.17243 • Published Oct 22 • 88
Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch Paper • 2410.18693 • Published Oct 24 • 40