Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling Paper • 2410.07145 • Published Oct 9
Round and Round We Go! What makes Rotary Positional Encodings useful? Paper • 2410.06205 • Published Oct 8
TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices Paper • 2410.00531 • Published Oct 1
Aria: An Open Multimodal Native Mixture-of-Experts Model Paper • 2410.05993 • Published Oct 8
The Mamba in the Llama: Distilling and Accelerating Hybrid Models Paper • 2408.15237 • Published Aug 27
KTO: Model Alignment as Prospect Theoretic Optimization Paper • 2402.01306 • Published Feb 2
Planning In Natural Language Improves LLM Search For Code Generation Paper • 2409.03733 • Published Sep 5
LLM Pruning and Distillation in Practice: The Minitron Approach Paper • 2408.11796 • Published Aug 21
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study Paper • 2404.10719 • Published Apr 16
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning Paper • 2405.12130 • Published May 20
S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs Paper • 2405.20314 • Published May 30
Contextual Position Encoding: Learning to Count What's Important Paper • 2405.18719 • Published May 29