Instruction Pre-Training: Language Models are Supervised Multitask Learners Paper • 2406.14491 • Published Jun 20, 2024 • 85
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective Paper • 2410.23743 • Published Oct 31, 2024 • 59
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks Paper • 2410.20650 • Published Oct 28, 2024 • 16