Scaling FineWeb to 1000+ languages: Step 1: finding signal in 100s of evaluation tasks 📝 Evaluate multilingual models using FineTasks • Space • Running • 55
TxT360: Trillion Extracted Text 📖 Explore a large, deduplicated dataset for LLM training • Space • Running • 99
FineWeb: decanting the web for the finest text data at scale 🍷 Generate high-quality web text data for LLM training • Space • Running • 602
Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler Paper • 2408.13359 • Published Aug 23, 2024 • 24
Neural Circuit Diagrams: Robust Diagrams for the Communication, Implementation, and Analysis of Deep Learning Architectures Paper • 2402.05424 • Published Feb 8, 2024 • 16
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling Paper • 2401.16380 • Published Jan 29, 2024 • 49
LLM Augmented LLMs: Expanding Capabilities through Composition Paper • 2401.02412 • Published Jan 4, 2024 • 37
LLM in a flash: Efficient Large Language Model Inference with Limited Memory Paper • 2312.11514 • Published Dec 12, 2023 • 259