Article: Efficient LLM Pretraining: Packed Sequences and Masked Attention. By sirluk • Oct 7
📚 FineWeb-Edu Collection: FineWeb-Edu datasets, classifier and ablation model • 5 items • Updated Jun 12