Article: Efficient LLM Pretraining: Packed Sequences and Masked Attention, by sirluk (Oct 7)
Dataset: argilla/ultrafeedback-binarized-preferences-cleaned (updated Dec 11, 2023)
Dataset: skymizer/pretraining-50B-llama3.2-tokenized-padded-packed-2048