Pretrain Data

HuggingFaceTB/smollm-corpus • Updated Sep 6, 2024 • 237M • 18.5k • 409
HuggingFaceFW/fineweb-edu-classifier • Text Classification • 0.1B • Updated Nov 17, 2024 • 28.7k • 203
HuggingFaceFW/fineweb • Updated Jul 11, 2025 • 52.5B • 184k • 2.6k
togethercomputer/RedPajama-Data-V2 • Updated Nov 21, 2024 • 3.44k • 390