skymizer's Collections
Pre-Training Datasets
Updated 2 days ago
allenai/c4 • Viewer • Updated Jan 9 • 10.4B • 455k • 310
allenai/dolma • Updated Apr 17 • 951 • 843
togethercomputer/RedPajama-Data-1T • Viewer • Updated Jun 17 • 1.73M • 1.03k • 1.06k
tiiuae/falcon-refinedweb • Viewer • Updated Jun 20, 2023 • 968M • 26k • 811
HuggingFaceFW/fineweb • Viewer • Updated Jul 16 • 46B • 383k • 1.74k
HuggingFaceFW/fineweb-edu • Viewer • Updated Oct 11 • 3B • 596k • 533
HuggingFaceTB/smollm-corpus • Viewer • Updated Sep 6 • 237M • 18.8k • 240
Zyphra/dclm-dedup • Viewer • Updated 19 days ago • 615M • 1.27k • 10