Skier8402's Collections
Datasets
Updated 3 days ago
Interesting datasets to help train LLMs and beyond
Open-Orca/OpenOrca • Viewer • Updated Oct 21, 2023 • 2.91M • 10.7k • 1.34k
NeelNanda/pile-10k • Viewer • Updated Oct 14, 2022 • 10k • 6.7k • 14
legacy-datasets/mc4 • Updated Mar 5 • 14.9k • 149
oscar-corpus/oscar • Updated Mar 21 • 11.4k • 178
deepset/prompt-injections • Viewer • Updated Jul 30 • 662 • 1.46k • 48
epfl-llm/guidelines • Viewer • Updated Mar 7 • 38k • 959 • 112
wanng/midjourney-v5-202304-clean • Viewer • Updated May 24 • 1.7M • 172 • 85
CohereForAI/aya_dataset • Viewer • Updated Jun 28 • 206k • 3.87k • 281
google/fleurs • Updated Aug 25 • 22.2k • 254
HuggingFaceTB/cosmopedia • Viewer • Updated Aug 12 • 31.1M • 5.42k • 567
microsoft/orca-math-word-problems-200k • Viewer • Updated Mar 4 • 200k • 1.58k • 417
HuggingFaceFW/fineweb • Viewer • Updated Jul 16 • 46B • 403k • 1.76k
proj-persona/PersonaHub • Viewer • Updated Oct 5 • 375k • 4.85k • 460
nyu-visionx/Cambrian-10M • Preview • Updated Jul 8 • 14k • 103
BAAI/Infinity-Instruct • Viewer • Updated 27 days ago • 20.4M • 8.82k • 559
NousResearch/hermes-function-calling-v1 • Viewer • Updated Aug 30 • 11.6k • 623 • 219
meta-llama/Llama-3.1-405B-Instruct • Text Generation • Updated Sep 25 • 475k • 533
OpenAssistant/oasst2 • Viewer • Updated Jan 11 • 135k • 1.44k • 215
OpenAssistant/oasst1 • Viewer • Updated May 2, 2023 • 88.8k • 2.89k • 1.27k
HuggingFaceTB/smoltalk • Viewer • Updated 1 day ago • 2.2M • 1.36k • 156