Skier8402's Collections
Datasets
Updated Dec 12, 2025
Interesting datasets to help train LLMs and beyond
Open-Orca/OpenOrca • Viewer • Updated Feb 19, 2025 • 2.94M • 10.7k • 1.48k
NeelNanda/pile-10k • Viewer • Updated Oct 14, 2022 • 10k • 14.8k • 27
legacy-datasets/mc4 • Updated Mar 5, 2024 • 2.05k • 154
oscar-corpus/oscar • Updated Sep 4, 2025 • 684 • 203
deepset/prompt-injections • Viewer • Updated Jul 30, 2024 • 662 • 2.4k • 95
epfl-llm/guidelines • Viewer • Updated Mar 7, 2024 • 38k • 1.28k • 141
wanng/midjourney-v5-202304-clean • Viewer • Updated May 24, 2024 • 1.7M • 98 • 89
CohereLabs/aya_dataset • Viewer • Updated Apr 15, 2025 • 206k • 3.47k • 331
google/fleurs • Updated Aug 25, 2024 • 37.4k • 364
HuggingFaceTB/cosmopedia • Viewer • Updated Aug 12, 2024 • 31.1M • 43k • 653
microsoft/orca-math-word-problems-200k • Viewer • Updated Mar 4, 2024 • 200k • 8.48k • 466
HuggingFaceFW/fineweb • Viewer • Updated Jul 11, 2025 • 52.5B • 184k • 2.6k
proj-persona/PersonaHub • Viewer • Updated Sep 26, 2025 • 375k • 14.1k • 702
nyu-visionx/Cambrian-10M • Preview • Updated Jul 8, 2024 • 11.6k • 123
BAAI/Infinity-Instruct • Viewer • Updated Dec 4, 2025 • 21.9M • 13.8k • 690
NousResearch/hermes-function-calling-v1 • Viewer • Updated 10 days ago • 11.6k • 1.71k • 370
meta-llama/Llama-3.1-405B-Instruct • Text Generation • 406B • Updated Sep 25, 2024 • 134k • 587
OpenAssistant/oasst2 • Viewer • Updated Jan 11, 2024 • 135k • 1.79k • 279
OpenAssistant/oasst1 • Viewer • Updated May 2, 2023 • 88.8k • 8.59k • 1.47k
HuggingFaceTB/smoltalk • Viewer • Updated Feb 10, 2025 • 2.2M • 5.85k • 386
NovaSky-AI/Sky-T1_data_17k • Viewer • Updated Jan 14, 2025 • 16.4k • 218 • 187
QuixiAI/dolphin-r1 • Viewer • Updated Jan 30, 2025 • 814k • 654 • 294
HuggingFaceFW/fineweb-2 • Viewer • Updated Oct 27, 2025 • 4.48B • 71.7k • 711
HuggingFaceFW/fineweb-edu • Viewer • Updated Jul 11, 2025 • 3.5B • 326k • 901
open-thoughts/OpenThoughts-114k • Viewer • Updated Aug 31, 2025 • 228k • 100k • 786
open-r1/OpenR1-Math-220k • Viewer • Updated Feb 18, 2025 • 450k • 12.5k • 693
lelapa/Inkuba-Mono • Viewer • Updated Sep 5, 2024 • 68.8M • 34 • 14
lelapa/Inkuba-instruct • Viewer • Updated Sep 5, 2024 • 212M • 165 • 9
intronhealth/afrimedqa_v2 • Viewer • Updated Jun 17, 2025 • 15.3k • 63 • 10
intronhealth/afrispeech-dialog • Preview • Updated Oct 28, 2024 • 192 • 4
intronhealth/afrispeech-200 • Updated Nov 20, 2023 • 2.76k • 31
arcinstitute/opengenome2 • Preview • Updated Sep 20, 2025 • 4.47k • 113
facebook/natural_reasoning • Viewer • Updated Feb 21, 2025 • 1.15M • 1.4k • 547
Jofthomas/hermes-function-calling-thinking-V1 • Viewer • Updated Feb 16, 2025 • 3.57k • 499 • 72
CohereLabs/Global-MMLU • Viewer • Updated Aug 14, 2025 • 602k • 8.87k • 144
FreedomIntelligence/medical-o1-reasoning-SFT • Viewer • Updated Apr 22, 2025 • 90.1k • 5.48k • 1.04k
glaiveai/glaive-function-calling-v2 • Viewer • Updated Sep 27, 2023 • 113k • 2.28k • 480
nvidia/OpenMathReasoning • Viewer • Updated May 27, 2025 • 5.68M • 13.6k • 391
Sheets 🗂 • Running • 623 • Create and enrich datasets with AI
facebook/omnilingual-asr-corpus • Viewer • Updated Nov 14, 2025 • 548k • 7.79k • 185
nvidia/ToolScale • Viewer • Updated 27 days ago • 4.06k • 975 • 174
Nadhari/Swahili-Thinking • Viewer • Updated Nov 23, 2025 • 166 • 95 • 9
OpenMed/Medical-Reasoning-SFT-GPT-OSS-120B • Viewer • Updated Dec 12, 2025 • 200k • 3.33k • 241