pranay-j's Collections
Datasets: For training Embedding Models
Updated Apr 28
This collection gathers datasets for training and evaluating embedding models.
embedding-data/altlex • Updated Aug 2, 2022 • 113k rows • 94 downloads • 1 like
embedding-data/sentence-compression • Updated Aug 2, 2022 • 180k rows • 126 downloads • 20 likes
embedding-data/QQP_triplets • Updated Aug 2, 2022 • 102k rows • 375 downloads • 7 likes
embedding-data/PAQ_pairs • Updated Aug 2, 2022 • 7.29M rows • 76 downloads • 4 likes
embedding-data/SPECTER • Updated Aug 2, 2022 • 684k rows • 61 downloads • 3 likes
embedding-data/Amazon-QA • Updated Aug 2, 2022 • 1.1M rows • 90 downloads • 2 likes
embedding-data/simple-wiki • Updated Aug 2, 2022 • 102k rows • 125 downloads • 9 likes
embedding-data/WikiAnswers • Updated Aug 2, 2022 • 3.46M rows • 66 downloads • 6 likes
embedding-data/coco_captions_quintets • Updated Aug 2, 2022 • 82.8k rows • 62 downloads • 6 likes
embedding-data/flickr30k_captions_quintets • Updated Aug 2, 2022 • 31.8k rows • 69 downloads • 3 likes
microsoft/ms_marco • Updated Jan 4 • 1.11M rows • 5.61k downloads • 124 likes
kyunghyuncho/search_qa • Updated Jun 16, 2023 • 195 downloads • 19 likes
flax-sentence-embeddings/stackexchange_xml • Updated Jul 26, 2021 • 126 downloads • 1 like
nyu-mll/multi_nli • Updated Jan 4 • 412k rows • 3.6k downloads • 90 likes
stanfordnlp/snli • Updated Mar 6 • 570k rows • 20.1k downloads • 67 likes
mandarjoshi/trivia_qa • Updated Jan 5 • 848k rows • 74.4k downloads • 97 likes
google-research-datasets/natural_questions • Updated Mar 11 • 26.3k rows • 7.79k downloads • 85 likes
Cohere/beir-embed-english-v3 • Updated Jan 3 • 50.5M rows • 1.24k downloads • 3 likes