🦢SWIM-IR Dataset - a nthakur Collection

updated Apr 28

29 million Synthetic Wikipedia-based Multilingual Retrieval Training Pairs.

Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval

Paper • 2311.05800 • Published Nov 10, 2023

nthakur/swim-ir-cross-lingual

Updated Apr 28 • 15.4M rows • 491 downloads • 6 likes

Note: SWIM-IR (Cross-lingual) dataset, where the query is in the target language and the passage is in English.

nthakur/swim-ir-monolingual

Updated Apr 28 • 3.17M rows • 262 downloads • 6 likes

Note: SWIM-IR (Monolingual) dataset, where both the query and the passage are in the target language.

nthakur/indic-swim-ir-cross-lingual

Updated Apr 28 • 93k rows • 195 downloads • 2 likes

Note: Indic SWIM-IR (Cross-lingual) dataset, where the query is in an Indic language and the passage is in English.
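
The three repositories listed here could be read with the Hugging Face `datasets` library roughly as sketched below. This is a minimal sketch, not a documented recipe. The repository IDs come from this listing. The helper names `repo_id` and `stream_pairs`, the `"train"` split, and the per-language `config` argument are assumptions made for illustration, so check each dataset card for the real configuration names and column layout.

```python
from itertools import islice
from typing import Optional

# Repository IDs copied from the collection listing.
SWIM_IR_REPOS = {
    "cross-lingual": "nthakur/swim-ir-cross-lingual",
    "monolingual": "nthakur/swim-ir-monolingual",
    "indic-cross-lingual": "nthakur/indic-swim-ir-cross-lingual",
}


def repo_id(variant: str) -> str:
    """Return the Hub repository ID for a SWIM-IR variant (hypothetical helper)."""
    if variant not in SWIM_IR_REPOS:
        raise ValueError(
            f"unknown variant {variant!r}; expected one of {sorted(SWIM_IR_REPOS)}"
        )
    return SWIM_IR_REPOS[variant]


def stream_pairs(variant: str, config: Optional[str] = None, split: str = "train"):
    """Stream examples without downloading the whole dataset (hypothetical helper).

    `config` and `split` are assumptions: the dataset cards define which
    configurations (for example, one per language) and splits actually exist.
    """
    # Imported lazily so the module loads even where `datasets` is not installed.
    from datasets import load_dataset

    return load_dataset(repo_id(variant), config, split=split, streaming=True)


if __name__ == "__main__":
    # "de" is a placeholder configuration name, not taken from the listing.
    for example in islice(stream_pairs("monolingual", config="de"), 3):
        print(example)
```

Streaming is used here because the cross-lingual set alone lists 15.4M rows, so iterating lazily avoids fetching everything before inspecting a few pairs.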