Update README.md
README.md
@@ -18,7 +18,7 @@ The model intended to be used for encoding sentences or short paragraphs. Given
 
 # Training data
 
-The model was trained on a random collection of **English** sentences from Wikipedia. The *full-shot* training file is available [here]
+The model was trained on a random collection of **English** sentences from Wikipedia. The *full-shot* training file is available [here](https://huggingface.co/datasets/princeton-nlp/datasets-for-simcse/resolve/main/wiki1m_for_simcse.txt).
 
 Low-shot training data consists of data splits of different sizes (from 10% to 0.0064%) of the [SimCSE](https://github.com/princeton-nlp/SimCSE) training corpus. Each split size comprises 5 files, created with a different seed indicated with filename postfix.
 
 Data can be downloaded [here](https://huggingface.co/datasets/sap-ai-research/datasets-for-micse).
 
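The low-shot splits described in the diff (fixed fractions of the corpus, one file per random seed) can be approximated locally. A minimal sketch, assuming the corpus is a plain-text file with one sentence per line as in the SimCSE `wiki1m_for_simcse.txt` file; the function name and the toy corpus below are illustrative, not part of the released tooling:

```python
import random

def make_low_shot_split(sentences, fraction, seed):
    """Sample a deterministic low-shot subset of the training corpus.

    `fraction` is e.g. 0.10 for the 10% split (down to 0.000064 for
    the 0.0064% split); `seed` plays the role of the filename postfix
    that distinguishes the 5 splits per size.
    """
    rng = random.Random(seed)
    k = max(1, int(len(sentences) * fraction))
    return rng.sample(sentences, k)

# Tiny stand-in corpus; the real file holds ~1M Wikipedia sentences.
corpus = [f"sentence {i}" for i in range(1000)]
split = make_low_shot_split(corpus, 0.10, seed=0)
print(len(split))  # 100
```

Because the sampler is seeded, rerunning with the same seed reproduces the same split, which is what makes the 5 per-size files comparable across experiments.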