Update README.md
README.md
@@ -18,7 +18,7 @@ The model intended to be used for encoding sentences or short paragraphs. Given
 
 # Training data
 
-The model was trained on a random collection of **English** sentences from Wikipedia. The *full-shot* training file is available [here]
+The model was trained on a random collection of **English** sentences from Wikipedia. The *full-shot* training file is available [here](https://huggingface.co/datasets/princeton-nlp/datasets-for-simcse/resolve/main/wiki1m_for_simcse.txt).
 
 Low-shot training data consists of data splits of different sizes (from 10% to 0.0064%) of the [SimCSE](https://github.com/princeton-nlp/SimCSE) training corpus. Each split size comprises 5 files, created with a different seed indicated with filename postfix.
 
 Data can be downloaded [here](https://huggingface.co/datasets/sap-ai-research/datasets-for-micse).
 
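The low-shot splits described in the diff (fixed fractions of the corpus, one file per random seed) can be approximated locally. A minimal sketch, assuming the corpus is a plain-text file with one sentence per line as in the SimCSE `wiki1m_for_simcse.txt` file; the function name and the toy corpus below are illustrative, not part of the released tooling:

```python
import random

def make_low_shot_split(sentences, fraction, seed):
    """Sample a deterministic low-shot subset of the training corpus.

    `fraction` is e.g. 0.10 for the 10% split (down to 0.000064 for
    the 0.0064% split); `seed` plays the role of the filename postfix
    that distinguishes the 5 splits per size.
    """
    rng = random.Random(seed)
    k = max(1, int(len(sentences) * fraction))
    return rng.sample(sentences, k)

# Tiny stand-in corpus; the real file holds ~1M Wikipedia sentences.
corpus = [f"sentence {i}" for i in range(1000)]
split = make_low_shot_split(corpus, 0.10, seed=0)
print(len(split))  # 100
```

Because the sampler is seeded, rerunning with the same seed reproduces the same split, which is what makes the 5 per-size files comparable across experiments.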