Sentence Similarity
sentence-transformers
PyTorch
Safetensors
Transformers
Dutch
roberta
feature-extraction
text-embeddings-inference
Instructions to use NetherlandsForensicInstitute/robbert-2022-dutch-sentence-transformers with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use NetherlandsForensicInstitute/robbert-2022-dutch-sentence-transformers with sentence-transformers:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("NetherlandsForensicInstitute/robbert-2022-dutch-sentence-transformers")

sentences = [
    "Deze week ga ik naar de kapper",
    "Ik ga binnenkort mijn haren laten knippen",
    "Morgen wil ik uitslapen",
    "Gisteren ging ik naar de bioscoop",
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
```
- Transformers
How to use NetherlandsForensicInstitute/robbert-2022-dutch-sentence-transformers with Transformers:
```python
# Load model directly
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("NetherlandsForensicInstitute/robbert-2022-dutch-sentence-transformers")
model = AutoModel.from_pretrained("NetherlandsForensicInstitute/robbert-2022-dutch-sentence-transformers")
```
- Inference
- Notebooks
- Google Colab
- Kaggle
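The sentence-transformers snippet above returns a 4 x 4 similarity matrix, while the plain Transformers snippet only loads the model and yields per-token outputs. The following is a minimal, dependency-free sketch of the two steps that usually sit between those: pooling token vectors into one sentence vector, and comparing sentence vectors pairwise. It assumes mean pooling over non-padding tokens and cosine similarity; neither is stated on this page, so check the model's pooling configuration before relying on it. The helper names (`mean_pool`, `cosine_similarity`, `similarity_matrix`) and the toy vectors are hypothetical and not part of either library.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped).

    Assumption: the model uses mean pooling; this is not confirmed by the page.
    """
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [value / count for value in total]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(embeddings):
    """Pairwise cosine similarities; n embeddings give an n x n matrix."""
    return [[cosine_similarity(a, b) for b in embeddings] for a in embeddings]


# Toy stand-ins for token-level model outputs: 2 sentences, 3 tokens each,
# 2 dimensions. The third token of each sentence is padding (mask 0).
toy_tokens = [
    [[1.0, 0.0], [3.0, 0.0], [9.0, 9.0]],
    [[0.0, 2.0], [0.0, 4.0], [0.0, 0.0]],
]
toy_masks = [[1, 1, 0], [1, 1, 0]]

toy_embeddings = [mean_pool(t, m) for t, m in zip(toy_tokens, toy_masks)]
matrix = similarity_matrix(toy_embeddings)
print(len(matrix), len(matrix[0]))
```

With real model outputs the token vectors would come from the last hidden state and the mask from the tokenizer; the toy values here only illustrate that padding does not affect the pooled vector and that the result has one row and one column per sentence.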
Commit 18fdb45 · Parent(s): 05894b8
Update README.md
README.md CHANGED
````diff
@@ -14,6 +14,10 @@ This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentence
 
 <!--- Describe your model here -->
 
+This model is based on [KU Leuven's RobBERT model](https://huggingface.co/DTAI-KULeuven/robbert-2022-dutch-base).
+It has been finetuned on the [Paraphrase dataset](https://public.ukp.informatik.tu-darmstadt.de/reimers/sentence-transformers/datasets/paraphrases/), which we (machine-) translated to Dutch. The Paraphrase dataset consists of multiple datasets that consist of duo's of similar texts, for example duplicate questions on a forum.
+We have published the translated data that we used to train this model. You can find it [here](link). TODO: insert link!
+
 ## Usage (Sentence-Transformers)
 
 Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:
@@ -87,7 +91,7 @@ The model was trained with the parameters:
 
 `MultiDatasetDataLoader.MultiDatasetDataLoader` of length 414262 with parameters:
 ```
-{'batch_size':
+{'batch_size': 1}
 ```
 
 **Loss**:
````