---
language:
- de
tags:
- cross-encoder
widget:
- text: "Was sind Lamas. Das Lama (Lama glama) ist eine Art der Kamele. Es ist in den südamerikanischen Anden verbreitet und eine vom Guanako abstammende Haustierform."
  example_title: "Example Query / Paragraph"
license: apache-2.0
metrics:
- Rouge-Score
---

# cross-encoder-mmarco-german-distilbert-base

## Model description:

This model is a [cross-encoder](https://www.sbert.net/examples/training/cross-encoder/README.html) fine-tuned on the [MMARCO dataset](https://huggingface.co/datasets/unicamp-dl/mmarco), the machine-translated version of the MS MARCO dataset.

The base model for fine-tuning is [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased).

Model input samples are pairs in one of the following formats: `<query, positive_paragraph>` labeled 1, or `<query, negative_paragraph>` labeled 0.

The model was trained for 1 epoch.

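
To make the sample format above concrete, here is a minimal sketch of how labeled pairs could be assembled. The `LabeledPair` alias, the `build_samples` helper, and the example paragraphs are hypothetical and chosen only for illustration; they are not taken from the original training code.

```python
from typing import List, Tuple

# Hypothetical container for one labeled sample: (query, paragraph, label)
LabeledPair = Tuple[str, str, int]


def build_samples(query: str, positives: List[str], negatives: List[str]) -> List[LabeledPair]:
    """Pair a query with its paragraphs: relevant ones get label 1, the others 0."""
    samples = [(query, paragraph, 1) for paragraph in positives]
    samples += [(query, paragraph, 0) for paragraph in negatives]
    return samples


# Illustrative data only (assumed, not from the dataset)
samples = build_samples(
    "Was sind Lamas",
    positives=["Das Lama (Lama glama) ist eine Art der Kamele."],
    negatives=["Berlin ist die Hauptstadt von Deutschland."],
)
for query, paragraph, label in samples:
    print(label, "|", query, "->", paragraph)
```

In the classic Sentence-Transformers training API, each such triple would typically be wrapped as `InputExample(texts=[query, paragraph], label=label)`; whether this model was trained exactly that way is an assumption.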
## Model usage

The cross-encoder model can be used like this:

```python
from sentence_transformers import CrossEncoder

# Replace 'model_name' with the identifier or local path of this model
model = CrossEncoder('model_name')
# Each input is a (query, paragraph) pair; one score is returned per pair
scores = model.predict([('Query 1', 'Paragraph 1'), ('Query 2', 'Paragraph 2')])
```

The model predicts one score for each of the pairs `('Query 1', 'Paragraph 1')` and `('Query 2', 'Paragraph 2')`.

For more details on using cross-encoder models, see the [Sentence-Transformers](https://www.sbert.net/) documentation.

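
A common use of such pair scores is reranking candidate paragraphs for a single query. The following is a minimal sketch of that idea: `rank_paragraphs` and `dummy_predict` are hypothetical names, and the word-overlap scorer is only a stand-in so the example runs without downloading a model.

```python
from typing import Callable, List, Sequence, Tuple


def rank_paragraphs(
    query: str,
    paragraphs: Sequence[str],
    predict: Callable[[List[Tuple[str, str]]], Sequence[float]],
) -> List[Tuple[str, float]]:
    """Score every (query, paragraph) pair and sort paragraphs by descending score."""
    pairs = [(query, paragraph) for paragraph in paragraphs]
    scores = [float(score) for score in predict(pairs)]
    return sorted(zip(paragraphs, scores), key=lambda item: item[1], reverse=True)


def dummy_predict(pairs: List[Tuple[str, str]]) -> List[float]:
    """Stand-in scorer (assumption): counts words shared by query and paragraph."""
    return [
        float(len(set(q.lower().split()) & set(p.lower().split())))
        for q, p in pairs
    ]


ranked = rank_paragraphs(
    "Was sind Lamas",
    ["Berlin ist eine Stadt", "Lamas sind Haustiere aus den Anden"],
    dummy_predict,
)
for paragraph, score in ranked:
    print(f"{score:.1f}  {paragraph}")
```

With the real model, `model.predict` from the snippet above could be passed in place of `dummy_predict`.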
## Model Performance:

The model was evaluated on 2000 evaluation paragraphs from the dataset.

| Accuracy (%) | F1-Score (%) | Precision (%) | Recall (%) |
| --- | --- | --- | --- |
| 89.70 | 86.82 | 86.82 | 93.50 |
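
For readers unfamiliar with these metrics, the sketch below shows how accuracy, precision, recall, and F1 are commonly computed from binary labels and model scores. The `binary_metrics` helper, the 0.5 decision threshold, and the toy data are assumptions for illustration; the model card does not state how the reported numbers were produced.

```python
from typing import Dict, Sequence


def binary_metrics(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 after thresholding the scores."""
    # Assumed decision rule: a score at or above the threshold counts as a positive
    preds = [1 if score >= threshold else 0 for score in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)

    accuracy = (tp + tn) / len(labels)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy data (assumed): three positive and two negative pairs
metrics = binary_metrics(labels=[1, 1, 0, 0, 1], scores=[0.9, 0.4, 0.6, 0.1, 0.8])
print(metrics)
```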