|
---
language:
- en
---
|
[sentence-transformers/LaBSE](https://huggingface.co/sentence-transformers/LaBSE) pre-trained on an instructional question-and-answer dataset. Evaluated with **Precision at K** and **Mean Reciprocal Rank (MRR)**.
|
Precision at K is a simple metric to understand and implement, but it has an important disadvantage: it does not take the order of items within the top-K list into account. If we guessed only one item out of ten, it does not matter whether it appears in the first or the last position - p@10 = 0.1 in either case, even though the first variant is clearly much better.
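
A minimal sketch of the metric (the `precision_at_k` helper and the toy id lists below are illustrative, not part of this repository):

```python
# Precision at K: the fraction of the top-K retrieved items that are relevant.
def precision_at_k(retrieved, relevant, k):
    top_k = retrieved[:k]
    return sum(1 for item in top_k if item in relevant) / k

# The position inside the top-K does not matter: one correct item out of ten
# gives 0.1 whether it is ranked first or last.
print(precision_at_k([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], relevant={1}, k=10))  # 0.1
print(precision_at_k([2, 3, 4, 5, 6, 7, 8, 9, 10, 1], relevant={1}, k=10))  # 0.1
```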
|
Mean reciprocal rank is equal to the reciprocal of the rank of the first correctly guessed item. It varies in the range [0, 1] and takes the position of items into account. Unfortunately, it does so only for a single item - the first correctly predicted one - and ignores all subsequent items.
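
A minimal sketch of MRR under the same assumptions (the helper functions and toy query lists are hypothetical):

```python
# Reciprocal rank of a single query: 1 / rank of the first relevant item, 0 if none.
def reciprocal_rank(retrieved, relevant):
    for rank, item in enumerate(retrieved, start=1):
        if item in relevant:
            return 1.0 / rank  # only the first hit contributes
    return 0.0

# Mean reciprocal rank: average of reciprocal ranks over all queries.
def mean_reciprocal_rank(queries):
    return sum(reciprocal_rank(retrieved, relevant) for retrieved, relevant in queries) / len(queries)

queries = [
    ([3, 1, 2], {1}),  # first relevant item at rank 2 -> 0.5
    ([5, 7, 6], {7}),  # first relevant item at rank 2 -> 0.5
]
print(mean_reciprocal_rank(queries))  # 0.5
```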
|
|
|
Evaluation results: |
|
```
p@1:  52 %
p@3:  66 %
p@5:  73 %
p@10: 79 %
p@15: 82 %
MRR:  62 %
```
|
|
|
```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("zjkarina/LaBSE-instructDialogs")
model = AutoModel.from_pretrained("zjkarina/LaBSE-instructDialogs")

sentences = ["List 5 reasons why someone should learn to code",
             "Describe the sound of the wind on a sunny day."]

# Tokenize both sentences into a single padded batch
encoded_input = tokenizer(sentences, padding=True, truncation=True, max_length=64, return_tensors='pt')

# Run the model without tracking gradients and take the pooled representation
with torch.no_grad():
    model_output = model(**encoded_input)
embeddings = model_output.pooler_output

# L2-normalize so that dot products correspond to cosine similarities
embeddings = torch.nn.functional.normalize(embeddings)
print(embeddings)
```
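
Since the embeddings are L2-normalized, cosine similarities can be obtained directly as dot products. A short usage sketch that reuses the `embeddings` tensor from the snippet above:

```python
# Pairwise cosine similarities between all sentences in the batch
similarity_matrix = embeddings @ embeddings.T
print(similarity_matrix)

# Similarity of the first and second sentence
print(similarity_matrix[0, 1].item())
```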