---
library_name: transformers
tags:
- cross-encoder
datasets:
- lightonai/ms-marco-en-bge
- juanluisdb/triviaqa-bge-m3-logits
- juanluisdb/nq-bge-m3-logits
language:
- en
base_model:
- cross-encoder/ms-marco-MiniLM-L-6-v2
---

# Model Card for MiniLM-L-6-rerank-m3

This model is fine-tuned from the well-known [ms-marco-MiniLM-L-6-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-6-v2) cross-encoder using the KL-divergence distillation technique described [here](https://www.answer.ai/posts/2024-08-13-small-but-mighty-colbert.html), with [bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) as the teacher.
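
To make the objective concrete, here is a minimal sketch of a KL distillation loss, assuming each training example pairs one query with `n` passages scored by both the student and the teacher (names are illustrative; this is not the actual training code):

```python
# Minimal sketch of KL-divergence distillation for a cross-encoder student.
# Assumes logits of shape (batch_size, n_passages): one relevance score per
# passage, from the student and from the bge-reranker-v2-m3 teacher.
import torch
import torch.nn.functional as F

def kl_distillation_loss(student_logits: torch.Tensor,
                         teacher_logits: torch.Tensor) -> torch.Tensor:
    """KL(teacher || student) over each query's n-way passage score distribution."""
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    teacher_probs = F.softmax(teacher_logits, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
```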

# Usage

## Usage with Transformers

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained("juanluisdb/MiniLM-L-6-rerank-m3")
tokenizer = AutoTokenizer.from_pretrained("juanluisdb/MiniLM-L-6-rerank-m3")

# Each (query, passage) pair is scored independently; a higher logit means
# the passage is more relevant to the query.
features = tokenizer(
    ["How many people live in Berlin?", "How many people live in Berlin?"],
    ["Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
     "New York City is famous for the Metropolitan Museum of Art."],
    padding=True,
    truncation=True,
    return_tensors="pt",
)

model.eval()
with torch.no_grad():
    scores = model(**features).logits
    print(scores)
```
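
The model outputs a single relevance logit per pair: sort passages by this score to rank them, or apply `scores.sigmoid()` if you prefer scores normalized to (0, 1).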


## Usage with SentenceTransformers

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("juanluisdb/MiniLM-L-6-rerank-m3", max_length=512)
scores = model.predict([("Query", "Paragraph1"), ("Query", "Paragraph2"), ("Query", "Paragraph3")])
```
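
For the common rerank-a-candidate-list workflow (e.g. reranking the top-100 BM25 results, as in the evaluation below), recent versions of sentence-transformers also provide `CrossEncoder.rank`; the query and candidates here are placeholders:

```python
# Rerank a list of candidate passages for a single query. `rank` returns
# dicts with each candidate's original index ("corpus_id") and its score,
# sorted by decreasing relevance.
query = "How many people live in Berlin?"
candidates = [
    "Berlin has a population of 3,520,031 registered inhabitants.",
    "New York City is famous for the Metropolitan Museum of Art.",
]
for hit in model.rank(query, candidates):
    print(hit["corpus_id"], hit["score"])
```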

# Evaluation

## BEIR (NDCG@10)
I ran tests on several BEIR datasets. Each cross-encoder reranks the top-100 results retrieved by BM25.

|                | bm25  | jina-reranker-v1-turbo-en | bge-reranker-v2-m3 | mxbai-rerank-base-v1 | ms-marco-MiniLM-L-6-v2 | MiniLM-L-6-rerank-m3 |
|:---------------|------:|--------------------------:|-------------------:|---------------------:|-----------------------:|---------------------:|
| nq*            | 0.305 |                     0.533 |          **0.597** |                0.535 |                  0.523 |                0.580 |
| fever*         | 0.638 |                     0.852 |              0.857 |                0.767 |                  0.801 |            **0.867** |
| fiqa           | 0.238 |                     0.336 |          **0.397** |                0.382 |                  0.349 |                0.364 |
| trec-covid     | 0.589 |                     0.774 |              0.784 |            **0.830** |                  0.741 |                0.738 |
| scidocs        | 0.150 |                     0.166 |              0.169 |            **0.171** |                  0.164 |                0.165 |
| scifact        | 0.676 |                     0.739 |              0.731 |                0.719 |                  0.688 |            **0.750** |
| nfcorpus       | 0.318 |                     0.353 |              0.336 |            **0.353** |                  0.349 |                0.350 |
| hotpotqa       | 0.629 |                     0.745 |          **0.794** |                0.668 |                  0.724 |                0.775 |
| dbpedia-entity | 0.319 |                     0.421 |          **0.445** |                0.416 |                  0.445 |                0.444 |
| quora          | 0.787 |                     0.858 |              0.858 |                0.747 |                  0.825 |            **0.871** |
| climate-fever  | 0.163 |                     0.233 |          **0.314** |                0.253 |                  0.244 |                0.309 |

\* The training splits of NQ and FEVER were used as part of the training data.
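
For reference, a minimal sketch of the NDCG@10 metric reported above, assuming `ranked_rels` holds the (binary or graded) relevance labels of the reranked passages in rank order and using the common `rel / log2(rank + 1)` gain:

```python
import math

def ndcg_at_10(ranked_rels: list[float]) -> float:
    """NDCG@10 for one query: DCG of the ranking divided by the ideal DCG."""
    def dcg(rels):
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:10]))
    ideal = dcg(sorted(ranked_rels, reverse=True))
    return dcg(ranked_rels) / ideal if ideal > 0 else 0.0
```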

Comparison with an [ablated model](https://huggingface.co/juanluisdb/MiniLM-L-6-rerank-m3-ablated) trained only on MS MARCO:

|                | ms-marco-MiniLM-L-6-v2 | MiniLM-L-6-rerank-m3-ablated |
|:---------------|-----------------------:|-----------------------------:|
| nq             |                 0.5234 |                   **0.5412** |
| fever          |                 0.8007 |                   **0.8221** |
| fiqa           |                 0.3490 |                   **0.3598** |
| trec-covid     |             **0.7410** |                       0.7331 |
| scidocs        |             **0.1638** |                       0.1630 |
| scifact        |                 0.6880 |                   **0.7376** |
| nfcorpus       |                 0.3493 |                   **0.3495** |
| hotpotqa       |                 0.7235 |                   **0.7583** |
| dbpedia-entity |             **0.4445** |                       0.4382 |
| quora          |                 0.8251 |                   **0.8619** |
| climate-fever  |                 0.2438 |                   **0.2449** |


# Datasets Used

~900k queries with 32-way triplets (each query scored against 32 passages by the teacher) were used from these datasets; a sketch for inspecting one of them follows the list:

* MS MARCO
* TriviaQA
* Natural Questions
* FEVER
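
The exact field names vary by dataset and are not guaranteed here; to check the schema, one can stream a single example, e.g.:

```python
# Stream one example to inspect the schema before relying on field names.
# A config name may be required if the dataset defines multiple configurations.
from datasets import load_dataset

ds = load_dataset("juanluisdb/triviaqa-bge-m3-logits", split="train", streaming=True)
print(next(iter(ds)).keys())
```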