---
library_name: transformers
tags:
- cross-encoder
datasets:
- lightonai/ms-marco-en-bge
language:
- en
base_model:
- cross-encoder/ms-marco-MiniLM-L-6-v2
---
# Model Card for MiniLM-L-6-rerank-reborn
This model is fine-tuned from the well-known [cross-encoder/ms-marco-MiniLM-L-6-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-6-v2) using the KL distillation technique described here, with [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) as the teacher.
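In this setup the teacher scores each query's candidate passages, and the student is trained to match the teacher's softmax distribution over those candidates. A minimal sketch of such a KL loss in PyTorch (the temperature and reduction are illustrative assumptions, not details of the actual training run):

```python
import torch.nn.functional as F

def kl_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    # Both tensors have shape (num_queries, num_candidates): one relevance
    # score per (query, passage) pair from the student and the teacher.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # KL(teacher || student), averaged over queries.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
```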
## Usage

### Usage with Transformers
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained("juanluisdb/MiniLM-L-6-rerank-reborn")
tokenizer = AutoTokenizer.from_pretrained("juanluisdb/MiniLM-L-6-rerank-reborn")

# Each query is paired with one candidate passage.
features = tokenizer(
    ["How many people live in Berlin?", "How many people live in Berlin?"],
    ["Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
     "New York City is famous for the Metropolitan Museum of Art."],
    padding=True, truncation=True, return_tensors="pt",
)

model.eval()
with torch.no_grad():
    scores = model(**features).logits
print(scores)
```
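The logits are raw relevance scores: higher means more relevant, so the Berlin pair should score well above the New York pair. If you need values in [0, 1], applying a sigmoid is the usual convention for MS MARCO-style cross-encoders, though whether that calibration carries over to this distilled model is an assumption:

```python
probs = torch.sigmoid(scores)  # assumed calibration, as with the base model
```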
### Usage with SentenceTransformers
```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("juanluisdb/MiniLM-L-6-rerank-reborn", max_length=512)
scores = model.predict([("Query", "Paragraph1"), ("Query", "Paragraph2"), ("Query", "Paragraph3")])
```
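Recent sentence-transformers releases also expose a `rank()` convenience helper on `CrossEncoder`, which scores and sorts the passages for a single query in one call (a sketch; check that your installed version ships this method):

```python
query = "How many people live in Berlin?"
passages = [
    "Berlin has a population of 3,520,031 registered inhabitants.",
    "New York City is famous for the Metropolitan Museum of Art.",
]
# Returns one dict per passage, best first; with return_documents=True
# each dict carries the passage text alongside its score.
for hit in model.rank(query, passages, return_documents=True):
    print(f"{hit['score']:.3f}\t{hit['text']}")
```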
## Evaluation

### BEIR (NDCG@10)
I've run tests on several BEIR datasets; in each case the cross-encoder reranks the top-100 BM25 results (a sketch of the evaluation loop follows the table).
|                           | nq*   | fever* | fiqa  | trec-covid | scidocs | scifact | nfcorpus | hotpotqa | dbpedia-entity | quora | climate-fever |
|---------------------------|-------|--------|-------|------------|---------|---------|----------|----------|----------------|-------|---------------|
| bm25                      | 0.305 | 0.638  | 0.238 | 0.589      | 0.150   | 0.676   | 0.318    | 0.629    | 0.319          | 0.787 | 0.163         |
| jina-reranker-v1-turbo-en | 0.533 | 0.852  | 0.336 | 0.774      | 0.166   | 0.739   | 0.353    | 0.745    | 0.421          | 0.858 | 0.233         |
| bge-reranker-v2-m3        | 0.597 | 0.857  | 0.397 | 0.784      | 0.169   | 0.731   | 0.336    | 0.794    | 0.445          | 0.858 | 0.314         |
| mxbai-rerank-base-v1      | 0.535 | 0.767  | 0.382 | 0.830      | 0.171   | 0.719   | 0.353    | 0.668    | 0.416          | 0.747 | 0.253         |
| ms-marco-MiniLM-L-6-v2    | 0.523 | 0.801  | 0.349 | 0.741      | 0.164   | 0.688   | 0.349    | 0.724    | 0.445          | 0.825 | 0.244         |
| MiniLM-L-6-rerank-reborn  | 0.580 | 0.867  | 0.364 | 0.738      | 0.165   | 0.750   | 0.350    | 0.775    | 0.444          | 0.871 | 0.309         |
\* Training splits of NQ and FEVER were used as part of the training data.
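For reference, the protocol amounts to: take the 100 BM25 candidates per query, rescore every (query, passage) pair with the cross-encoder, sort by score, and compute NDCG@10 on the reranked list. A minimal sketch of that loop; `bm25_top100`, `queries`, and `qrels` are assumed to be loaded from a BEIR dataset beforehand, and the binary-gain NDCG below ignores the graded judgments some BEIR tasks use:

```python
import numpy as np
from sentence_transformers import CrossEncoder

model = CrossEncoder("juanluisdb/MiniLM-L-6-rerank-reborn", max_length=512)

def rerank(query, candidates):
    # candidates: list of (doc_id, text) BM25 hits; returns doc_ids, best first.
    scores = model.predict([(query, text) for _, text in candidates])
    order = np.argsort(-np.asarray(scores))
    return [candidates[i][0] for i in order]

def ndcg_at_10(ranked_ids, relevant_ids):
    # Binary-gain NDCG@10 for a single query.
    dcg = sum(1.0 / np.log2(rank + 2)
              for rank, doc_id in enumerate(ranked_ids[:10])
              if doc_id in relevant_ids)
    ideal = sum(1.0 / np.log2(rank + 2)
                for rank in range(min(len(relevant_ids), 10)))
    return dcg / ideal if ideal else 0.0

# bm25_top100: {query_id: [(doc_id, text), ...]}, queries: {query_id: text},
# qrels: {query_id: set of relevant doc_ids} -- hypothetical names.
# mean_ndcg = np.mean([ndcg_at_10(rerank(queries[qid], hits), qrels[qid])
#                      for qid, hits in bm25_top100.items()])
```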
Comparison with an ablated model trained only on MS MARCO:
|                                     | nq     | fever  | fiqa   | trec-covid | scidocs | scifact | nfcorpus | hotpotqa | dbpedia-entity | quora  | climate-fever |
|-------------------------------------|--------|--------|--------|------------|---------|---------|----------|----------|----------------|--------|---------------|
| ms-marco-MiniLM-L-6-v2              | 0.5234 | 0.8007 | 0.349  | 0.741      | 0.1638  | 0.688   | 0.3493   | 0.7235   | 0.4445         | 0.8251 | 0.2438        |
| MiniLM-L-6-rerank-refreshed-ablated | 0.5412 | 0.8221 | 0.3598 | 0.7331     | 0.163   | 0.7376  | 0.3495   | 0.7583   | 0.4382         | 0.8619 | 0.2449        |
| improvement (%)                     | 3.40   | 2.67   | 3.08   | -1.07      | -0.47   | 7.22    | 0.08     | 4.80     | -1.41          | 4.45   | 0.47          |
## Datasets Used
~900k queries with 32-way triplets were used from these datasets (a sketch of an example's shape follows the list):
- MSMarco
- TriviaQA
- Natural Questions
- FEVER
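For intuition, one training example under this setup is a query paired with 32 teacher-scored candidate passages. The field names and values below are illustrative assumptions, not the actual schema of lightonai/ms-marco-en-bge:

```python
# Hypothetical example layout; field names are illustrative only.
example = {
    "query": "how many people live in berlin",
    "passages": [
        {"text": "Berlin has a population of 3,520,031 ...", "teacher_score": 9.73},
        # ... 31 more candidates, mostly hard negatives, each with a
        # bge-reranker-v2-m3 score used as the KL distillation target ...
    ],
}
```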