---
library_name: transformers
tags:
  - cross-encoder
datasets:
  - lightonai/ms-marco-en-bge
language:
  - en
base_model:
  - cross-encoder/ms-marco-MiniLM-L-6-v2
---

# MiniLM-L-6-rerank-reborn

This model is fine-tuned from the well-known [ms-marco-MiniLM-L-6-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-6-v2) using the KL distillation technique described here, with [bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) as the teacher.
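As a rough illustration of this kind of objective, listwise KL distillation over a query's candidate passages can be sketched as follows (a minimal sketch, assuming a softmax over each query's candidate list; tensor names are illustrative and this is not the exact training code):

```python
import torch
import torch.nn.functional as F

def kl_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Push the student's score distribution over N candidate passages
    toward the teacher's. Both inputs have shape (batch, N), e.g. N=32."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
```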

## Usage

### Usage with Transformers

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained("juanluisdb/MiniLM-L-6-rerank-reborn")
tokenizer = AutoTokenizer.from_pretrained("juanluisdb/MiniLM-L-6-rerank-reborn")

# Each (query, passage) pair is scored jointly; higher scores mean higher relevance.
features = tokenizer(
    ["How many people live in Berlin?", "How many people live in Berlin?"],
    ["Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.", "New York City is famous for the Metropolitan Museum of Art."],
    padding=True, truncation=True, return_tensors="pt",
)
model.eval()
with torch.no_grad():
    scores = model(**features).logits
    print(scores)
```
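The scores above are raw logits. If you need values in a fixed [0, 1] range, e.g. for thresholding, you can map them with `torch.sigmoid(scores)`.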

### Usage with SentenceTransformers

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("juanluisdb/MiniLM-L-6-rerank-reborn", max_length=512)
scores = model.predict([
    ("Query", "Paragraph1"),
    ("Query", "Paragraph2"),
    ("Query", "Paragraph3"),
])
```
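For end-to-end reranking, recent versions of sentence-transformers also provide a `rank()` helper on `CrossEncoder` that scores and sorts a list of passages for a query in one call (a usage sketch; check your installed version for availability):

```python
# Score and sort candidate passages for a single query.
query = "How many people live in Berlin?"
passages = [
    "Berlin has a population of 3,520,031 registered inhabitants.",
    "New York City is famous for the Metropolitan Museum of Art.",
]
ranked = model.rank(query, passages, return_documents=True)
for hit in ranked:
    print(hit["score"], hit["text"])
```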

## Evaluation

### BEIR (NDCG@10)

I've run tests on several BEIR datasets. Each cross-encoder reranks the top-100 BM25 results.

|                           | nq*   | fever* | fiqa  | trec-covid | scidocs | scifact | nfcorpus | hotpotqa | dbpedia-entity | quora | climate-fever |
|---------------------------|-------|--------|-------|------------|---------|---------|----------|----------|----------------|-------|---------------|
| bm25                      | 0.305 | 0.638  | 0.238 | 0.589      | 0.150   | 0.676   | 0.318    | 0.629    | 0.319          | 0.787 | 0.163         |
| jina-reranker-v1-turbo-en | 0.533 | 0.852  | 0.336 | 0.774      | 0.166   | 0.739   | 0.353    | 0.745    | 0.421          | 0.858 | 0.233         |
| bge-reranker-v2-m3        | 0.597 | 0.857  | 0.397 | 0.784      | 0.169   | 0.731   | 0.336    | 0.794    | 0.445          | 0.858 | 0.314         |
| mxbai-rerank-base-v1      | 0.535 | 0.767  | 0.382 | 0.830      | 0.171   | 0.719   | 0.353    | 0.668    | 0.416          | 0.747 | 0.253         |
| ms-marco-MiniLM-L-6-v2    | 0.523 | 0.801  | 0.349 | 0.741      | 0.164   | 0.688   | 0.349    | 0.724    | 0.445          | 0.825 | 0.244         |
| MiniLM-L-6-rerank-reborn  | 0.580 | 0.867  | 0.364 | 0.738      | 0.165   | 0.750   | 0.350    | 0.775    | 0.444          | 0.871 | 0.309         |

\* Training splits of NQ and FEVER were used as part of the training data.
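The numbers above could be reproduced along these lines with the `beir` package (a minimal sketch, assuming precomputed BM25 results in `bm25_results`; the actual evaluation setup may differ):

```python
from beir import util
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.reranking.models import CrossEncoder
from beir.reranking import Rerank

# Download and load one BEIR dataset (scifact as an example).
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/scifact.zip"
data_path = util.download_and_unzip(url, "datasets")
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")

# `bm25_results` (query_id -> {doc_id: score}) is assumed to come from a
# lexical BM25 retriever; the cross-encoder then reranks the top-100 hits.
reranker = Rerank(CrossEncoder("juanluisdb/MiniLM-L-6-rerank-reborn"), batch_size=128)
reranked = reranker.rerank(corpus, queries, bm25_results, top_k=100)

ndcg, _map, recall, precision = EvaluateRetrieval.evaluate(qrels, reranked, k_values=[10])
print(ndcg)  # NDCG@10
```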

Comparison with an ablated model trained only on MS MARCO:

|                                     | nq     | fever  | fiqa   | trec-covid | scidocs | scifact | nfcorpus | hotpotqa | dbpedia-entity | quora  | climate-fever |
|-------------------------------------|--------|--------|--------|------------|---------|---------|----------|----------|----------------|--------|---------------|
| ms-marco-MiniLM-L-6-v2              | 0.5234 | 0.8007 | 0.349  | 0.741      | 0.1638  | 0.688   | 0.3493   | 0.7235   | 0.4445         | 0.8251 | 0.2438        |
| MiniLM-L-6-rerank-refreshed-ablated | 0.5412 | 0.8221 | 0.3598 | 0.7331     | 0.163   | 0.7376  | 0.3495   | 0.7583   | 0.4382         | 0.8619 | 0.2449        |
| improvement (%)                     | 3.40   | 2.67   | 3.08   | -1.07      | -0.47   | 7.22    | 0.08     | 4.80     | -1.41          | 4.45   | 0.47          |

## Datasets Used

~900k queries with 32-way triplets were used from these datasets (see the record sketch after this list):

- MS MARCO
- TriviaQA
- Natural Questions
- FEVER
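A single training record might look like the following (hypothetical field names; the actual lightonai/ms-marco-en-bge schema may differ):

```python
# Illustrative 32-way record for KL distillation; field names are assumptions,
# not the exact dataset schema.
record = {
    "query": "how many people live in berlin",
    "passages": [
        "Berlin has a population of 3,520,031 registered inhabitants ...",
        "New York City is famous for the Metropolitan Museum of Art.",
        # ... 30 more candidate passages
    ],
    "teacher_scores": [  # relevance scores from bge-reranker-v2-m3
        0.97,
        0.02,
        # ... 30 more scores
    ],
}
```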