---
library_name: transformers
tags:
  - cross-encoder
datasets:
  - lightonai/ms-marco-en-bge
language:
  - en
base_model:
  - cross-encoder/ms-marco-MiniLM-L-6-v2
---

# MiniLM-L-6-rerank-reborn

This model is fine-tuned from the well-known [ms-marco-MiniLM-L-6-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-6-v2) using the KL distillation technique described here, with [bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) as the teacher.
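As a rough illustration of this kind of objective, listwise KL distillation over a query's candidate passages can be sketched as follows (a minimal sketch, assuming a softmax over each query's candidate list; tensor names are illustrative and this is not the exact training code):

```python
import torch
import torch.nn.functional as F

def kl_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Push the student's score distribution over N candidate passages
    toward the teacher's. Both inputs have shape (batch, N), e.g. N=32."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
```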

## Usage

### Usage with Transformers

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained("juanluisdb/MiniLM-L-6-rerank-reborn")
tokenizer = AutoTokenizer.from_pretrained("juanluisdb/MiniLM-L-6-rerank-reborn")

# Each (query, passage) pair is scored jointly; higher scores mean higher relevance.
features = tokenizer(
    ["How many people live in Berlin?", "How many people live in Berlin?"],
    ["Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.", "New York City is famous for the Metropolitan Museum of Art."],
    padding=True, truncation=True, return_tensors="pt",
)
model.eval()
with torch.no_grad():
    scores = model(**features).logits
    print(scores)
```
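The scores above are raw logits. If you need values in a fixed [0, 1] range, e.g. for thresholding, you can map them with `torch.sigmoid(scores)`.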

### Usage with SentenceTransformers

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("juanluisdb/MiniLM-L-6-rerank-reborn", max_length=512)
scores = model.predict([
    ("Query", "Paragraph1"),
    ("Query", "Paragraph2"),
    ("Query", "Paragraph3"),
])
```
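For end-to-end reranking, recent versions of sentence-transformers also provide a `rank()` helper on `CrossEncoder` that scores and sorts a list of passages for a query in one call (a usage sketch; check your installed version for availability):

```python
# Score and sort candidate passages for a single query.
query = "How many people live in Berlin?"
passages = [
    "Berlin has a population of 3,520,031 registered inhabitants.",
    "New York City is famous for the Metropolitan Museum of Art.",
]
ranked = model.rank(query, passages, return_documents=True)
for hit in ranked:
    print(hit["score"], hit["text"])
```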

## Evaluation

### BEIR (NDCG@10)

I've run tests on several BEIR datasets. Each cross-encoder reranks the top-100 BM25 results.

|                           | nq*   | fever* | fiqa  | trec-covid | scidocs | scifact | nfcorpus | hotpotqa | dbpedia-entity | quora | climate-fever |
|---------------------------|-------|--------|-------|------------|---------|---------|----------|----------|----------------|-------|---------------|
| bm25                      | 0.305 | 0.638  | 0.238 | 0.589      | 0.150   | 0.676   | 0.318    | 0.629    | 0.319          | 0.787 | 0.163         |
| jina-reranker-v1-turbo-en | 0.533 | 0.852  | 0.336 | 0.774      | 0.166   | 0.739   | 0.353    | 0.745    | 0.421          | 0.858 | 0.233         |
| bge-reranker-v2-m3        | 0.597 | 0.857  | 0.397 | 0.784      | 0.169   | 0.731   | 0.336    | 0.794    | 0.445          | 0.858 | 0.314         |
| mxbai-rerank-base-v1      | 0.535 | 0.767  | 0.382 | 0.830      | 0.171   | 0.719   | 0.353    | 0.668    | 0.416          | 0.747 | 0.253         |
| ms-marco-MiniLM-L-6-v2    | 0.523 | 0.801  | 0.349 | 0.741      | 0.164   | 0.688   | 0.349    | 0.724    | 0.445          | 0.825 | 0.244         |
| MiniLM-L-6-rerank-reborn  | 0.580 | 0.867  | 0.364 | 0.738      | 0.165   | 0.750   | 0.350    | 0.775    | 0.444          | 0.871 | 0.309         |

\* Training splits of NQ and FEVER were used as part of the training data.
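The numbers above could be reproduced along these lines with the `beir` package (a minimal sketch, assuming precomputed BM25 results in `bm25_results`; the actual evaluation setup may differ):

```python
from beir import util
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.reranking.models import CrossEncoder
from beir.reranking import Rerank

# Download and load one BEIR dataset (scifact as an example).
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/scifact.zip"
data_path = util.download_and_unzip(url, "datasets")
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")

# `bm25_results` (query_id -> {doc_id: score}) is assumed to come from a
# lexical BM25 retriever; the cross-encoder then reranks the top-100 hits.
reranker = Rerank(CrossEncoder("juanluisdb/MiniLM-L-6-rerank-reborn"), batch_size=128)
reranked = reranker.rerank(corpus, queries, bm25_results, top_k=100)

ndcg, _map, recall, precision = EvaluateRetrieval.evaluate(qrels, reranked, k_values=[10])
print(ndcg)  # NDCG@10
```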

Comparison with an ablated model trained only on MS MARCO:

|                                     | nq     | fever  | fiqa   | trec-covid | scidocs | scifact | nfcorpus | hotpotqa | dbpedia-entity | quora  | climate-fever |
|-------------------------------------|--------|--------|--------|------------|---------|---------|----------|----------|----------------|--------|---------------|
| ms-marco-MiniLM-L-6-v2              | 0.5234 | 0.8007 | 0.349  | 0.741      | 0.1638  | 0.688   | 0.3493   | 0.7235   | 0.4445         | 0.8251 | 0.2438        |
| MiniLM-L-6-rerank-refreshed-ablated | 0.5412 | 0.8221 | 0.3598 | 0.7331     | 0.163   | 0.7376  | 0.3495   | 0.7583   | 0.4382         | 0.8619 | 0.2449        |
| improvement (%)                     | 3.40   | 2.67   | 3.08   | -1.07      | -0.47   | 7.22    | 0.08     | 4.80     | -1.41          | 4.45   | 0.47          |

## Datasets Used

~900k queries with 32-way triplets were used from these datasets (see the record sketch after this list):

- MS MARCO
- TriviaQA
- Natural Questions
- FEVER
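A single training record might look like the following (hypothetical field names; the actual lightonai/ms-marco-en-bge schema may differ):

```python
# Illustrative 32-way record for KL distillation; field names are assumptions,
# not the exact dataset schema.
record = {
    "query": "how many people live in berlin",
    "passages": [
        "Berlin has a population of 3,520,031 registered inhabitants ...",
        "New York City is famous for the Metropolitan Museum of Art.",
        # ... 30 more candidate passages
    ],
    "teacher_scores": [  # relevance scores from bge-reranker-v2-m3
        0.97,
        0.02,
        # ... 30 more scores
    ],
}
```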