E5 Large V2 with a 1024-token context window

This model uses a positional-embeddings tweak that gives it a context window twice as large as that of its base model (intfloat/e5-large-v2). The extension is training-free: no fine-tuning was applied to any of the base model's parameters, and the two models differ only in their positional embeddings. A paper describing the tweak is coming soon.
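
The exact tweak has not been published yet, so the sketch below is only a rough illustration of one training-free way to extend learned absolute position embeddings: linearly interpolating the base model's 512 position vectors to 1024. The model name is real, but the procedure is an assumption, not necessarily what was done for this checkpoint.

import torch
from transformers import AutoModel

# Illustration only: NOT necessarily the tweak used for this model.
base = AutoModel.from_pretrained("intfloat/e5-large-v2")   # BERT-large encoder
old_pos = base.embeddings.position_embeddings.weight.data  # shape (512, 1024)
new_len = 1024

# Interpolate along the position axis: (1, hidden, 512) -> (1, hidden, 1024)
new_pos = torch.nn.functional.interpolate(
    old_pos.T.unsqueeze(0), size=new_len, mode="linear", align_corners=True
).squeeze(0).T

base.embeddings.position_embeddings = torch.nn.Embedding.from_pretrained(
    new_pos, freeze=False
)
base.config.max_position_embeddings = new_len
if hasattr(base.embeddings, "position_ids"):       # buffers present in BERT embeddings
    base.embeddings.position_ids = torch.arange(new_len).unsqueeze(0)
if hasattr(base.embeddings, "token_type_ids"):
    base.embeddings.token_type_ids = torch.zeros(1, new_len, dtype=torch.long)
# The tokenizer / SentenceTransformer max_seq_length would also need to be
# raised to 1024 for the longer window to actually be used.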

Evaluation Results

The model was evaluated on four retrieval tasks from the LongEmbed benchmark and achieved the following NDCG@10 scores:

  1. LEMBSummScreenFDRetrieval: 0.8617 (8.80 pp gain over intfloat/e5-large-v2),
  2. LEMBQMSumRetrieval: 0.3112 (6.04 pp gain over intfloat/e5-large-v2),
  3. LEMBWikimQARetrieval: 0.6570 (7.27 pp gain over intfloat/e5-large-v2),
  4. LEMBNarrativeQARetrieval: 0.2792 (1.55 pp gain over intfloat/e5-large-v2)

Steps to reproduce:

from sentence_transformers import SentenceTransformer  # 4.0.2
import mteb  # 1.38.2


# load model
model = SentenceTransformer('idanylenko/e5-large-v2-ctx1024')

# define tasks
retrieval_task_list = [
    "LEMBSummScreenFDRetrieval",
    "LEMBQMSumRetrieval",
    "LEMBWikimQARetrieval",
    "LEMBNarrativeQARetrieval"
]
tasks = mteb.get_tasks(tasks=retrieval_task_list)

# run the evaluation
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model)
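
Assuming the result-object layout of recent mteb versions (each entry in results is a TaskResult with a per-split scores dict; this is an assumption about the API rather than something documented above), the NDCG@10 values can then be printed like so:

# Print each task's NDCG@10 (assumes a "test" split and the score-dict
# layout used by recent mteb releases).
for res in results:
    split_scores = res.scores["test"][0]
    print(res.task_name, round(split_scores["ndcg_at_10"], 4))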