Encoders With Extended Context
Collection
A collection of common pretrained sentence transformers with an extended context window obtained via a zero-training position-embeddings approximation
This model uses a positional-embeddings tweak that gives it a context window twice as long as that of its base model (intfloat/e5-large-v2). The context extension is training-free: no fine-tuning was applied to any of the base model's parameters, and the two models differ only in their positional embeddings. A paper describing the tweak is coming soon.
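For intuition, below is a minimal, hypothetical sketch of one common zero-training way to double a BERT-style encoder's position table: linearly interpolating the learned position embeddings onto a grid twice as long. This is only an illustration of the general idea, not necessarily the approximation applied to this model (the paper describing the actual tweak is pending).
import torch
from transformers import AutoModel
# NOTE: illustrative only; this is a generic interpolation trick, not the model author's method.
base = AutoModel.from_pretrained("intfloat/e5-large-v2")
old_pos = base.embeddings.position_embeddings.weight.data        # (512, hidden) for this model
old_len, hidden = old_pos.shape
new_len = 2 * old_len                                             # 1024 positions
# Interpolate the learned position table onto a grid twice as long.
new_pos = torch.nn.functional.interpolate(
    old_pos.T.unsqueeze(0),                                       # (1, hidden, old_len)
    size=new_len, mode="linear", align_corners=False,
).squeeze(0).T.contiguous()                                       # (new_len, hidden)
# Swap in the extended table and update the related config and buffers.
base.embeddings.position_embeddings = torch.nn.Embedding.from_pretrained(new_pos, freeze=False)
base.embeddings.register_buffer("position_ids", torch.arange(new_len).unsqueeze(0), persistent=False)
base.embeddings.register_buffer("token_type_ids", torch.zeros(1, new_len, dtype=torch.long), persistent=False)
base.config.max_position_embeddings = new_len
# A tokenizer / SentenceTransformer wrapper around this model would also need its max sequence length raised to new_len.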
The model was evaluated on four retrieval tasks from the LongEmbed benchmark and achieved the following NDCG@10 scores:
[Per-task NDCG@10 table comparing this model with the base model intfloat/e5-large-v2.]
Steps to reproduce:
from sentence_transformers import SentenceTransformer # 4.0.2
import mteb # 1.38.2
# load model
model = SentenceTransformer('idanylenko/e5-large-v2-ctx1024')
# define tasks
retrieval_task_list = [
"LEMBSummScreenFDRetrieval",
"LEMBQMSumRetrieval",
"LEMBWikimQARetrieval",
"LEMBNarrativeQARetrieval"
]
tasks = mteb.get_tasks(tasks=retrieval_task_list)
# run the evaluation
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model)
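The run returns one result object per task; a quick way to inspect the scores is to print them per task (the attribute names below assume a recent mteb release and may differ slightly in other versions; mteb also writes the full results as JSON to a local results folder by default).
# print a compact per-task summary (attribute names assume a recent mteb version)
for task_result in results:
    print(task_result.task_name, task_result.scores)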
Base model
intfloat/e5-large-v2