license: apache-2.0
library_name: transformers

PEG: Towards Robust Text Retrieval with Progressive Learning

Model Details

We propose PEG (a Progressively Learned Textual Embedding), a model that progressively adjusts the weights of samples contributing to the loss within an extremely large batch, based on the difficulty levels of negative samples. We have amassed an extensive collection of over 110 million data samples, spanning a wide range of fields such as general knowledge, finance, tourism, and medicine.
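To make the idea of difficulty-based weighting concrete, here is a minimal sketch of a contrastive (InfoNCE-style) loss for a single query in which harder negatives, meaning those more similar to the query, receive larger weights. This is an illustration under stated assumptions and not the formulation from the technical report: the function name `weighted_info_nce`, the exponential weighting rule, and the `temperature` and `hardness` parameters are all hypothetical.

```python
import math


def weighted_info_nce(sim_pos, sim_negs, temperature=0.05, hardness=0.0):
    """Contrastive loss for one query with difficulty-weighted negatives.

    Assumed (hypothetical) weighting rule: each negative gets a weight
    proportional to exp(hardness * similarity), rescaled so the weights
    average to 1. With hardness=0 all weights are 1 and the loss reduces
    to the standard InfoNCE loss.
    """
    raw = [math.exp(hardness * s) for s in sim_negs]
    mean_raw = sum(raw) / len(raw)
    weights = [r / mean_raw for r in raw]

    pos_term = math.exp(sim_pos / temperature)
    neg_term = sum(
        w * math.exp(s / temperature) for w, s in zip(weights, sim_negs)
    )
    return -math.log(pos_term / (pos_term + neg_term))


# Toy cosine similarities: one positive and three negatives of varying difficulty.
positive = 0.8
negatives = [0.6, 0.1, -0.2]

uniform_loss = weighted_info_nce(positive, negatives, hardness=0.0)
hard_loss = weighted_info_nce(positive, negatives, hardness=2.0)
print(uniform_loss, hard_loss)
```

In this sketch, raising `hardness` over the course of training would be one way to shift emphasis progressively toward difficult negatives. The actual weighting scheme and schedule used by PEG are described in the technical report.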

Our technical report is available on arXiv (arXiv:2311.11691).

Usage (HuggingFace Transformers)

Install transformers and PyTorch (the example below imports torch):

pip install transformers torch

Then load the model and compute sentence embeddings:

from transformers import AutoModel, AutoTokenizer
import torch


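# Load the tokenizer and model from the HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('TownsWu/PEG')
model = AutoModel.from_pretrained('TownsWu/PEG')

# Example inputs, roughly: "How to change the bank card linked to Huabei"
# and "Change the bank card linked to Huabei"
sentences = ['如何更换花呗绑定银行卡', '花呗更改绑定银行卡']

# Tokenize sentences into a padded batch of PyTorch tensors
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute sentence embeddings: take the hidden state of the first token
# (the [CLS] position for BERT-style tokenizers) for each sentence
with torch.no_grad():
    last_hidden_state = model(**inputs, return_dict=True).last_hidden_state
    embeddings = last_hidden_state[:, 0]
print("embeddings:")
print(embeddings)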
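To compare the resulting sentence embeddings, one common choice is cosine similarity. The sketch below assumes cosine similarity is an appropriate measure for this model (the card does not state which measure to use) and works on plain Python lists, such as those returned by `embeddings.tolist()`. The helper names `cosine_similarity` and `similarity_matrix` and the toy vectors are illustrative only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Pairwise cosine similarities between all vectors."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy vectors standing in for `embeddings.tolist()` from the example above.
toy_embeddings = [
    [1.0, 0.0, 1.0],
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]
scores = similarity_matrix(toy_embeddings)
for row in scores:
    print([round(x, 3) for x in row])
```

When working directly with tensors, normalizing with `torch.nn.functional.normalize(embeddings, dim=-1)` and multiplying the result by its transpose should give the same pairwise scores.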

Contact

If you have any questions or suggestions related to this project, feel free to open an issue or pull request. You can also email Tong Wu (townswu@tencent.com).

Citation

If you find our work helpful for your research, please consider citing the following BibTeX entry:


@article{wu2023towards,
  title={Towards Robust Text Retrieval with Progressive Learning},
  author={Wu, Tong and Qin, Yulei and Zhang, Enwei and Xu, Zihan and Gao, Yuting and Li, Ke and Sun, Xing},
  journal={arXiv preprint arXiv:2311.11691},
  year={2023}
}

Paper: Towards Robust Text Retrieval with Progressive Learning (arXiv:2311.11691, published Nov 20, 2023)

Evaluation results

All results are self-reported.

MTEB CMedQAv1 (test set)
map: 84.091
mrr: 86.629

MTEB CMedQAv2 (test set)
map: 86.558
mrr: 89.433

MTEB CmedqaRetrieval
map_at_1: 26.101
map_at_3: 34.386
map_at_5: 36.426
map_at_10: 38.239
map_at_100: 40.083
map_at_1000: 40.205
mrr_at_1: 39.435
mrr_at_3: 44.803
mrr_at_5: 45.911
mrr_at_10: 46.968
mrr_at_100: 47.946
mrr_at_1000: 47.997
ndcg_at_1: 39.435
ndcg_at_10: 44.416
ndcg_at_100: 51.773
ndcg_at_1000: 53.888