---
base_model: BAAI/bge-reranker-v2-m3
language:
- en
- ru
license: mit
pipeline_tag: text-classification
tags:
- transformers
- sentence-transformers
- text-embeddings-inference
---

# Model for English and Russian

This is a truncated version of [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3).

Only English and Russian tokens are left in this model's vocabulary. This makes it about 1.5 times smaller than the original model while producing the same outputs for text in those two languages.

The model was truncated in [this notebook](https://colab.research.google.com/drive/19IFjWpJpxQie1gtHSvDeoKk7CQtpy6bT?usp=sharing).
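
As a rough illustration of why dropping unused tokens shrinks a model without changing its outputs for the tokens that remain, the toy sketch below removes rows from a made-up embedding table and renumbers the surviving tokens. The vocabulary, the vectors, and the `truncate_vocab` helper are all hypothetical and are not taken from the notebook, which remains the reference for how this model was actually produced.

```python
# Toy illustration of vocabulary truncation (hypothetical data, not the notebook's code).

def truncate_vocab(vocab, embeddings, keep_tokens):
    """Keep only `keep_tokens`, renumbering ids and selecting the matching embedding rows."""
    new_vocab = {}
    new_embeddings = []
    # Walk tokens in original id order so kept tokens preserve their relative order.
    for token, old_id in sorted(vocab.items(), key=lambda item: item[1]):
        if token in keep_tokens:
            new_vocab[token] = len(new_embeddings)  # next free id in the smaller table
            new_embeddings.append(embeddings[old_id])
    return new_vocab, new_embeddings


# Hypothetical 5-token vocabulary with 2-dimensional embedding rows.
vocab = {"<s>": 0, "hello": 1, "привет": 2, "你好": 3, "hola": 4}
embeddings = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1], [4.0, 4.1]]

new_vocab, new_embeddings = truncate_vocab(vocab, embeddings, {"<s>", "hello", "привет"})

# Every kept token still maps to exactly the same vector as before,
# so a model fed only kept tokens sees identical inputs.
for token, new_id in new_vocab.items():
    assert new_embeddings[new_id] == embeddings[vocab[token]]

print(new_vocab)
print(len(new_embeddings), "rows instead of", len(embeddings))
```

The size reduction comes entirely from the smaller embedding table; the rest of the network is untouched in this sketch.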

## FAQ

### Generate scores for text

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('qilowoq/bge-reranker-v2-m3-en-ru')
model = AutoModelForSequenceClassification.from_pretrained('qilowoq/bge-reranker-v2-m3-en-ru')
model.eval()

# Each pair is (query, passage). The second query asks "What is the area of Berlin?" in Russian.
pairs = [('How many people live in Berlin?', 'Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.'),
         ('Какая площадь Берлина?', 'Площадь Берлина составляет 891,8 квадратных километров.')]

with torch.no_grad():
    inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt')
    # One relevance logit per pair; a higher value means a better match.
    scores = model(**inputs, return_dict=True).logits.view(-1).float()

print(scores)
```
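
The scores printed above are raw logits, so they are unbounded and only meaningful relative to each other. One common way to read them is to pass them through a sigmoid and sort the passages by the result. The sketch below shows that step on plain Python floats; the `rank_documents` helper, the document labels, and the logit values are hypothetical, and in practice the list could come from something like `scores.tolist()`.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function mapping any float into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def rank_documents(documents, logits):
    """Return (document, score) pairs sorted from most to least relevant."""
    scored = [(doc, sigmoid(logit)) for doc, logit in zip(documents, logits)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical logits for one query scored against three candidate passages.
documents = ["doc A", "doc B", "doc C"]
logits = [-2.0, 3.5, 0.0]

ranked = rank_documents(documents, logits)
for doc, score in ranked:
    print(f"{score:.3f}  {doc}")
```

Because the sigmoid is monotonic, it does not change the ordering; it only makes the values easier to compare against a fixed threshold.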

## Citation

If you find this repository useful, please consider giving it a star and citing the following papers:

```bibtex
@misc{li2023making,
      title={Making Large Language Models A Better Foundation For Dense Retrieval},
      author={Chaofan Li and Zheng Liu and Shitao Xiao and Yingxia Shao},
      year={2023},
      eprint={2312.15503},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
@misc{chen2024bge,
      title={BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation},
      author={Jianlv Chen and Shitao Xiao and Peitian Zhang and Kun Luo and Defu Lian and Zheng Liu},
      year={2024},
      eprint={2402.03216},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```