---
license: gemma
pipeline_tag: text-classification
tags:
- transformers
- sentence-transformers
language:
- multilingual
---

# Reranker

For more details, please refer to our GitHub repository: [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding/tree/master).

- [Model List](#model-list)
- [Usage](#usage)
- [Evaluation](#evaluation)
- [Citation](#citation)

Unlike an embedding model, a reranker takes a question and a document together as input and directly outputs a similarity score instead of an embedding.
You can get a relevance score by feeding a query and a passage to the reranker,
and the score can be mapped to a float value in [0, 1] with a sigmoid function.
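Here is a minimal sketch of that mapping. It assumes the raw score is a single real-valued logit. The helper `to_probability` is hypothetical and is not part of FlagEmbedding. The sigmoid is monotonic, so it changes the scale of the scores but never their ranking.

```python
import math


def to_probability(score: float) -> float:
    """Map a raw reranker score (a logit) to [0, 1] with the logistic sigmoid.

    Hypothetical helper: it branches on the sign so math.exp never overflows.
    """
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    z = math.exp(score)  # score < 0, so z is in (0, 1)
    return z / (1.0 + z)


raw_scores = [-8.2, 0.0, 5.1]  # illustrative values, not real model outputs
probabilities = [to_probability(s) for s in raw_scores]
print(probabilities)
```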

Here we introduce bge-reranker-v2.5-gemma2-lightweight, a lightweight multilingual reranker trained on top of gemma-2-9b. By combining token compression with layerwise reduction, the model maintains outstanding performance while saving significant computational resources.

Our model primarily demonstrates the following capabilities:

- Lightweight: The model can be made lightweight through token compression, layerwise reduction, or a combination of both.
- Outstanding performance: The model achieves new state-of-the-art (SOTA) performance on both BEIR and MIRACL.

We will soon release a technical report on the lightweight reranker with more details.

------

You can use bge-reranker-v2.5-gemma2-lightweight with the following prompts, depending on the task:

- Predict whether passage B contains an answer to query A.
- Predict whether passages A and B have the same meaning.
- Predict whether queries A and B are asking the same thing.
- Predict whether argument A and counterargument B express contradictory opinions.
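The prompt is appended after the input pair. The sketch below shows the text-level layout that the `get_inputs` helper in the Transformers example builds: `A: {query}`, then `B: {passage}`, then the prompt, joined by newlines.

The task keys in `PROMPTS` and the `format_pair` helper are hypothetical. The real code tokenizes each piece separately and prepends the BOS token, so its token IDs may differ slightly from tokenizing this string in one pass.

```python
# Hypothetical task keys; the prompt strings are copied from the list above.
PROMPTS = {
    "qa": "Predict whether passage B contains an answer to query A.",
    "paraphrase": "Predict whether passages A and B have the same meaning.",
    "duplicate_question": "Predict whether queries A and B are asking the same thing.",
    "counterargument": "Predict whether argument A and counterargument B express contradictory opinions.",
}


def format_pair(text_a: str, text_b: str, task: str = "qa") -> str:
    """Text-level view of one reranker input: 'A: ...', 'B: ...', then the task prompt."""
    if task not in PROMPTS:
        raise KeyError(f"unknown task {task!r}; expected one of {sorted(PROMPTS)}")
    return f"A: {text_a}\nB: {text_b}\n{PROMPTS[task]}"


print(format_pair("what is panda?", "The giant panda is a bear species endemic to China."))
```

To use a non-default prompt with the Transformers example, pass it through that function's existing `prompt` argument, for example `get_inputs(pairs, tokenizer, prompt=PROMPTS["paraphrase"])`.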


## Model List

| Model | Base model | Language | Layerwise | Compress ratio | Compress layers | Feature |
|:---|:---:|:---:|:---:|:---:|:---:|---|
| [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base) | [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) | Chinese and English | - | - | - | Lightweight reranker model, easy to deploy, with fast inference. |
| [BAAI/bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large) | [xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large) | Chinese and English | - | - | - | Lightweight reranker model, easy to deploy, with fast inference. |
| [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) | [bge-m3](https://huggingface.co/BAAI/bge-m3) | Multilingual | - | - | - | Lightweight reranker model with strong multilingual capabilities, easy to deploy, with fast inference. |
| [BAAI/bge-reranker-v2-gemma](https://huggingface.co/BAAI/bge-reranker-v2-gemma) | [gemma-2b](https://huggingface.co/google/gemma-2b) | Multilingual | - | - | - | Suitable for multilingual contexts; performs well in both English and multilingual settings. |
| [BAAI/bge-reranker-v2-minicpm-layerwise](https://huggingface.co/BAAI/bge-reranker-v2-minicpm-layerwise) | [MiniCPM-2B-dpo-bf16](https://huggingface.co/openbmb/MiniCPM-2B-dpo-bf16) | Multilingual | 8-40 | - | - | Suitable for multilingual contexts; performs well in both English and Chinese; lets you select the output layer, which accelerates inference. |
| [BAAI/bge-reranker-v2.5-gemma2-lightweight](https://huggingface.co/BAAI/bge-reranker-v2.5-gemma2-lightweight) | [google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b) | Multilingual | 8-42 | 1, 2, 4, 8 | [8, 16, 24, 32, 40] | Suitable for multilingual contexts; performs well in both English and Chinese; lets you select the output layers, compress ratio, and compress layers, which accelerates inference. |
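The last row of the table lists the settings the lightweight model accepts. Below is a minimal sketch that checks a configuration against those values before calling the model. It rests on two assumptions:

- The layerwise range "8-42" is inclusive.
- Every entry of `compress_layer` must be one of the listed compress layers.

`check_lightweight_settings` is a hypothetical helper, not part of FlagEmbedding or Transformers.

```python
from typing import List, Sequence

# Values taken from the bge-reranker-v2.5-gemma2-lightweight row of the table.
CUTOFF_RANGE = range(8, 43)  # "8-42", assumed inclusive
COMPRESS_RATIOS = {1, 2, 4, 8}
COMPRESS_LAYERS = {8, 16, 24, 32, 40}


def check_lightweight_settings(cutoff_layers: Sequence[int],
                               compress_ratio: int,
                               compress_layer: Sequence[int]) -> List[str]:
    """Return a list of problems; an empty list means the settings match the table."""
    problems = []
    for layer in cutoff_layers:
        if layer not in CUTOFF_RANGE:
            problems.append(f"cutoff layer {layer} is outside 8-42")
    if compress_ratio not in COMPRESS_RATIOS:
        problems.append(f"compress_ratio {compress_ratio} is not one of {sorted(COMPRESS_RATIOS)}")
    for layer in compress_layer:
        if layer not in COMPRESS_LAYERS:
            problems.append(f"compress layer {layer} is not one of {sorted(COMPRESS_LAYERS)}")
    return problems


# Settings used in the usage examples below.
print(check_lightweight_settings(cutoff_layers=[28], compress_ratio=2, compress_layer=[24, 40]))
```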


You can select a model according to your scenario and resources:
- For multilingual use, choose [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3), [BAAI/bge-reranker-v2-gemma](https://huggingface.co/BAAI/bge-reranker-v2-gemma), or [BAAI/bge-reranker-v2.5-gemma2-lightweight](https://huggingface.co/BAAI/bge-reranker-v2.5-gemma2-lightweight).

- For Chinese or English, choose [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) or [BAAI/bge-reranker-v2-minicpm-layerwise](https://huggingface.co/BAAI/bge-reranker-v2-minicpm-layerwise).

- For efficiency, choose [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) or the lower layers of [BAAI/bge-reranker-v2-minicpm-layerwise](https://huggingface.co/BAAI/bge-reranker-v2-minicpm-layerwise).

- For better performance, we recommend [BAAI/bge-reranker-v2-minicpm-layerwise](https://huggingface.co/BAAI/bge-reranker-v2-minicpm-layerwise) and [BAAI/bge-reranker-v2-gemma](https://huggingface.co/BAAI/bge-reranker-v2-gemma).

## Usage
### Using FlagEmbedding

```shell
git clone https://github.com/FlagOpen/FlagEmbedding.git
cd FlagEmbedding
pip install -e .
```

#### For LLM-based lightweight reranker

```python
from FlagEmbedding import LightWeightFlagLLMReranker
reranker = LightWeightFlagLLMReranker('BAAI/bge-reranker-v2.5-gemma2-lightweight', use_fp16=True) # Setting use_fp16 to True speeds up computation with a slight performance degradation

score = reranker.compute_score(['query', 'passage'], cutoff_layers=[28], compress_ratio=2, compress_layer=[24, 40]) # Adjust 'cutoff_layers' to pick which layers are used for computing the score.
print(score)

scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']], cutoff_layers=[28], compress_ratio=2, compress_layer=[24, 40])
print(scores)
```
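In a retrieval pipeline, the reranker usually reorders candidates returned by a first-stage retriever. The sketch below assumes that, with a single cutoff layer, the scoring callable takes a list of `[query, passage]` pairs and returns one score per pair, as the second `compute_score` call above does. `rerank` is a hypothetical helper, and the sort relies only on a higher score meaning more relevant.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def rerank(query: str,
           passages: Sequence[str],
           score_fn: Callable[[List[List[str]]], Sequence[float]],
           top_k: Optional[int] = None) -> List[Tuple[str, float]]:
    """Score every (query, passage) pair and return passages sorted by descending score."""
    pairs = [[query, passage] for passage in passages]
    scores = score_fn(pairs)
    if isinstance(scores, (int, float)):  # guard in case a single pair comes back as a bare number
        scores = [scores]
    ranked = sorted(zip(passages, (float(s) for s in scores)),
                    key=lambda item: item[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Usage with the FlagEmbedding reranker from the example above (not run here):
# ranked = rerank("what is panda?", candidates,
#                 lambda pairs: reranker.compute_score(pairs, cutoff_layers=[28],
#                                                      compress_ratio=2, compress_layer=[24, 40]),
#                 top_k=10)
```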

### Using Hugging Face Transformers

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def last_logit_pool(logits: torch.Tensor,
                    attention_mask: torch.Tensor) -> torch.Tensor:
    # Return the logits at each sequence's last non-padding token.
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        # Every sequence ends with a real token, so the last column is the last token.
        return logits[:, -1]
    else:
        # Right padding: the last real token is at index (number of ones) - 1.
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = logits.shape[0]
        return torch.stack([logits[i, sequence_lengths[i]] for i in range(batch_size)], dim=0)


def get_inputs(pairs, tokenizer, prompt=None, max_length=1024):
    # Builds "<bos>A: {query}\nB: {passage}\n{prompt}" for each pair and records the
    # query and prompt lengths, which the model's forward pass takes as extra arguments.
    if prompt is None:
        prompt = "Predict whether passage B contains an answer to query A."
    sep = "\n"
    prompt_inputs = tokenizer(prompt,
                              return_tensors=None,
                              add_special_tokens=False)['input_ids']
    sep_inputs = tokenizer(sep,
                           return_tensors=None,
                           add_special_tokens=False)['input_ids']
    inputs = []
    query_lengths = []
    prompt_lengths = []
    for query, passage in pairs:
        # The query is capped at 3/4 of max_length; if the pair is still too long,
        # only the passage side is truncated ('only_second').
        query_inputs = tokenizer(f'A: {query}',
                                 return_tensors=None,
                                 add_special_tokens=False,
                                 max_length=max_length * 3 // 4,
                                 truncation=True)
        passage_inputs = tokenizer(f'B: {passage}',
                                   return_tensors=None,
                                   add_special_tokens=False,
                                   max_length=max_length,
                                   truncation=True)
        item = tokenizer.prepare_for_model(
            [tokenizer.bos_token_id] + query_inputs['input_ids'],
            sep_inputs + passage_inputs['input_ids'],
            truncation='only_second',
            max_length=max_length,
            padding=False,
            return_attention_mask=False,
            return_token_type_ids=False,
            add_special_tokens=False
        )
        # Append the separator and the task prompt after the (possibly truncated) passage.
        item['input_ids'] = item['input_ids'] + sep_inputs + prompt_inputs
        item['attention_mask'] = [1] * len(item['input_ids'])
        inputs.append(item)
        query_lengths.append(len([tokenizer.bos_token_id] + query_inputs['input_ids'] + sep_inputs))
        prompt_lengths.append(len(sep_inputs + prompt_inputs))

    return tokenizer.pad(
        inputs,
        padding=True,
        max_length=max_length + len(sep_inputs) + len(prompt_inputs),
        pad_to_multiple_of=8,
        return_tensors='pt',
    ), query_lengths, prompt_lengths


tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-v2.5-gemma2-lightweight', trust_remote_code=True)
tokenizer.padding_side = 'right'  # last_logit_pool then reads each row at its last real token
model = AutoModelForCausalLM.from_pretrained('BAAI/bge-reranker-v2.5-gemma2-lightweight', trust_remote_code=True)
model = model.to('cuda')
model.eval()

pairs = [['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']]
with torch.no_grad():
    inputs, query_lengths, prompt_lengths = get_inputs(pairs, tokenizer)
    inputs = inputs.to(model.device)
    outputs = model(**inputs,
                    return_dict=True,
                    cutoff_layers=[28],
                    compress_ratio=2,
                    compress_layer=[24, 40],
                    query_lengths=query_lengths,
                    prompt_lengths=prompt_lengths)
    scores = []
    # outputs.logits has one entry per requested cutoff layer; pool each to one score per pair.
    for i in range(len(outputs.logits)):
        logits = last_logit_pool(outputs.logits[i], outputs.attention_masks[i])
        scores.append(logits.cpu().float().tolist())
    print(scores)
```
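The padding side matters because of how `last_logit_pool` reads the score. Below is a pure-Python sketch of the same rule, using plain nested lists instead of tensors and assuming one score per position:

- If every row's last mask entry is 1, the last column is taken.
- Otherwise, each row is read at position `sum(mask) - 1`, which is its last real token under right padding.

`last_token_scores` is a hypothetical name used only for illustration.

```python
from typing import List, Sequence


def last_token_scores(logits: Sequence[Sequence[float]],
                      attention_mask: Sequence[Sequence[int]]) -> List[float]:
    """List-based mirror of last_logit_pool, assuming one score per position."""
    if all(row[-1] == 1 for row in attention_mask):
        # Left padding (or no padding): every sequence ends at the last column.
        return [row[-1] for row in logits]
    # Right padding: the last real token sits at index (number of ones) - 1.
    return [row_logits[sum(row_mask) - 1]
            for row_logits, row_mask in zip(logits, attention_mask)]


logits = [[0.1, 0.2, 0.3], [1.0, 2.0, 9.9]]
mask = [[1, 1, 1], [1, 1, 0]]  # the second sequence is right-padded by one token
print(last_token_scores(logits, mask))  # → [0.3, 2.0]
```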

## Load the model locally

1. Make sure `gemma_config.py` and `gemma_model.py` from [BAAI/bge-reranker-v2.5-gemma2-lightweight](https://huggingface.co/BAAI/bge-reranker-v2.5-gemma2-lightweight/tree/main) are in your local model directory.
2. Modify the following part of `config.json`:
   ```
   "auto_map": {
     "AutoConfig": "gemma_config.CostWiseGemmaConfig",
     "AutoModel": "gemma_model.CostWiseGemmaModel",
     "AutoModelForCausalLM": "gemma_model.CostWiseGemmaForCausalLM"
   },
   ```
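The two steps above can be scripted. The sketch below does the following:

- checks that the two remote-code files are present;
- writes the `auto_map` entries shown above into the local `config.json`;
- leaves every other config key untouched.

`patch_local_config`, `AUTO_MAP`, and `REQUIRED_FILES` are hypothetical names, not part of Transformers.

```python
import json
from pathlib import Path
from typing import Union

# auto_map entries copied from the config.json snippet above.
AUTO_MAP = {
    "AutoConfig": "gemma_config.CostWiseGemmaConfig",
    "AutoModel": "gemma_model.CostWiseGemmaModel",
    "AutoModelForCausalLM": "gemma_model.CostWiseGemmaForCausalLM",
}
REQUIRED_FILES = ("gemma_config.py", "gemma_model.py")


def patch_local_config(model_dir: Union[str, Path]) -> dict:
    """Check that the remote-code files are present and write auto_map into config.json."""
    model_dir = Path(model_dir)
    missing = [name for name in REQUIRED_FILES if not (model_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing from {model_dir}: {', '.join(missing)}")
    config_path = model_dir / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    config["auto_map"] = dict(AUTO_MAP)  # replaces any existing auto_map entry
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

After patching, load from the directory the same way as in the Transformers example, for example `AutoModelForCausalLM.from_pretrained(local_dir, trust_remote_code=True)`. Here `local_dir` is a placeholder for your own path.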

## Evaluation

The configuration that saves 60% of FLOPs is: `compress_ratio=2`, `compress_layer=[8]`, `cutoff_layers=[25]`.
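Here is a rough back-of-envelope check of that figure. It is only a consistency check, not the authors' FLOPs accounting. It assumes:

- gemma-2-9b's 42 decoder layers each cost the same per token;
- layers 1 to `compress_layer` see the full sequence;
- later layers up to the cutoff see `1 / compress_ratio` of the tokens.

It ignores attention's quadratic term, the output head, and the possibility that query and prompt tokens are not compressed. `estimate_flops_saved` is a hypothetical helper.

```python
def estimate_flops_saved(total_layers: int, cutoff_layer: int,
                         compress_layer: int, compress_ratio: float) -> float:
    """Fraction of per-token layer compute skipped, relative to all layers at full length.

    Simplified model: layers 1..compress_layer process every token, layers
    compress_layer+1..cutoff_layer process 1/compress_ratio of the tokens,
    and layers after cutoff_layer are skipped entirely.
    """
    full_layers = min(compress_layer, cutoff_layer)
    compressed_layers = max(cutoff_layer - compress_layer, 0)
    used = full_layers + compressed_layers / compress_ratio
    return 1.0 - used / total_layers


saved = estimate_flops_saved(total_layers=42, cutoff_layer=25, compress_layer=8, compress_ratio=2)
print(f"{saved:.1%}")  # → 60.7%
```

Under these assumptions, the evaluation configuration uses about 16.5 of 42 layer-equivalents. That is a saving of roughly 61%, in line with the reported 60%.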

- BEIR:

| BEIR | bge-large-en-v1.5 | bge-reranker-v2-m3 | jina-reranker-v2-base-multilingual | bge-reranker-v2-gemma | bge-reranker-v2.5-gemma2-lightweight | bge-reranker-v2.5-gemma2-lightweight |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| FLOPs saved | - | - | - | - | 60% | 0 |
| ArguAna | 63.54 | 37.7 | 52.23 | 78.68 | 86.04 | 86.16 |
| ClimateFEVER | 36.49 | 37.99 | 34.65 | 39.07 | 48.41 | 48.48 |
| CQA | 42.23 | 38.24 | 40.21 | 45.85 | 49.18 | 48.9 |
| DBPedia | 44.16 | 48.15 | 49.31 | 49.92 | 51.98 | 52.11 |
| FEVER | 87.17 | 90.15 | 92.44 | 90.15 | 94.71 | 94.69 |
| FiQA2018 | 44.97 | 49.32 | 45.88 | 49.32 | 60.48 | 60.95 |
| HotpotQA | 74.11 | 84.51 | 81.81 | 86.15 | 87.84 | 87.89 |
| MSMARCO | 42.48 | 47.79 | 47.83 | 48.07 | 47.23 | 47.26 |
| NFCorpus | 38.12 | 34.85 | 37.73 | 39.73 | 41.4 | 41.64 |
| NQ | 55.04 | 69.37 | 67.35 | 72.6 | 75.37 | 75.58 |
| QuoraRetrieval | 89.06 | 89.13 | 87.81 | 90.37 | 91.25 | 91.18 |
| SCIDOCS | 22.62 | 18.25 | 20.21 | 21.65 | 23.71 | 23.87 |
| SciFact | 74.64 | 73.08 | 76.93 | 77.22 | 80.5 | 80.38 |
| Touche2020 | 25.08 | 35.68 | 32.45 | 35.68 | 30.64 | 31.09 |
| TRECCOVID | 74.89 | 83.39 | 80.89 | 85.51 | 84.26 | 84.85 |
| Mean | 54.31 | 55.36 | 56.52 | 60.71 | 63.1 | 63.67 |

| BEIR | e5-mistral-7b-instruct | bge-reranker-v2-gemma | bge-reranker-v2.5-gemma2-lightweight | bge-reranker-v2.5-gemma2-lightweight |
| :---: | :---: | :---: | :---: | :---: |
| FLOPs saved | - | - | 60% | 0 |
| ArguAna | 61.8 | 79.05 | 86.02 | 86.58 |
| ClimateFEVER | 38.37 | 37.66 | 47.27 | 47.13 |
| CQA | 42.97 | 46.16 | 49.06 | 49.53 |
| DBPedia | 48.84 | 50.77 | 52.45 | 52.87 |
| FEVER | 87.82 | 91.36 | 94.85 | 95.19 |
| FiQA2018 | 56.58 | 50.96 | 58.81 | 61.19 |
| HotpotQA | 75.72 | 86.99 | 88.49 | 88.82 |
| MSMARCO | 43.06 | 48.35 | 47.65 | 47.4 |
| NFCorpus | 38.58 | 39.25 | 42.28 | 42.17 |
| NQ | 63.56 | 73.44 | 75 | 76.28 |
| QuoraRetrieval | 89.59 | 90.44 | 91.09 | 91.18 |
| SCIDOCS | 16.3 | 20.77 | 22.2 | 22.69 |
| SciFact | 76.26 | 77.78 | 79.94 | 80.98 |
| Touche2020 | 26.24 | 35.79 | 28.69 | 31.17 |
| TRECCOVID | 87.07 | 88.13 | 86.61 | 87.36 |
| Mean | 56.85 | 61.13 | 63.36 | 64.04 |

- MIRACL:

| MIRACL (dev, nDCG@10) | Average (18) | FLOPs saved | ar | bn | en | es | fa | fi | fr | hi | id | ja | ko | ru | sw | te | th | zh | de | yo |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| bge-m3 (Dense) | 69.2 | - | 78.4 | 80.0 | 56.9 | 56.1 | 60.9 | 78.6 | 58.3 | 59.5 | 56.1 | 72.8 | 69.9 | 70.1 | 78.7 | 86.2 | 82.6 | 62.7 | 56.7 | 81.8 |
| jina-reranker-v2-base-multilingual | 69.6 | - | 73.4 | 81.9 | 58.9 | 58.6 | 60.5 | 77.2 | 56.1 | 62.7 | 59.6 | 72.7 | 74.0 | 67.1 | 78.1 | 85.8 | 81.2 | 63.0 | 58.2 | 84.2 |
| bge-reranker-v2-m3 | 74.4 | - | 81.7 | 84.6 | 63.5 | 64.4 | 65.7 | 82.4 | 63.7 | 68.5 | 62.7 | 80.0 | 73.8 | 76.9 | 82.3 | 89.4 | 85.3 | 65.2 | 62.7 | 87.4 |
| bge-reranker-v2-gemma | 75.0 | - | 82.3 | 85.0 | 66.6 | 65.3 | 65.5 | 82.6 | 65.4 | 69.4 | 61.2 | 79.7 | 75.1 | 78.3 | 81.8 | 89.6 | 86.1 | 66.8 | 64.0 | 85.9 |
| bge-reranker-v2.5-gemma2-lightweight | 77.1 | 60% | 82.5 | 87.8 | 68.6 | 67.6 | 67.5 | 82.8 | 68.5 | 71.4 | 63.8 | 82.8 | 75.9 | 79.8 | 84.8 | 90.8 | 88.1 | 69.9 | 65.8 | 89.6 |
| bge-reranker-v2.5-gemma2-lightweight | 77.3 | 0 | 82.8 | 87.6 | 69.3 | 67.8 | 67.4 | 83.3 | 68.5 | 71.3 | 63.8 | 83.6 | 75.7 | 80.1 | 85.1 | 90.8 | 88.7 | 69.9 | 65.6 | 89.8 |



## Citation

If you find this repository useful, please consider giving it a star and citing it:

```bibtex
@misc{li2023making,
  title={Making Large Language Models A Better Foundation For Dense Retrieval},
  author={Chaofan Li and Zheng Liu and Shitao Xiao and Yingxia Shao},
  year={2023},
  eprint={2312.15503},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
@misc{chen2024bge,
  title={BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation},
  author={Jianlv Chen and Shitao Xiao and Peitian Zhang and Kun Luo and Defu Lian and Zheng Liu},
  year={2024},
  eprint={2402.03216},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```