setting max model length to reasonable number / max_pos_encodings, e.g. 8192
#11
by michaelfeil · opened
- README.md +10 -0
- tokenizer_config.json +1 -1
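The rationale: the tokenizer's `model_max_length` should track the model's real context window (`max_position_embeddings`, 8192 for gemma-2) instead of the huge placeholder default. A minimal sanity check, assuming the model's custom config exposes `max_position_embeddings` the way stock gemma-2 does:

```python
from transformers import AutoConfig, AutoTokenizer

model_id = "BAAI/bge-reranker-v2.5-gemma2-lightweight"

# The config carries the model's true context limit; the tokenizer's
# model_max_length should agree with it rather than the placeholder default.
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

print(config.max_position_embeddings)  # 8192 for gemma-2
print(tokenizer.model_max_length)      # 8192 once this PR is merged
```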
README.md
CHANGED
@@ -172,6 +172,16 @@ with torch.no_grad():
 print(scores)
 ```
 
+## Infinity
+
+For an OpenAI API-compatible local deployment, use [Infinity](https://github.com/michaelfeil/infinity):
+
+```
+docker run -it --gpus all -v $volume:/app/.cache -p 7997:7997 \
+michaelf34/infinity:0.0.70 \
+v2 --model-id BAAI/bge-reranker-v2.5-gemma2-lightweight --device cuda --no-bettertransformer
+```
+
 ## Load model in local
 
 1. make sure `gemma_config.py` and `gemma_model.py` from [BAAI/bge-reranker-v2.5-gemma2-lightweight](https://huggingface.co/BAAI/bge-reranker-v2.5-gemma2-lightweight/tree/main) are in your local path.
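Once the container is up, the server can be queried over HTTP. A minimal client sketch, assuming Infinity's rerank endpoint at `/rerank` with the usual `query` + `documents` payload (the query and documents below are just illustrative):

```python
import requests

# Query the Infinity container started above; endpoint and payload shape
# assume Infinity's rerank API (POST /rerank with query + documents).
resp = requests.post(
    "http://localhost:7997/rerank",
    json={
        "model": "BAAI/bge-reranker-v2.5-gemma2-lightweight",
        "query": "what is panda?",
        "documents": [
            "hi",
            "The giant panda is a bear species endemic to China.",
        ],
    },
)
resp.raise_for_status()
for result in resp.json()["results"]:
    print(result["index"], result["relevance_score"])
```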
tokenizer_config.json
CHANGED
@@ -1746,7 +1746,7 @@
   "bos_token": "<bos>",
   "clean_up_tokenization_spaces": false,
   "eos_token": "<eos>",
-  "model_max_length": 1000000000000000019884624838656,
+  "model_max_length": 8192,
   "pad_token": "<pad>",
   "sp_model_kwargs": {},
   "spaces_between_special_tokens": false,
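The practical effect of the one-line change, sketched below: with `model_max_length` set to 8192, `truncation=True` actually caps inputs at the context window, whereas the old placeholder value made truncation a no-op (the tokenizer only warns). Assumes only a stock `transformers` install:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-reranker-v2.5-gemma2-lightweight")

# Deliberately feed far more than 8192 tokens; with model_max_length set,
# truncation=True now clips to the real context window.
batch = tokenizer("word " * 20000, truncation=True, return_tensors="pt")
print(batch["input_ids"].shape[1])  # <= 8192 after this PR
```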