Update tokenizer_config.json

Without `model_max_length` set to the model's actual limit, HF's `pipeline` function breaks on inputs longer than 512 tokens:
```py
from transformers import pipeline
pipe = pipeline('feature-extraction', 'thenlper/gte-small')
input = "# 2024 Summer Olympics\n\n## The Games\\[edit\\]\n\n### Sports\\[edit\\]\n\nle\">Image Basketball<ul><li>Basketball (2)</li><li>3×3 basketball (2)</li></ul></li><li>Image Boxing (13)</li><li>Image Breaking (2)</li></ul></td><td><ul><li>Image Canoeing<ul><li>Slalom (6)</li><li>Sprint (10)</li></ul></li><li>Image Cycling<ul><li>BMX freestyle (2)</li><li>BMX racing (2)</li><li>Mountain biking (2)</li><li>Road (4)</li><li>Track (12)</li></ul></li><li>Image Equestrian<ul><li>Dressage (2)</li><li>Eventing (2)</li><li>Jumping (2)</li></ul></li><li>Image<a"
output = pipe(input)
```

See https://github.com/xenova/transformers.js/issues/355 for more information. This modification was also made to the Transformers.js-compatible version of the model: https://huggingface.co/Xenova/gte-small/commit/7ca943b8ff118ce9eb87aa3a5669f26e3d633fd7
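For anyone carrying a local copy of the config, the same change can be sketched in a few lines of Python. This is only an illustration under assumptions: `clamp_model_max_length` and `UNSET_SENTINEL` are hypothetical names, not part of any library, and the default of 512 simply mirrors the value used in this PR (other models may need a different limit). The old value in the config equals `int(1e30)`, which appears to be the placeholder written when no real limit is recorded.

```py
import json

# Placeholder value found in the old config; numerically it is int(1e30).
UNSET_SENTINEL = int(1e30)  # == 1000000000000000019884624838656


def clamp_model_max_length(config: dict, limit: int = 512) -> dict:
    """Return a copy of `config` with `model_max_length` capped at `limit`.

    Hypothetical helper. `limit=512` mirrors this PR and is an assumption
    for any other model.
    """
    patched = dict(config)  # shallow copy, so the caller's dict is untouched
    current = patched.get("model_max_length")
    if current is None or current > limit:
        patched["model_max_length"] = limit
    return patched


# A trimmed-down stand-in for tokenizer_config.json (not the full file).
before = json.loads(
    '{"mask_token": "[MASK]", "model_max_length": 1000000000000000019884624838656}'
)
after = clamp_model_max_length(before)
print(json.dumps(after, indent=2))
```

Values already at or below the limit are left alone, so running the helper on an already-fixed config is a no-op.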

Files changed (1)

tokenizer_config.json +1 -1

@@ -4,7 +4,7 @@
   "do_basic_tokenize": true,
   "do_lower_case": true,
   "mask_token": "[MASK]",
-  "model_max_length": 1000000000000000019884624838656,
+  "model_max_length": 512,
   "never_split": null,
   "pad_token": "[PAD]",
   "sep_token": "[SEP]",