Commit 91b871b by SaulLu
1 parent: cec6759

Create README.md

  1. README.md +19 -0
README.md ADDED
@@ -0,0 +1,19 @@
+ You need a custom version of the `tokenizers` library to use this tokenizer.
+
+ To install this custom version, run:
+ ```bash
+ pip install transformers
+ git clone https://github.com/huggingface/tokenizers.git
+ cd tokenizers
+ git checkout bigscience_fork
+ cd bindings/python
+ pip install setuptools_rust
+ pip install -e .
+ ```
+
+ Then load the tokenizer:
+ ```python
+ from transformers import AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("bigscience-catalogue-data-dev/byte-level-bpe-tokenizer-no-norm-250k-whitespace-and-eos-regex-alpha-v3-dedup-lines-articles")
+ ```
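
Because the README's steps install a released `tokenizers` first (as a dependency of `transformers`) and then replace it with the editable build from the fork, it can help to confirm which versions Python actually sees before loading the tokenizer. The sketch below is not part of the commit; the helper name `installed_version` is hypothetical, and it relies only on the standard-library `importlib.metadata` module.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent.

    `installed_version` is a hypothetical helper for illustration only.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report what the current environment provides for the two packages
# that the README's install steps touch.
for name in ("tokenizers", "transformers"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If `tokenizers` shows as not installed, or the version is not the one you expect from the fork, the `pip install -e .` step probably ran in a different environment than the one you are loading the tokenizer from.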