Commit b1cadbc by Tom Aarsen
Parent: cb6a582
Update tokenizer: Set lstrip=True for [MASK]
This allows mask filling with inputs such as "The dog [MASK]", which otherwise did not work. Most tokens in our tokenizer start with a space, so without lstrip the space before [MASK] became a separate token. The model doesn't use double spaces, so it filled the mask with odd, space-less word pieces such as "urn".
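The effect of lstrip can be illustrated with a toy tokenizer. This is a minimal sketch, not the Hugging Face tokenizers implementation: `toy_pretokenize` and `toy_tokenize` are hypothetical helpers, and the regex only roughly approximates a GPT-2-style pre-tokenizer in which words carry their leading space.

```python
import re

MASK = "[MASK]"

def toy_pretokenize(chunk: str) -> list[str]:
    # Rough GPT-2-style approximation: a word keeps its leading space,
    # and whitespace not followed by a word becomes its own piece.
    return re.findall(r" ?\S+|\s+", chunk)

def toy_tokenize(text: str, lstrip: bool) -> list[str]:
    # Hypothetical helper: isolate [MASK] first (as added tokens are), optionally
    # letting it absorb the whitespace to its left, then pre-tokenize the rest.
    pattern = (r"\s*" if lstrip else "") + re.escape(MASK)
    tokens: list[str] = []
    pos = 0
    for match in re.finditer(pattern, text):
        tokens.extend(toy_pretokenize(text[pos:match.start()]))
        tokens.append(MASK)  # the whole matched span, absorbed space included, maps to [MASK]
        pos = match.end()
    tokens.extend(toy_pretokenize(text[pos:]))
    return tokens

print(toy_tokenize("The dog [MASK]", lstrip=False))  # ['The', ' dog', ' ', '[MASK]']
print(toy_tokenize("The dog [MASK]", lstrip=True))   # ['The', ' dog', '[MASK]']
```

With `lstrip=False` a standalone `' '` token precedes the mask, so a space-prefixed prediction such as `' barked'` would produce a double space. With `lstrip=True` the space is absorbed into `[MASK]`, leaving room for an ordinary space-prefixed word.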
- tokenizer_config.json +1 -1
tokenizer_config.json (CHANGED)
```diff
@@ -258,7 +258,7 @@
     },
     "50284": {
       "content": "[MASK]",
-      "lstrip": false,
+      "lstrip": true,
       "normalized": false,
       "rstrip": false,
       "single_word": false,
```
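To apply the same change to a local copy of the config, one option is to edit the JSON directly. This is a minimal sketch under assumptions: the helper name `set_mask_lstrip` is hypothetical, and the sample config is trimmed to the single entry in this commit's diff, nested under `added_tokens_decoder` (the usual home for added-token settings, though the diff does not show that key).

```python
import json

# Hypothetical minimal config. Nesting "50284" under "added_tokens_decoder"
# is an assumption; the commit's diff only shows the entry itself.
CONFIG_TEXT = """
{
  "added_tokens_decoder": {
    "50284": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false
    }
  }
}
"""

def set_mask_lstrip(config: dict, token_id: str = "50284") -> dict:
    # JSON object keys are strings, so the token id is looked up as "50284".
    entry = config["added_tokens_decoder"][token_id]
    if entry["content"] != "[MASK]":
        raise ValueError(f"token {token_id} is {entry['content']!r}, not [MASK]")
    entry["lstrip"] = True  # the one-line change made by this commit
    return config

config = set_mask_lstrip(json.loads(CONFIG_TEXT))
print(json.dumps(config["added_tokens_decoder"]["50284"], indent=2))
```

On a real file you would `json.load` it, call the helper, and `json.dump` the result back; reloading the tokenizer afterwards should then pick up the new setting.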