Tom Aarsen committed on
Commit b1cadbc
1 Parent(s): cb6a582

Update tokenizer: Set lstrip=True for [MASK]

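This allows mask filling with inputs such as "The dog [MASK]". Previously that didn't work, because most tokens in our tokenizer start with a space. With lstrip=false, the space before [MASK] stays in the input, so a space-prefixed prediction would produce a double space. The model doesn't use double spaces, so it would give odd results like "urn" instead.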
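To make the effect of the flag concrete, here is a toy sketch of how an lstrip-style flag changes the way text is split around a special token. It is not the tokenizer's actual implementation: `split_on_special` is a hypothetical helper written only for illustration, and it models just the whitespace handling, not the subword tokenization that follows.

```python
import re


def split_on_special(text: str, token: str = "[MASK]", lstrip: bool = False) -> list[str]:
    """Toy splitter (hypothetical helper, not a library API).

    Isolates `token` from the surrounding text. When `lstrip` is True,
    whitespace directly to the left of the token is absorbed into the
    match, so it never reaches the rest of the tokenizer.
    """
    pattern = (r"\s*" if lstrip else "") + re.escape(token)
    pieces = []
    pos = 0
    for match in re.finditer(pattern, text):
        if match.start() > pos:
            pieces.append(text[pos:match.start()])
        pieces.append(token)
        pos = match.end()
    if pos < len(text):
        pieces.append(text[pos:])
    return pieces


before = split_on_special("The dog [MASK]", lstrip=False)
after = split_on_special("The dog [MASK]", lstrip=True)
print(before)  # ['The dog ', '[MASK]']  (trailing space is left in the text)
print(after)   # ['The dog', '[MASK]']   (space absorbed by the mask token)
```

With the space absorbed, a predicted token that itself starts with a space slots in without creating a double space.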

Files changed (1)
  1. tokenizer_config.json +1 -1
tokenizer_config.json CHANGED
@@ -258,7 +258,7 @@
   },
   "50284": {
   "content": "[MASK]",
- "lstrip": false,
+ "lstrip": true,
   "normalized": false,
   "rstrip": false,
   "single_word": false,