Update tokenizer: Set lstrip=True for [MASK]

This allows mask filling with e.g. "The dog [MASK]", whereas that otherwise didn't work as most tokens start with a space in our tokenizer. The model doesn't use double spaces, so it would give odd results like "urn".

Files changed (1) hide show

tokenizer_config.json +1 -1

tokenizer_config.json CHANGED Viewed

@@ -258,7 +258,7 @@
     },
     "50284": {
       "content": "[MASK]",
-      "lstrip": false,
       "normalized": false,
       "rstrip": false,
       "single_word": false,

     },
     "50284": {
       "content": "[MASK]",
+      "lstrip": true,
       "normalized": false,
       "rstrip": false,
       "single_word": false,