Commit 90069c8 by lgris: add tokenizer
{"<pad>": 0, "|": 1, "<unk>": 2, "a": 3, "b": 4, "c": 5, "d": 6, "e": 7, "f": 8, "g": 9, "h": 10, "i": 11, "j": 12, "k": 13, "l": 14, "m": 15, "n": 16, "o": 17, "p": 18, "q": 19, "r": 20, "s": 21, "t": 22, "u": 23, "v": 24, "w": 25, "x": 26, "y": 27, "z": 28, "ç": 29, "ã": 30, "à": 31, "á": 32, "â": 33, "ê": 34, "é": 35, "í": 36, "ó": 37, "ô": 38, "õ": 39, "ú": 40, "û": 41, "-": 42, "<s>": 43, "</s>": 44}