Wrong sentencepiece model

#1
by carolinedockes - opened

@pere Hi, very useful model, thank you! However, I think there is an issue with the spiece.model file. AutoTokenizer.from_pretrained("pere/nb-nn-translation", use_fast=False) gives a tokenizer with vocab size 250,100, whereas AutoTokenizer.from_pretrained("pere/nb-nn-translation", use_fast=True) gives a vocab size of 50,003. I believe the latter, which doesn't rely on the sentencepiece model directly, is the correct one. Would you be able to upload the right file? Thanks!
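
A minimal sketch to reproduce the mismatch (assuming transformers and sentencepiece are installed; the vocab sizes in the comments are the ones reported above, not values I have re-verified):

```python
from transformers import AutoTokenizer

model_id = "pere/nb-nn-translation"

# Slow tokenizer: built directly from the spiece.model file in the repo.
slow_tok = AutoTokenizer.from_pretrained(model_id, use_fast=False)
print("slow vocab size:", slow_tok.vocab_size)  # reported above: 250,100

# Fast tokenizer: does not rely on the sentencepiece model directly.
fast_tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
print("fast vocab size:", fast_tok.vocab_size)  # reported above: 50,003
```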

carolinedockes changed discussion status to closed
carolinedockes changed discussion status to open

(Alternatively, removing that file would ensure it doesn't get used by mistake.)
