Wrong sentencepiece model
#1
opened by carolinedockes
@pere
Hi, very useful model, thank you! However, I think there is an issue with the `spiece.model` file. `AutoTokenizer.from_pretrained("pere/nb-nn-translation", use_fast=False)` gives a tokenizer with a vocab size of 250,100, whereas `AutoTokenizer.from_pretrained("pere/nb-nn-translation", use_fast=True)` gives a vocab size of 50,003. I believe the latter, which doesn't rely on the sentencepiece model directly, is the correct one. Would you be able to upload the right file? Thanks!
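For reference, a minimal sketch to reproduce the mismatch (assuming `transformers` and `sentencepiece` are installed; the vocab sizes in the comments are the ones reported above):

```python
from transformers import AutoTokenizer

# Slow tokenizer: built from spiece.model in the repo
slow = AutoTokenizer.from_pretrained("pere/nb-nn-translation", use_fast=False)
# Fast tokenizer: built from the serialized tokenizer files instead
fast = AutoTokenizer.from_pretrained("pere/nb-nn-translation", use_fast=True)

print(slow.vocab_size)  # reports 250,100
print(fast.vocab_size)  # reports 50,003
```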
carolinedockes changed discussion status to closed
carolinedockes changed discussion status to open
(Alternatively, removing that file would ensure it doesn't get used by mistake.)