Masked word prediction in 103 languages. To use, input a non-English sentence (Google Translate can help you produce one), replacing one of its words with "[MASK]".

The model is distilbert-base-multilingual-cased fine-tuned for masked language modelling on the r/explainlikeimfive subset of the ELI5 dataset, which is English-only; all knowledge of the target language is therefore acquired from multilingual pretraining alone.

Training setup: 50,000 training examples, 3 epochs, mask probability 15%, batch size 8, learning rate 2e-5, weight decay 0.01. Final model perplexity: 10.22.
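As a minimal usage sketch, the fill-mask pipeline handles mask substitution and ranking; the hub ID below is a placeholder, not this model's actual repository name:

```python
from transformers import pipeline

# Placeholder hub ID; substitute the model's actual repository name.
fill = pipeline(
    "fill-mask",
    model="your-username/distilbert-base-multilingual-cased-eli5",
)

# French input: "Paris is the [MASK] of France."
for pred in fill("Paris est la [MASK] de la France."):
    print(f"{pred['token_str']:>12}  {pred['score']:.3f}")
```

Each prediction carries the filled-in token and its probability; for the French example above, a well-trained model should rank "capitale" near the top.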
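The following is a sketch of how a fine-tuning run with these hyperparameters could be set up using the transformers Trainer. The ELI5 loader name, its field layout, and the held-out evaluation slice are assumptions; the card itself only states the training-set size and hyperparameters.

```python
import math

from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("distilbert-base-multilingual-cased")

# Assumed dataset ID and layout: the ELI5 loader nests answers under
# "answers"/"text"; we keep the top answer per question.
raw = load_dataset("eli5", split="train_eli5")

def tokenize(batch):
    texts = [answers["text"][0] for answers in batch["answers"]]
    return tokenizer(texts, truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=raw.column_names)
train_ds = tokenized.select(range(50_000))           # 50,000 training examples
eval_ds = tokenized.select(range(50_000, 55_000))    # eval slice is an assumption

# Dynamic masking at the stated 15% probability.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="distilbert-eli5-mlm",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    learning_rate=2e-5,
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    data_collator=collator,
)
trainer.train()

# Perplexity is exp(cross-entropy loss) on the held-out slice;
# the card reports a final value of 10.22.
print(f"Perplexity: {math.exp(trainer.evaluate()['eval_loss']):.2f}")
```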