This model was trained by the Pravopysnyk team for the Ukrainian NLP shared task on Ukrainian grammar correction. It is mBART-50-large set up as a ukr-to-ukr translation task and fine-tuned on UA-GEC, augmented with a custom dataset produced by our synthetic error generation. The error-generation code will be uploaded to GitHub soon, and the detailed procedure is described in our paper. For this model we added the following to UA-GEC:

- 5k sentences generated by round-trip translation (ukr-rus-ukr)
- 10k sentences produced by our punctuation error generation script
- 2k sentences of dilution (fully correct sentences sampled from our dataset)
- 10k sentences of russisms (errors generated by our russism error generation)
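The punctuation error generation script has not been released yet, so the snippet below is only a minimal sketch of the general idea, assuming the script corrupts a correct sentence by randomly dropping punctuation marks (the function name, probability parameter, and mark set are illustrative, not from our codebase):

```python
import random

# Punctuation marks we may drop to create a synthetic error (illustrative set)
PUNCT = ",;:—"

def corrupt_punctuation(sentence: str, p_drop: float = 0.5, seed: int = 0) -> str:
    """Return a copy of `sentence` with punctuation marks randomly removed.

    The (correct, corrupted) pair can then be used as a training example
    for the correction model: corrupted sentence in, correct sentence out.
    """
    rng = random.Random(seed)  # fixed seed keeps dataset generation reproducible
    out = []
    for ch in sentence:
        if ch in PUNCT and rng.random() < p_drop:
            continue  # drop this mark, introducing a punctuation error
        out.append(ch)
    return "".join(out)

correct = "Я прийшов, побачив, переміг."
corrupted = corrupt_punctuation(correct, p_drop=1.0)
print(corrupted)  # → Я прийшов побачив переміг.
```

A real script would likely also insert spurious commas, not only delete them, but deletion alone already yields useful negative examples.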
Hope you find this description helpful!