---
license: mit
language:
- ce
- ru
metrics:
- chrf
- bleu
base_model:
- facebook/nllb-200-distilled-600M
pipeline_tag: translation
library_name: transformers
---

This is a fine-tuned NLLB-200 model for Chechen–Russian translation, presented in the paper [The first open machine translation system for the Chechen language](https://www.arxiv.org/abs/2507.12672).

The language token for Chechen is `ce_Cyrl`, while the tokens for all other languages included in NLLB-200 begin with a three-letter language code (e.g. `rus_Cyrl` for Russian).

Here is an example of how the model can be used in code:

```python
import torch
from transformers import AutoModelForSeq2SeqLM, NllbTokenizer

model_nllb = AutoModelForSeq2SeqLM.from_pretrained('NM-development/nllb-ce-rus-v0').cuda()
tokenizer_nllb = NllbTokenizer.from_pretrained('NM-development/nllb-ce-rus-v0')


def translate(
    text, model, tokenizer,
    src_lang='rus_Cyrl', tgt_lang='eng_Latn',
    a=16, b=1.5, max_input_length=1024, **kwargs
):
    model.eval()
    with torch.no_grad():
        tokenizer.src_lang = src_lang
        tokenizer.tgt_lang = tgt_lang
        inputs = tokenizer(
            text, return_tensors='pt', padding=True,
            truncation=True, max_length=max_input_length
        )
        result = model.generate(
            **inputs.to(model.device),
            # Force the decoder to start with the target-language token
            forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
            # Budget the output length relative to the input length
            max_new_tokens=int(a + b * inputs.input_ids.shape[1]),
            **kwargs
        )
        return tokenizer.batch_decode(result, skip_special_tokens=True)


text = "Стигална кӀел къахьоьгуш, ша мел динчу хӀуманах буьсун болу хӀун пайда оьцу адамо?"
translate(text, model_nllb, tokenizer_nllb, src_lang='ce_Cyrl', tgt_lang='rus_Cyrl')[0]
# 'Что пользы человеку от того, что он трудился под солнцем и что сделал?'
```