---
language:
- ko
- uz
- en
- ru
- zh
- ja
- km
- my
- si
- tl
- th
- vi
- bn
- mn
- id
- ne
- pt
tags:
- translation
- multilingual
- korean
- uzbek
datasets:
- custom_parallel_corpus
license: mit
---

# QWEN2.5-7B-Bnk-7e

## Model Description

QWEN2.5-7B-Bnk-5e is a multilingual translation model based on the QWEN 2.5 architecture with 7 billion parameters. It specializes in translating from multiple languages into Korean and Uzbek.

## Intended Uses & Limitations

The model is designed for translating text from various Asian and European languages into Korean and Uzbek. It can be used for tasks such as:

- Multilingual document translation
- Cross-lingual information retrieval
- Language learning applications
- International communication assistance

Translations may be imperfect, especially for idiomatic expressions or highly context-dependent content.

## Training and Evaluation Data

The model was fine-tuned on a diverse dataset of parallel texts covering the supported languages. Evaluation was performed on held-out test sets for each language pair.

## Training Procedure

Fine-tuning was performed on the QWEN 2.5 7B base model using custom datasets for the specific language pairs.

## Supported Languages

The model supports translation from the following languages to Korean and Uzbek:

- Uzbek (uz)
- Russian (ru)
- Thai (th)
- Chinese (Simplified) (zh)
- Chinese (Traditional) (zh-tw, zh-hant)
- Bengali (bn)
- Mongolian (mn)
- Indonesian (id)
- Nepali (ne)
- English (en)
- Khmer (km)
- Portuguese (pt)
- Sinhala (si)
- Korean (ko)
- Tagalog (tl)
- Myanmar (my)
- Vietnamese (vi)
- Japanese (ja)

## How to Use

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "FINGU-AI/QWEN2.5-7B-Bnk-5e"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# Qwen2.5 is a decoder-only model, so it is loaded as a causal LM
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

# Example usage
source_text = "Hello, how are you?"
source_lang = "en"
target_lang = "ko"  # or "uz" for Uzbek

messages = [
    {"role": "system", "content": f"Translate {source_lang} to {target_lang} word by word correctly."},
    {"role": "user", "content": source_text},
]

# Apply the chat template and move the inputs to the model's device
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(device)

# Cap the number of newly generated tokens (the prompt is not counted)
outputs = model.generate(**inputs, max_new_tokens=100)

# Keep only the tokens generated after the prompt
response = outputs[0][inputs["input_ids"].shape[-1]:]
translated_text = tokenizer.decode(response, skip_special_tokens=True)
print(translated_text)
```

## Performance

## Limitations

- The model's performance may vary across language pairs and domains.
- It may struggle with very colloquial or highly specialized text.
- It may not always capture cultural nuances or context-dependent meanings accurately.

## Ethical Considerations

- The model should not be used to generate or propagate harmful, biased, or misleading content.
- Users should be aware of potential biases in the training data that may affect translations.
- The model's outputs should not be treated as certified translations for official or legal purposes without human verification.

## Citation

```bibtex
@misc{fingu2023qwen25,
  author = {FINGU AI and AI Team},
  title = {QWEN2.5-7B-Bnk-7e: A Multilingual Translation Model},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/FINGU-AI/QWEN2.5-7B-Bnk-5e}}
}
```