---
language:
- ko
- uz
- en
- ru
- zh
- ja
- km
- my
- si
- tl
- th
- vi
- bn
- mn
- id
- ne
- pt
tags:
- translation
- multilingual
- korean
- uzbek
datasets:
- custom_parallel_corpus
license: mit
---

# QWEN2.5-7B-Bnk-7e

## Model Description

QWEN2.5-7B-Bnk-7e is a multilingual translation model built on the Qwen2.5 architecture, with 7 billion parameters. It specializes in translating from multiple source languages into Korean and Uzbek.

## Intended Uses & Limitations

The model is designed for translating text from various Asian and European languages to Korean and Uzbek. It can be used for tasks such as:

- Multilingual document translation
- Cross-lingual information retrieval
- Language learning applications
- International communication assistance

Please note that while the model strives for accuracy, it may not always produce perfect translations, especially for idiomatic expressions or highly context-dependent content.

## Training and Evaluation Data

The model was fine-tuned on a diverse dataset of parallel texts covering the supported languages. Evaluation was performed on held-out test sets for each language pair.
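
The card does not say how the held-out sets were constructed. As an illustration only, the sketch below shows one way to hold out a test set per language pair. The record layout (`src_lang`, `tgt_lang`, `src`, `tgt`), the function name `split_by_pair`, the toy corpus, and the 10% test fraction are all assumptions, not details of the actual training setup.

```python
import random
from collections import defaultdict


def split_by_pair(records, test_fraction=0.1, seed=0):
    """Split parallel records into train/test, holding out a share of every language pair."""
    by_pair = defaultdict(list)
    for rec in records:
        by_pair[(rec["src_lang"], rec["tgt_lang"])].append(rec)

    rng = random.Random(seed)
    train, test = [], []
    for pair in sorted(by_pair):
        items = by_pair[pair][:]
        rng.shuffle(items)
        # Hold out at least one example when the pair has more than one record.
        n_test = max(1, int(len(items) * test_fraction)) if len(items) > 1 else 0
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


# Hypothetical toy corpus: 20 en->ko records and 10 ru->uz records.
corpus = [
    {"src_lang": "en", "tgt_lang": "ko", "src": f"sentence {i}", "tgt": f"문장 {i}"}
    for i in range(20)
] + [
    {"src_lang": "ru", "tgt_lang": "uz", "src": f"предложение {i}", "tgt": f"gap {i}"}
    for i in range(10)
]
train_set, test_set = split_by_pair(corpus)
print(len(train_set), len(test_set))
```

Splitting within each pair, rather than over the pooled corpus, keeps low-resource pairs from ending up with no test examples at all.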

## Training Procedure

Fine-tuning was performed on the QWEN 2.5 7B base model using custom datasets for the specific language pairs.
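
The exact training format is not documented here. A minimal sketch, assuming each parallel sentence pair was rendered as a chat conversation that reuses the system prompt from this card's usage example, might look like the following. The helper name `to_chat_example` and the sample Korean translation are hypothetical.

```python
def to_chat_example(src_lang, tgt_lang, src_text, tgt_text):
    """Render one parallel sentence pair as a three-turn chat conversation."""
    return [
        # Assumed: same system prompt wording as the inference example in this card.
        {"role": "system", "content": f"Translate {src_lang} to {tgt_lang} word by word correctly."},
        {"role": "user", "content": src_text},
        # The reference translation becomes the assistant turn the model learns to produce.
        {"role": "assistant", "content": tgt_text},
    ]


example = to_chat_example("en", "ko", "Hello, how are you?", "안녕하세요, 어떻게 지내세요?")
for turn in example:
    print(turn["role"], "->", turn["content"])
```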

## Supported Languages

The model supports translation from the following languages to Korean and Uzbek:

- Uzbek (uz)
- Russian (ru)
- Thai (th)
- Chinese (Simplified) (zh)
- Chinese (Traditional) (zh-tw, zh-hant)
- Bengali (bn)
- Mongolian (mn)
- Indonesian (id)
- Nepali (ne)
- English (en)
- Khmer (km)
- Portuguese (pt)
- Sinhala (si)
- Korean (ko)
- Tagalog (tl)
- Burmese (Myanmar) (my)
- Vietnamese (vi)
- Japanese (ja)
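
Before sending a request to the model, it can help to check that the language pair is one the card lists. The sketch below encodes the list above as a lookup table. The names `SOURCE_LANGUAGES`, `TARGET_LANGUAGES`, and `check_pair` are hypothetical helpers, not part of the model or of any library.

```python
# Source languages listed in this card, keyed by language code.
SOURCE_LANGUAGES = {
    "uz": "Uzbek", "ru": "Russian", "th": "Thai",
    "zh": "Chinese (Simplified)",
    "zh-tw": "Chinese (Traditional)", "zh-hant": "Chinese (Traditional)",
    "bn": "Bengali", "mn": "Mongolian", "id": "Indonesian", "ne": "Nepali",
    "en": "English", "km": "Khmer", "pt": "Portuguese", "si": "Sinhala",
    "ko": "Korean", "tl": "Tagalog", "my": "Burmese", "vi": "Vietnamese",
    "ja": "Japanese",
}
# The card states that translation targets are Korean and Uzbek only.
TARGET_LANGUAGES = {"ko": "Korean", "uz": "Uzbek"}


def check_pair(source_lang, target_lang):
    """Return (source name, target name) or raise ValueError for an unlisted pair."""
    if source_lang not in SOURCE_LANGUAGES:
        raise ValueError(f"Unsupported source language: {source_lang!r}")
    if target_lang not in TARGET_LANGUAGES:
        raise ValueError(f"Unsupported target language: {target_lang!r}")
    return SOURCE_LANGUAGES[source_lang], TARGET_LANGUAGES[target_lang]


print(check_pair("en", "ko"))
```

A pair such as `check_pair("en", "ja")` raises `ValueError`, since Japanese is listed only as a source language.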



## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "FINGU-AI/QWEN2.5-7B-Bnk-5e"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Qwen2.5 is a decoder-only model, so it is loaded as a causal LM
# and moved to the GPU alongside its inputs.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto").to("cuda")

# Example usage
source_text = "Hello, how are you?"
source_lang = "en"
target_lang = "ko"  # or "uz" for Uzbek

messages = [
    {"role": "system", "content": f"Translate {source_lang} to {target_lang} word by word correctly."},
    {"role": "user", "content": source_text},
]
# Apply the chat template and place the prompt on the model's device
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# max_new_tokens limits only the generated tokens, not prompt + output
outputs = model.generate(input_ids, max_new_tokens=100)
# Strip the prompt tokens and decode only the generated continuation
response = outputs[0][input_ids.shape[-1]:]
translated_text = tokenizer.decode(response, skip_special_tokens=True)
print(translated_text)
```

## Performance


## Limitations

- The model's performance may vary across different language pairs and domains.
- It may struggle with very colloquial or highly specialized text.
- The model may not always capture cultural nuances or context-dependent meanings accurately.

## Ethical Considerations

- The model should not be used for generating or propagating harmful, biased, or misleading content.
- Users should be aware of potential biases in the training data that may affect translations.
- The model's outputs should not be considered as certified translations for official or legal purposes without human verification.


## Citation


```bibtex
@misc{fingu2024qwen25,
  author = {FINGU AI and AI Team},
  title = {QWEN2.5-7B-Bnk-7e: A Multilingual Translation Model},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/FINGU-AI/QWEN2.5-7B-Bnk-5e}}
}
```