---
language:
- ko
- uz
- en
- ru
- zh
- ja
- km
- my
- si
- tl
- th
- vi
- bn
- mn
- id
- ne
- pt
tags:
- translation
- multilingual
- korean
- uzbek
datasets:
- custom_parallel_corpus
license: mit
---
# QWEN2.5-7B-Bnk-7e
## Model Description
QWEN2.5-7B-Bnk-5e is a multilingual translation model based on the QWEN 2.5 architecture with 7 billion parameters. It specializes in translating from multiple source languages into Korean and Uzbek.
## Intended Uses & Limitations
The model is designed for translating text from various Asian and European languages to Korean and Uzbek. It can be used for tasks such as:
- Multilingual document translation
- Cross-lingual information retrieval
- Language learning applications
- International communication assistance
Please note that while the model strives for accuracy, it may not always produce perfect translations, especially for idiomatic expressions or highly context-dependent content.
## Training and Evaluation Data
The model was fine-tuned on a diverse dataset of parallel texts covering the supported languages. Evaluation was performed on held-out test sets for each language pair.
## Training Procedure
Fine-tuning was performed on the QWEN 2.5 7B base model using custom datasets for the specific language pairs.
## Supported Languages
The model supports translation from the following languages to Korean and Uzbek:
- Uzbek (uz)
- Russian (ru)
- Thai (th)
- Chinese (Simplified) (zh)
- Chinese (Traditional) (zh-tw, zh-hant)
- Bengali (bn)
- Mongolian (mn)
- Indonesian (id)
- Nepali (ne)
- English (en)
- Khmer (km)
- Portuguese (pt)
- Sinhala (si)
- Korean (ko)
- Tagalog (tl)
- Myanmar (Burmese) (my)
- Vietnamese (vi)
- Japanese (ja)
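The list above can be turned into a small lookup table for checking a language pair before sending text to the model. This is only a sketch: `SOURCE_LANGUAGES`, `TARGET_LANGUAGES`, and `validate_pair` are hypothetical names that do not ship with the model, and rejecting identical source and target codes is an assumption rather than documented behavior.

```python
from typing import Tuple

# Hypothetical lookup tables derived from the list above (not part of the model).
SOURCE_LANGUAGES = {
    "uz": "Uzbek",
    "ru": "Russian",
    "th": "Thai",
    "zh": "Chinese (Simplified)",
    "zh-tw": "Chinese (Traditional)",
    "zh-hant": "Chinese (Traditional)",
    "bn": "Bengali",
    "mn": "Mongolian",
    "id": "Indonesian",
    "ne": "Nepali",
    "en": "English",
    "km": "Khmer",
    "pt": "Portuguese",
    "si": "Sinhala",
    "ko": "Korean",
    "tl": "Tagalog",
    "my": "Myanmar (Burmese)",
    "vi": "Vietnamese",
    "ja": "Japanese",
}
TARGET_LANGUAGES = {"ko": "Korean", "uz": "Uzbek"}


def validate_pair(source_lang: str, target_lang: str) -> Tuple[str, str]:
    """Normalize both codes and raise ValueError for an unsupported pair."""
    src = source_lang.strip().lower()
    tgt = target_lang.strip().lower()
    if src not in SOURCE_LANGUAGES:
        raise ValueError(f"Unsupported source language: {source_lang!r}")
    if tgt not in TARGET_LANGUAGES:
        raise ValueError(f"Unsupported target language: {target_lang!r}")
    if src == tgt:
        # Assumption: translating a language into itself is treated as an error.
        raise ValueError("Source and target languages must differ")
    return src, tgt
```

For example, `validate_pair(" EN ", "ko")` returns the normalized codes, while `validate_pair("fr", "ko")` raises `ValueError` because French is not in the list.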
## How to Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "FINGU-AI/QWEN2.5-7B-Bnk-5e"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Qwen2.5 is a decoder-only model, so it is loaded as a causal LM
# and moved to the GPU that the inputs are sent to below.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto").to("cuda")

# Example usage
source_text = "Hello, how are you?"
source_lang = "en"
target_lang = "ko"  # or "uz" for Uzbek
messages = [
    {"role": "system", "content": f"Translate {source_lang} to {target_lang} word by word correctly."},
    {"role": "user", "content": source_text},
]
# Apply chat template
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
# max_new_tokens limits only the generated tokens, not prompt plus output
outputs = model.generate(input_ids, max_new_tokens=100)
# Drop the prompt tokens and decode only the newly generated ones
response = outputs[0][input_ids.shape[-1]:]
translated_text = tokenizer.decode(response, skip_special_tokens=True)
print(translated_text)
```
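The example above translates a single sentence. To reuse the same prompt for several inputs, one option is a small wrapper like the sketch below. `build_messages`, `translate_batch`, and the `generate_fn` argument are hypothetical names; `generate_fn` stands in for the tokenize, generate, and decode steps and is assumed to take a message list and return the translated string. Returning an empty string for blank input is likewise an assumption.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def build_messages(text: str, source_lang: str, target_lang: str) -> List[Message]:
    """Build the system/user message pair used in the usage example."""
    return [
        {
            "role": "system",
            "content": f"Translate {source_lang} to {target_lang} word by word correctly.",
        },
        {"role": "user", "content": text},
    ]


def translate_batch(
    texts: List[str],
    source_lang: str,
    target_lang: str,
    generate_fn: Callable[[List[Message]], str],
) -> List[str]:
    """Translate each text in order; blank inputs are passed through as ""."""
    results = []
    for text in texts:
        if not text.strip():
            # Assumption: skip the model call for empty or whitespace-only input.
            results.append("")
            continue
        results.append(generate_fn(build_messages(text, source_lang, target_lang)))
    return results
```

Passing the model call in as `generate_fn` keeps the prompt construction testable without loading the 7B checkpoint; in practice the callable would wrap `apply_chat_template`, `model.generate`, and `tokenizer.decode`.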
## Performance
## Limitations
- The model's performance may vary across different language pairs and domains.
- It may struggle with very colloquial or highly specialized text.
- The model may not always capture cultural nuances or context-dependent meanings accurately.
## Ethical Considerations
- The model should not be used for generating or propagating harmful, biased, or misleading content.
- Users should be aware of potential biases in the training data that may affect translations.
- The model's outputs should not be considered as certified translations for official or legal purposes without human verification.
## Citation
```bibtex
@misc{fingu2023qwen25,
author = {FINGU AI and AI Team},
title = {QWEN2.5-7B-Bnk-7e: A Multilingual Translation Model},
year = {2024},
publisher = {Hugging Face},
journal = {Hugging Face Model Hub},
howpublished = {\url{https://huggingface.co/FINGU-AI/QWEN2.5-7B-Bnk-5e}}
}
```