FINGU-AI committed on
Commit
8fdb923
1 Parent(s): 49a889e

Update README.md

Files changed (1)
  1. README.md +129 -3
README.md CHANGED
@@ -1,3 +1,129 @@
- ---
- license: mit
- ---
+ ---
+ language:
+ - ko
+ - uz
+ - en
+ - ru
+ - zh
+ - ja
+ - km
+ - my
+ - si
+ - tl
+ - th
+ - vi
+ - kk
+ - bn
+ - mn
+ - id
+ - ne
+ - pt
+ tags:
+ - translation
+ - multilingual
+ - korean
+ - uzbek
+ datasets:
+ - custom_parallel_corpus
+ license: mit
+ ---
+
+ # QWEN2.5-7B-Bnk-7e
+
+ ## Model Description
+
+ QWEN2.5-7B-Bnk-5e is a 7-billion-parameter multilingual translation model based on the QWEN 2.5 architecture. It specializes in translating from multiple languages into Korean and Uzbek.
+
+ ## Intended Uses & Limitations
+
+ The model is designed for translating text from various Asian and European languages into Korean and Uzbek. It can be used for tasks such as:
+
+ - Multilingual document translation
+ - Cross-lingual information retrieval
+ - Language learning applications
+ - International communication assistance
+
+ Translations may be imperfect, especially for idiomatic expressions or highly context-dependent content.
+
+ ## Training and Evaluation Data
+
+ The model was fine-tuned on a diverse dataset of parallel texts covering the supported languages. Evaluation was performed on held-out test sets for each language pair.
+
+ ## Training Procedure
+
+ Fine-tuning was performed on the QWEN 2.5 7B base model using custom datasets for the specific language pairs.
+
+ ## Supported Languages
+
+ The model supports translation from the following languages to Korean and Uzbek:
+
+ - Kazakh (kk)
+ - Russian (ru)
+ - Thai (th)
+ - Chinese (Simplified) (zh)
+ - Chinese (Traditional) (zh-tw, zh-hant)
+ - Bengali (bn)
+ - Mongolian (mn)
+ - Indonesian (id)
+ - Nepali (ne)
+ - English (en)
+ - Khmer (km)
+ - Portuguese (pt)
+ - Sinhala (si)
+ - Korean (ko)
+ - Tagalog (tl)
+ - Burmese (my)
+ - Vietnamese (vi)
+ - Japanese (ja)
+
+
+
+ ## How to Use
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_name = "FINGU-AI/QWEN2.5-7B-Bnk-5e"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(model_name)  # Qwen2.5 is decoder-only, not seq2seq
+
+ # Example usage
+ source_text = "Hello, how are you?"
+ source_lang = "en"
+ target_lang = "ko"  # or "uz" for Uzbek
+
+ input_text = f"Translate from {source_lang} to {target_lang}: {source_text}"
+ inputs = tokenizer(input_text, return_tensors="pt")
+ outputs = model.generate(**inputs, max_new_tokens=100)
+ # Decode only the newly generated tokens, not the echoed prompt
+ translated_text = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
+ print(translated_text)
+ ```
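The prompt string in the example above can be wrapped in a small helper that rejects unsupported language codes before any text reaches the model. This is a minimal sketch, not part of the model's API: `build_prompt`, `SOURCE_LANGS`, and `TARGET_LANGS` are hypothetical names, the code sets are copied from the "Supported Languages" list, and the rule that source and target must differ is an assumption.

```python
# Hypothetical constants: codes copied from the "Supported Languages" list.
SOURCE_LANGS = {
    "kk", "ru", "th", "zh", "zh-tw", "zh-hant", "bn", "mn", "id",
    "ne", "en", "km", "pt", "si", "ko", "tl", "my", "vi", "ja",
}
# The model card names Korean and Uzbek as the only target languages.
TARGET_LANGS = {"ko", "uz"}


def build_prompt(source_text: str, source_lang: str, target_lang: str) -> str:
    """Validate the language codes and return the prompt used in the example."""
    if source_lang not in SOURCE_LANGS:
        raise ValueError(f"unsupported source language: {source_lang!r}")
    if target_lang not in TARGET_LANGS:
        raise ValueError(f"unsupported target language: {target_lang!r}")
    if source_lang == target_lang:
        # Assumption: translating a language into itself is treated as an error.
        raise ValueError("source and target languages must differ")
    return f"Translate from {source_lang} to {target_lang}: {source_text}"
```

The returned string can be passed to the tokenizer in place of `input_text`.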
+ ## Performance
+
+
+ ## Limitations
+
+ - The model's performance may vary across different language pairs and domains.
+ - It may struggle with very colloquial or highly specialized text.
+ - The model may not always capture cultural nuances or context-dependent meanings accurately.
+
+ ## Ethical Considerations
+
+ - The model should not be used for generating or propagating harmful, biased, or misleading content.
+ - Users should be aware of potential biases in the training data that may affect translations.
+ - The model's outputs should not be treated as certified translations for official or legal purposes without human verification.
+
+
+ ## Citation
+
+
+ ```bibtex
+ @misc{fingu2023qwen25,
+ author = {FINGU AI and AI Team},
+ title = {QWEN2.5-7B-Bnk-7e: A Multilingual Translation Model},
+ year = {2024},
+ publisher = {Hugging Face},
+ journal = {Hugging Face Model Hub},
+ howpublished = {\url{https://huggingface.co/FINGU-AI/QWEN2.5-7B-Bnk-5e}}
+ }