Files changed (1): README.md (+241 -232)

---
base_model: meta-llama/Meta-Llama-3-8B
license: llama3
tags:
- axolotl
- generated_from_trainer
model-index:
- name: Egyptian-Arabic-Translator-Llama-3-8B
  results: []
datasets:
- ahmedsamirio/oasst2-9k-translation
language:
- ar
- en
metrics:
- accuracy
library_name: peft
pipeline_tag: text-generation
---

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
base_model: meta-llama/Meta-Llama-3-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: true
load_in_4bit: false
strict: false

datasets:
  - path: translation-dataset-v3-train.hf
    type: alpaca
    train_on_split: train

test_datasets:
  - path: translation-dataset-v3-test.hf
    type: alpaca
    split: train

dataset_prepared_path: ./last_run_prepared
output_dir: ./llama_3_translator
hub_model_id: ahmedsamirio/llama_3_translator_v3

sequence_len: 2048
sample_packing: true
pad_to_sequence_len: true
eval_sample_packing: false

adapter: lora
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj

wandb_project: en_eg_translator
wandb_entity: ahmedsamirio
wandb_name: llama_3_en_eg_translator_v3

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 2
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 2e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 10
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>
```

</details><br>

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/ahmedsamirio/en_eg_translator/runs/hwzxxt0r)

# Egyptian Arabic Translator Llama-3 8B

This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on the [ahmedsamirio/oasst2-9k-translation](https://huggingface.co/datasets/ahmedsamirio/oasst2-9k-translation) dataset.

## Model description

This model is an attempt at a small English-to-Egyptian-Arabic translation model, built by fine-tuning Llama-3 8B with a LoRA adapter.

## Intended uses & limitations

- Translating instruction-tuning and text-generation datasets into Egyptian Arabic (a dataset-translation sketch follows the inference code below)

## Inference code

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

tokenizer = AutoTokenizer.from_pretrained("ahmedsamirio/Egyptian-Arabic-Translator-Llama-3-8B")
model = AutoModelForCausalLM.from_pretrained("ahmedsamirio/Egyptian-Arabic-Translator-Llama-3-8B")
pipe = pipeline(task='text-generation', model=model, tokenizer=tokenizer)

# Alpaca-style prompt templates for the three translation directions.
en_template = """<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Translate the following text to English.

### Input:
{text}

### Response:
"""

ar_template = """<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Translate the following text to Arabic.

### Input:
{text}

### Response:
"""

eg_template = """<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Translate the following text to Egyptian Arabic.

### Input:
{text}

### Response:
"""

text = """Some habits are known as "keystone habits," and these influence the formation of other habits. \
For example, identifying as the type of person who takes care of their body and is in the habit of exercising regularly, \
can also influence eating better and using credit cards less. In business, \
safety can be a keystone habit that influences other habits that result in greater productivity.[17]"""

# Translate English -> Arabic, then Arabic -> Egyptian Arabic.
# return_full_text=False keeps only the generated translation, without the prompt.
ar_text = pipe(ar_template.format(text=text),
               max_new_tokens=256,
               do_sample=True,
               temperature=0.3,
               top_p=0.5,
               return_full_text=False)[0]["generated_text"].strip()

eg_text = pipe(eg_template.format(text=ar_text),
               max_new_tokens=256,
               do_sample=True,
               temperature=0.3,
               top_p=0.5,
               return_full_text=False)[0]["generated_text"].strip()

print("Original Text:", text)
print("\nArabic Translation:", ar_text)
print("\nEgyptian Arabic Translation:", eg_text)
```

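To batch-translate a whole dataset, as mentioned under intended uses, the same pipeline can be mapped over a 🤗 Datasets split. The snippet below is a minimal sketch rather than part of the original card: the example dataset (`yahma/alpaca-cleaned`), the `output` column, and the batch size are illustrative assumptions, and it reuses the `pipe` and `eg_template` objects defined above.

```python
from datasets import load_dataset

# Illustrative source dataset; swap in whatever dataset you want to translate.
ds = load_dataset("yahma/alpaca-cleaned", split="train[:100]")

def translate_to_egyptian(batch):
    # Build one prompt per row, then generate translations for the whole batch.
    prompts = [eg_template.format(text=t) for t in batch["output"]]
    outputs = pipe(prompts,
                   max_new_tokens=256,
                   do_sample=True,
                   temperature=0.3,
                   top_p=0.5,
                   return_full_text=False)
    # pipe() returns one list of generations per prompt; keep the first of each.
    batch["output_eg"] = [o[0]["generated_text"].strip() for o in outputs]
    return batch

ds_eg = ds.map(translate_to_egyptian, batched=True, batch_size=8)
```

On CPU this will be slow beyond a small sample; loading the model with `device_map="auto"` (or moving it to a GPU) is advisable for full datasets.
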
## Training and evaluation data

The model was trained and evaluated on the [ahmedsamirio/oasst2-9k-translation](https://huggingface.co/datasets/ahmedsamirio/oasst2-9k-translation) dataset.

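For a quick look at the training data, the dataset can be loaded directly. This is a small sketch for convenience; the split and column names are assumptions, so check the dataset card for the exact schema.

```python
from datasets import load_dataset

# Translation pairs used for fine-tuning; the split name is an assumption.
translation_ds = load_dataset("ahmedsamirio/oasst2-9k-translation", split="train")
print(translation_ds)     # row count and column names
print(translation_ds[0])  # one example record
```
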
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (the batch-size arithmetic is sketched after the list):
- learning_rate: 2e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: Paged AdamW (32-bit) with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 2

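As a sanity check on the numbers above, the reported total batch size is simply the product of the per-device batch size and the gradient accumulation steps (assuming a single GPU, which is what a total of 8 implies):

```python
micro_batch_size = 2              # per-device train_batch_size above
gradient_accumulation_steps = 4
num_devices = 1                   # assumption: single-GPU run

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)     # -> 8, matching total_train_batch_size above
```
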
### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.9661 | 0.0008 | 1 | 1.3816 |
| 0.5611 | 0.1002 | 123 | 0.9894 |
| 0.6739 | 0.2004 | 246 | 0.8820 |
| 0.5168 | 0.3006 | 369 | 0.8229 |
| 0.5582 | 0.4008 | 492 | 0.7931 |
| 0.552 | 0.5010 | 615 | 0.7814 |
| 0.5129 | 0.6012 | 738 | 0.7591 |
| 0.5887 | 0.7014 | 861 | 0.7444 |
| 0.6359 | 0.8016 | 984 | 0.7293 |
| 0.613 | 0.9018 | 1107 | 0.7179 |
| 0.5671 | 1.0020 | 1230 | 0.7126 |
| 0.4956 | 1.0847 | 1353 | 0.7034 |
| 0.5055 | 1.1849 | 1476 | 0.6980 |
| 0.4863 | 1.2851 | 1599 | 0.6877 |
| 0.4538 | 1.3853 | 1722 | 0.6845 |
| 0.4362 | 1.4855 | 1845 | 0.6803 |
| 0.4291 | 1.5857 | 1968 | 0.6834 |
| 0.6208 | 1.6859 | 2091 | 0.6830 |
| 0.582 | 1.7862 | 2214 | 0.6781 |
| 0.5001 | 1.8864 | 2337 | 0.6798 |

### Framework versions

- PEFT 0.11.1
- Transformers 4.42.3
- Pytorch 2.1.2+cu118
- Datasets 2.19.1
- Tokenizers 0.19.1
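
Because the run trained a LoRA adapter (`adapter: lora` in the config, PEFT in the versions above), the repository may host adapter weights rather than fully merged weights. If so, a hedged alternative to the plain `AutoModelForCausalLM` loading shown earlier is to load through PEFT; this is an assumption about how the weights are published, so skip it if the earlier loading code works as-is.

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Loads the base Llama-3 8B weights and applies the LoRA adapter on top.
model = AutoPeftModelForCausalLM.from_pretrained("ahmedsamirio/Egyptian-Arabic-Translator-Llama-3-8B")
tokenizer = AutoTokenizer.from_pretrained("ahmedsamirio/Egyptian-Arabic-Translator-Llama-3-8B")
```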