Iker committed
Commit 5f6ca9d
Parent(s): 02b34d5

Update README.md

Files changed (1): README.md (+168, -0)
README.md CHANGED
@@ -22,3 +22,171 @@ base_model: google/gemma-2b

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/614a1ebb8f82f1df64d55126/2i_CasoeJTgQPNoBIfA8E.jpeg)

# Neurona 2B Beta: A Spanish Language Model

> This is a preliminary version of the model card. The model is under development and is not the final version. If you want to know more about this model, write to iker.garciaf@ehu.eus

Neurona 2B is a Spanish language model. This is the first iteration, an experiment to tune the training scripts and infrastructure.

Neurona 2B has been trained on the following datasets:

- [teknium/OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5)
- [Iker/OpenHermes-2.5-Spanish](https://huggingface.co/datasets/Iker/OpenHermes-2.5-Spanish)
- [Iker/Document-Translation-en-es](https://huggingface.co/datasets/Iker/Document-Translation-en-es)
- [Iker/InstructTranslation-EN-ES](https://huggingface.co/datasets/Iker/InstructTranslation-EN-ES)
- [Helsinki-NLP/opus-100 (en-es; only a few examples, to reach 1 million instructions)](https://huggingface.co/datasets/Helsinki-NLP/opus-100)
- [projecte-aina/RAG_Multilingual (es only, 3701 examples)](https://huggingface.co/datasets/projecte-aina/RAG_Multilingual)
- [glaiveai/glaive-code-assistant-v3](https://huggingface.co/datasets/glaiveai/glaive-code-assistant-v3)
- [glaiveai/glaive-function-calling-v2](https://huggingface.co/datasets/glaiveai/glaive-function-calling-v2)

This mixture of English and Spanish datasets lets the model acquire a range of capabilities, such as RAG, function calling, code assistance, question answering, and summarization, in both English and Spanish.
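
For illustration, here is a minimal inference sketch with `transformers`, assuming the released checkpoint ships with the ChatML chat template used during training (see the configuration below); the repository id is a placeholder, not a confirmed name:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Iker/neurona-2b-beta"  # hypothetical repo id, not confirmed

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "system", "content": "Eres un asistente útil."},
    {"role": "user", "content": "Resume en una frase qué es la fotosíntesis."},
]

# apply_chat_template renders the conversation in the ChatML format
# (<|im_start|>role ... <|im_end|>) that the model was trained on.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```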

# Training

This model was trained on 4x Nvidia A100 80GB GPUs using axolotl.
[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)

This is the configuration used:
```yaml
base_model: google/gemma-2b
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
is_falcon_derived_model:
is_llama_derived_model:
is_qwen_derived_model:
is_mistral_derived_model:

load_in_8bit: false
load_in_4bit: false
strict: false

device_map: null

datasets:
  - path: /ikerlariak/igarcia945/Mortadelo-Filemon/final_dataset/OpenHermes-2.5-Spanish_fix_gpt.jsonl
    type: sharegpt
    conversation: chatml
    field: conversations
    roles:
      input:
        - system
        - gpt
      output:
        - human
  - path: /ikerlariak/igarcia945/Mortadelo-Filemon/final_dataset/OpenHermes-2.5-English.jsonl
    type: sharegpt
    conversation: chatml
    field: conversations
  - path: /ikerlariak/igarcia945/Mortadelo-Filemon/final_dataset/glaive-function-calling-v2.jsonl
    type: sharegpt
    conversation: chatml
    field: conversations
    roles:
      input:
        - system
        - gpt
        - tool
      output:
        - human
  - path: /ikerlariak/igarcia945/Mortadelo-Filemon/final_dataset/glaive-code-assistant-v3-small.jsonl
    type: sharegpt
    conversation: chatml
    field: conversations
    roles:
      input:
        - system
        - gpt
      output:
        - human
chat_template: chatml

dataset_prepared_path: /ikerlariak/igarcia945/Mortadelo-Filemon/gemma-2b-spanish/dataset

shuffle_merged_datasets: true

val_set_size: 0.005

output_dir: /ikerlariak/igarcia945/Mortadelo-Filemon/gemma-2b-spanish/

adapter:
lora_model_dir:

sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: false

special_tokens:
  bos_token: "<|im_start|>"
  eos_token: "<|im_end|>"
  pad_token: "<|end_of_text|>"

tokens:
  - "<|begin_of_text|>"
  - "<|end_of_text|>"
  - "<|im_start|>"
  - "<|im_end|>"
  - "<|start_header_id|>"
  - "<|end_header_id|>"
  - "<tool_call>"
  - "<tool_response>"
  - "<tools>"
  - "</tool_call>"
  - "</tool_response>"
  - "</tools>"
  - "<reserved1>"
  - "<reserved2>"
  - "<reserved3>"
  - "<reserved4>"

neftune_noise_alpha: 5

wandb_project: Mortadelo&Filemon
wandb_entity: igarciaf
wandb_watch:
wandb_name: gemma2b
wandb_log_model:

gradient_accumulation_steps: 32
micro_batch_size: 2
eval_batch_size: 2
num_epochs: 3
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 0.00007

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.03
evals_per_epoch: 4
eval_table_size:
save_strategy: "no"
debug:
deepspeed: /ikerlariak/igarcia945/Mortadelo-Filemon/train_configs/deepspeed_zero3.json
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:

seed: 33
```
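
With axolotl installed, a config like this is typically launched through the axolotl CLI, e.g. `accelerate launch -m axolotl.cli.train config.yml` (filename hypothetical); the `deepspeed` entry above points the run at a ZeRO-3 configuration for the 4-GPU setup.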
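
For reference, the `type: sharegpt` / `field: conversations` entries above consume JSONL files in the ShareGPT schema, which axolotl renders into ChatML using the special tokens declared in the config. A minimal sketch with hypothetical content:

```python
import json

# One hypothetical training example in the ShareGPT schema that the
# config's `type: sharegpt` / `field: conversations` settings expect.
example = {
    "conversations": [
        {"from": "system", "value": "Eres un asistente útil."},
        {"from": "human", "value": "¿Cuál es la capital de España?"},
        {"from": "gpt", "value": "La capital de España es Madrid."},
    ]
}
print(json.dumps(example, ensure_ascii=False))  # one line of the JSONL file

# Roughly the ChatML string this example is rendered into, using the
# <|im_start|>/<|im_end|> tokens declared in the training config.
role_map = {"system": "system", "human": "user", "gpt": "assistant"}
chatml = "".join(
    f"<|im_start|>{role_map[turn['from']]}\n{turn['value']}<|im_end|>\n"
    for turn in example["conversations"]
)
print(chatml)
```

The glaive-function-calling-v2 file additionally uses a `tool` role, which matches the `<tool_call>`/`<tool_response>` tokens added to the tokenizer.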