---
license: llama3.1
base_model: meta-llama/Meta-Llama-3.1-8B
tags:
- generated_from_trainer
datasets:
- cognitivecomputations/Dolphin-2.9
- m-a-p/CodeFeedback-Filtered-Instruction
- cognitivecomputations/dolphin-coder
- cognitivecomputations/samantha-data
- microsoft/orca-math-word-problems-200k
- mlabonne/FineTome-100k
- arcee/agent_data
- PawanKrd/math-gpt-4o-200k
- cognitivecomputations/SystemChat-2.0
---

## Description
This repo contains GGUF format model files for dolphin-2.9.4-llama3.1-8b.

## Files Provided
| Name                                | Quant | Bits | File Size | Remark                          |
| ----------------------------------- | ----- | ---- | --------- | ------------------------------- |
| dolphin-2.9.4-llama3.1-8b.Q2_K.gguf | Q2_K  | 2    | 3.18 GB   | 2.96G, +3.5199 ppl @ Llama-3-8B |
| dolphin-2.9.4-llama3.1-8b.Q3_K.gguf | Q3_K  | 3    | 4.02 GB   | 3.74G, +0.6569 ppl @ Llama-3-8B |
| dolphin-2.9.4-llama3.1-8b.Q4_0.gguf | Q4_0  | 4    | 4.66 GB   | 4.34G, +0.4685 ppl @ Llama-3-8B |
| dolphin-2.9.4-llama3.1-8b.Q4_K.gguf | Q4_K  | 4    | 4.92 GB   | 4.58G, +0.1754 ppl @ Llama-3-8B |
| dolphin-2.9.4-llama3.1-8b.Q5_K.gguf | Q5_K  | 5    | 5.73 GB   | 5.33G, +0.0569 ppl @ Llama-3-8B |
| dolphin-2.9.4-llama3.1-8b.Q6_K.gguf | Q6_K  | 6    | 6.60 GB   | 6.14G, +0.0217 ppl @ Llama-3-8B |
| dolphin-2.9.4-llama3.1-8b.Q8_0.gguf | Q8_0  | 8    | 8.54 GB   | 7.96G, +0.0026 ppl @ Llama-3-8B |

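A minimal download sketch, assuming `huggingface_hub` is installed; the `repo_id` placeholder and the choice of the Q4_K file are illustrative, not part of the original card:

```python
from huggingface_hub import hf_hub_download

# Placeholder: substitute this repository's actual id on the Hugging Face Hub.
repo_id = "<this-repo-id>"

# Any file from the table above works the same way; Q4_K is used here as an example.
model_path = hf_hub_download(
    repo_id=repo_id,
    filename="dolphin-2.9.4-llama3.1-8b.Q4_K.gguf",
)
print(model_path)  # local path of the downloaded GGUF file
```
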
## Parameters
| path                                            | type  | architecture     | rope_theta | sliding_win | max_pos_embed |
| ----------------------------------------------- | ----- | ---------------- | ---------- | ----------- | ------------- |
| cognitivecomputations/dolphin-2.9.4-llama3.1-8b | llama | LlamaForCausalLM | 500000.0   | null        | 131072        |

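These GGUF files can be run with any GGUF-compatible runtime. Below is a minimal sketch using `llama-cpp-python` (an assumed choice of runtime; the model path and generation settings are illustrative):

```python
from llama_cpp import Llama

# Load the downloaded GGUF file; n_ctx is set to the 8192-token fine-tuning length.
llm = Llama(
    model_path="dolphin-2.9.4-llama3.1-8b.Q4_K.gguf",
    n_ctx=8192,
    chat_format="chatml",  # Dolphin 2.9.4 uses the ChatML prompt format (see below)
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are Dolphin, a helpful AI assistant."},
        {"role": "user", "content": "Write a haiku about dolphins."},
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```
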

# Original Model Card

# Dolphin 2.9.4 Llama 3.1 8b 🐬

Curated and trained by Eric Hartford and Cognitive Computations

[![Discord](https://img.shields.io/discord/1156064224225808488?logo=Discord&logoColor=%23ffffff&label=Discord&link=https%3A%2F%2Fdiscord.gg%2FtCMkMDDHwm)](https://discord.gg/h3K4XGj2RH)
Discord: https://discord.gg/h3K4XGj2RH

<img src="https://cdn-uploads.huggingface.co/production/uploads/63111b2d88942700629f5771/ldkN1J0WIDQwU4vutGYiD.png" width="600" />

Our appreciation for the sponsors of Dolphin 2.9.4:
- [Crusoe Cloud](https://crusoe.ai/) - provided an excellent on-demand 8xL40S node

This model is based on Meta Llama 3.1 8b and is governed by the Llama 3.1 license.

The base model has 128K context; our fine-tuning used an 8192-token sequence length.

Dolphin 2.9.4 uses the ChatML prompt template format.

Example:

```
<|im_start|>system
You are Dolphin, a helpful AI assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant

```

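For runtimes that expect a raw prompt string rather than chat messages, here is a small sketch of assembling the template above by hand (the helper name is illustrative):

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Assemble the ChatML prompt shown above, ending with an open assistant turn."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


prompt = build_chatml_prompt(
    "You are Dolphin, a helpful AI assistant.",
    "Summarize the GGUF format in one sentence.",
)
print(prompt)
```
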

Dolphin 2.9.4 has a variety of instruction-following, conversational, and coding skills. It also has agentic abilities and supports function calling.
It is especially trained to obey the system prompt and to follow instructions in many languages.

Dolphin is uncensored. We have filtered the dataset to remove alignment and bias, which makes the model more compliant. You are advised to implement your own alignment layer before exposing the model as a service; it will be highly compliant with any requests, even unethical ones. Please read my blog post about uncensored models: https://erichartford.com/uncensored-models. You are responsible for any content you create using this model. Enjoy responsibly.

<details><summary>Evals</summary>

```
hf (pretrained=/workspace/axolotl/dolphin-2.9.4-llama3.1-8b-hf,dtype=bfloat16), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (4)
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|-----------------------------------------------------------|-------|------|-----:|-----------------------|---|-----:|---|------|
|leaderboard |N/A |none | 0|acc |↑ |0.2926|± |0.0041|
| | |none | 0|acc_norm |↑ |0.4513|± |0.0053|
| | |none | 0|exact_match |↑ |0.0982|± |0.0079|
| | |none | 0|inst_level_loose_acc |↑ |0.3825|± |N/A |
| | |none | 0|inst_level_strict_acc |↑ |0.3597|± |N/A |
| | |none | 0|prompt_level_loose_acc |↑ |0.2421|± |0.0184|
| | |none | 0|prompt_level_strict_acc|↑ |0.2181|± |0.0178|
| - leaderboard_bbh |N/A |none | 3|acc_norm |↑ |0.4931|± |0.0061|
| - leaderboard_bbh_boolean_expressions | 0|none | 3|acc_norm |↑ |0.8000|± |0.0253|
| - leaderboard_bbh_causal_judgement | 0|none | 3|acc_norm |↑ |0.5615|± |0.0364|
| - leaderboard_bbh_date_understanding | 0|none | 3|acc_norm |↑ |0.4520|± |0.0315|
| - leaderboard_bbh_disambiguation_qa | 0|none | 3|acc_norm |↑ |0.6640|± |0.0299|
| - leaderboard_bbh_formal_fallacies | 0|none | 3|acc_norm |↑ |0.5600|± |0.0315|
| - leaderboard_bbh_geometric_shapes | 0|none | 3|acc_norm |↑ |0.3640|± |0.0305|
| - leaderboard_bbh_hyperbaton | 0|none | 3|acc_norm |↑ |0.6320|± |0.0306|
| - leaderboard_bbh_logical_deduction_five_objects | 0|none | 3|acc_norm |↑ |0.4600|± |0.0316|
| - leaderboard_bbh_logical_deduction_seven_objects | 0|none | 3|acc_norm |↑ |0.4360|± |0.0314|
| - leaderboard_bbh_logical_deduction_three_objects | 0|none | 3|acc_norm |↑ |0.6160|± |0.0308|
| - leaderboard_bbh_movie_recommendation | 0|none | 3|acc_norm |↑ |0.7880|± |0.0259|
| - leaderboard_bbh_navigate | 0|none | 3|acc_norm |↑ |0.5200|± |0.0317|
| - leaderboard_bbh_object_counting | 0|none | 3|acc_norm |↑ |0.4520|± |0.0315|
| - leaderboard_bbh_penguins_in_a_table | 0|none | 3|acc_norm |↑ |0.5205|± |0.0415|
| - leaderboard_bbh_reasoning_about_colored_objects | 0|none | 3|acc_norm |↑ |0.5120|± |0.0317|
| - leaderboard_bbh_ruin_names | 0|none | 3|acc_norm |↑ |0.6320|± |0.0306|
| - leaderboard_bbh_salient_translation_error_detection | 0|none | 3|acc_norm |↑ |0.4320|± |0.0314|
| - leaderboard_bbh_snarks | 0|none | 3|acc_norm |↑ |0.5843|± |0.0370|
| - leaderboard_bbh_sports_understanding | 0|none | 3|acc_norm |↑ |0.7040|± |0.0289|
| - leaderboard_bbh_temporal_sequences | 0|none | 3|acc_norm |↑ |0.1440|± |0.0222|
| - leaderboard_bbh_tracking_shuffled_objects_five_objects | 0|none | 3|acc_norm |↑ |0.1560|± |0.0230|
| - leaderboard_bbh_tracking_shuffled_objects_seven_objects| 0|none | 3|acc_norm |↑ |0.1320|± |0.0215|
| - leaderboard_bbh_tracking_shuffled_objects_three_objects| 0|none | 3|acc_norm |↑ |0.2840|± |0.0286|
| - leaderboard_bbh_web_of_lies | 0|none | 3|acc_norm |↑ |0.4840|± |0.0317|
| - leaderboard_gpqa |N/A |none | 0|acc_norm |↑ |0.2903|± |0.0132|
| - leaderboard_gpqa_diamond | 1|none | 0|acc_norm |↑ |0.2980|± |0.0326|
| - leaderboard_gpqa_extended | 1|none | 0|acc_norm |↑ |0.2839|± |0.0193|
| - leaderboard_gpqa_main | 1|none | 0|acc_norm |↑ |0.2946|± |0.0216|
| - leaderboard_ifeval | 2|none | 0|inst_level_loose_acc |↑ |0.3825|± |N/A |
| | |none | 0|inst_level_strict_acc |↑ |0.3597|± |N/A |
| | |none | 0|prompt_level_loose_acc |↑ |0.2421|± |0.0184|
| | |none | 0|prompt_level_strict_acc|↑ |0.2181|± |0.0178|
| - leaderboard_math_algebra_hard | 1|none | 4|exact_match |↑ |0.1596|± |0.0209|
| - leaderboard_math_counting_and_prob_hard | 1|none | 4|exact_match |↑ |0.0488|± |0.0195|
| - leaderboard_math_geometry_hard | 1|none | 4|exact_match |↑ |0.0530|± |0.0196|
| - leaderboard_math_hard |N/A |none | 4|exact_match |↑ |0.0982|± |0.0079|
| - leaderboard_math_intermediate_algebra_hard | 1|none | 4|exact_match |↑ |0.0143|± |0.0071|
| - leaderboard_math_num_theory_hard | 1|none | 4|exact_match |↑ |0.0455|± |0.0168|
| - leaderboard_math_prealgebra_hard | 1|none | 4|exact_match |↑ |0.2591|± |0.0316|
| - leaderboard_math_precalculus_hard | 1|none | 4|exact_match |↑ |0.0519|± |0.0192|
| - leaderboard_mmlu_pro | 0.1|none | 5|acc |↑ |0.2926|± |0.0041|
| - leaderboard_musr |N/A |none | 0|acc_norm |↑ |0.3862|± |0.0173|
| - leaderboard_musr_murder_mysteries | 1|none | 0|acc_norm |↑ |0.5280|± |0.0316|
| - leaderboard_musr_object_placements | 1|none | 0|acc_norm |↑ |0.3594|± |0.0300|
| - leaderboard_musr_team_allocation | 1|none | 0|acc_norm |↑ |0.2720|± |0.0282|

| Groups |Version|Filter|n-shot| Metric | |Value | |Stderr|
|------------------------|-------|------|-----:|-----------------------|---|-----:|---|------|
|leaderboard |N/A |none | 0|acc |↑ |0.2926|± |0.0041|
| | |none | 0|acc_norm |↑ |0.4513|± |0.0053|
| | |none | 0|exact_match |↑ |0.0982|± |0.0079|
| | |none | 0|inst_level_loose_acc |↑ |0.3825|± |N/A |
| | |none | 0|inst_level_strict_acc |↑ |0.3597|± |N/A |
| | |none | 0|prompt_level_loose_acc |↑ |0.2421|± |0.0184|
| | |none | 0|prompt_level_strict_acc|↑ |0.2181|± |0.0178|
| - leaderboard_bbh |N/A |none | 3|acc_norm |↑ |0.4931|± |0.0061|
| - leaderboard_gpqa |N/A |none | 0|acc_norm |↑ |0.2903|± |0.0132|
| - leaderboard_math_hard|N/A |none | 4|exact_match |↑ |0.0982|± |0.0079|
| - leaderboard_musr |N/A |none | 0|acc_norm |↑ |0.3862|± |0.0173|
```

</details>

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
base_model: meta-llama/Meta-Llama-3.1-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
# load_in_4bit: true
strict: false

datasets:
  - path: /workspace/datasets/dolphin-2.9.4/dolphin201-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml

chat_template: chatml
# adapter: qlora
# lora_r: 128
# lora_alpha: 16
# lora_modules_to_save: [embed_tokens, lm_head]
# lora_dropout: 0.05
# lora_target_linear: true

unfrozen_parameters:
- input_layernorm
- model.norm
- post_attention_layernorm
- self_attn.rotary_emb
- ^lm_head.weight$
- ^model.embed_tokens.weight$
# mlp.down_proj layers
- model.layers.1.mlp.down_proj
- model.layers.0.mlp.down_proj
- model.layers.30.mlp.down_proj
- model.layers.2.mlp.down_proj
- model.layers.21.mlp.down_proj
- model.layers.22.mlp.down_proj
- model.layers.29.mlp.down_proj
- model.layers.5.mlp.down_proj
- model.layers.4.mlp.down_proj
- model.layers.20.mlp.down_proj
- model.layers.23.mlp.down_proj
- model.layers.19.mlp.down_proj
- model.layers.3.mlp.down_proj
- model.layers.17.mlp.down_proj
- model.layers.6.mlp.down_proj
- model.layers.31.mlp.down_proj
# mlp.up_proj layers
- model.layers.4.mlp.up_proj
- model.layers.3.mlp.up_proj
- model.layers.0.mlp.up_proj
- model.layers.5.mlp.up_proj
- model.layers.7.mlp.up_proj
- model.layers.6.mlp.up_proj
- model.layers.2.mlp.up_proj
- model.layers.1.mlp.up_proj
- model.layers.8.mlp.up_proj
- model.layers.12.mlp.up_proj
- model.layers.14.mlp.up_proj
- model.layers.9.mlp.up_proj
- model.layers.15.mlp.up_proj
- model.layers.17.mlp.up_proj
- model.layers.13.mlp.up_proj
- model.layers.19.mlp.up_proj
# self_attn.k_proj layers
- model.layers.29.self_attn.k_proj
- model.layers.25.self_attn.k_proj
- model.layers.23.self_attn.k_proj
- model.layers.28.self_attn.k_proj
- model.layers.21.self_attn.k_proj
- model.layers.19.self_attn.k_proj
- model.layers.22.self_attn.k_proj
- model.layers.20.self_attn.k_proj
- model.layers.24.self_attn.k_proj
- model.layers.31.self_attn.k_proj
- model.layers.27.self_attn.k_proj
- model.layers.26.self_attn.k_proj
- model.layers.17.self_attn.k_proj
- model.layers.11.self_attn.k_proj
- model.layers.18.self_attn.k_proj
- model.layers.14.self_attn.k_proj
# self_attn.o_proj layers
- model.layers.14.self_attn.o_proj
- model.layers.7.self_attn.o_proj
- model.layers.5.self_attn.o_proj
- model.layers.11.self_attn.o_proj
- model.layers.6.self_attn.o_proj
- model.layers.24.self_attn.o_proj
- model.layers.9.self_attn.o_proj
- model.layers.13.self_attn.o_proj
- model.layers.10.self_attn.o_proj
- model.layers.12.self_attn.o_proj
- model.layers.8.self_attn.o_proj
- model.layers.25.self_attn.o_proj
- model.layers.21.self_attn.o_proj
- model.layers.23.self_attn.o_proj
- model.layers.15.self_attn.o_proj
- model.layers.16.self_attn.o_proj
# self_attn.q_proj layers
- model.layers.8.self_attn.q_proj
- model.layers.13.self_attn.q_proj
- model.layers.9.self_attn.q_proj
- model.layers.14.self_attn.q_proj
- model.layers.10.self_attn.q_proj
- model.layers.11.self_attn.q_proj
- model.layers.0.self_attn.q_proj
- model.layers.15.self_attn.q_proj
- model.layers.1.self_attn.q_proj
- model.layers.6.self_attn.q_proj
- model.layers.5.self_attn.q_proj
- model.layers.7.self_attn.q_proj
- model.layers.12.self_attn.q_proj
- model.layers.16.self_attn.q_proj
- model.layers.17.self_attn.q_proj
- model.layers.26.self_attn.q_proj
# self_attn.v_proj layers
- model.layers.26.self_attn.v_proj
- model.layers.17.self_attn.v_proj
- model.layers.3.self_attn.v_proj
- model.layers.28.self_attn.v_proj
- model.layers.29.self_attn.v_proj
- model.layers.21.self_attn.v_proj
- model.layers.15.self_attn.v_proj
- model.layers.16.self_attn.v_proj
- model.layers.20.self_attn.v_proj
- model.layers.25.self_attn.v_proj
- model.layers.6.self_attn.v_proj
- model.layers.23.self_attn.v_proj
- model.layers.4.self_attn.v_proj
- model.layers.1.self_attn.v_proj
- model.layers.22.self_attn.v_proj
- model.layers.14.self_attn.v_proj
# mlp.gate_proj layers
- model.layers.1.mlp.gate_proj
- model.layers.2.mlp.gate_proj
- model.layers.3.mlp.gate_proj
- model.layers.4.mlp.gate_proj
- model.layers.0.mlp.gate_proj
- model.layers.25.mlp.gate_proj
- model.layers.26.mlp.gate_proj
- model.layers.5.mlp.gate_proj
- model.layers.24.mlp.gate_proj
- model.layers.28.mlp.gate_proj
- model.layers.23.mlp.gate_proj
- model.layers.27.mlp.gate_proj
- model.layers.21.mlp.gate_proj
- model.layers.22.mlp.gate_proj
- model.layers.29.mlp.gate_proj
- model.layers.20.mlp.gate_proj

dataset_prepared_path: /workspace/axolotl/dolph-2.9.4-nemo-prepared
val_set_size: 0.01
output_dir: /workspace/axolotl/dolphin-2.9.4-llama3.1-8b

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

wandb_project: dolphin-2.9.4-llama3.1-8b
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 16
micro_batch_size: 2
num_epochs: 3
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 5e-6
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32:

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 100
# evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 1
save_total_limit: 2
save_steps:
debug:
deepspeed: deepspeed_configs/zero3_bf16.json
weight_decay: 0.1
special_tokens:
  eos_token: "<|im_end|>"
  bos_token: "<|begin_of_text|>"
  pad_token: "<|finetune_right_pad_id|>"
tokens:
  - "<|im_start|>"

# fsdp:
# - full_shard
# - auto_wrap
# fsdp_config:
# fsdp_limit_all_gathers: true
# fsdp_sync_module_states: true
# fsdp_offload_params: true
# fsdp_use_orig_params: false
# fsdp_cpu_ram_efficient_loading: true
# fsdp_transformer_layer_cls_to_wrap: MixtralSparseMoeBlock
# fsdp_state_dict_type: FULL_STATE_DICT
# fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
# fsdp_sharding_strategy: FULL_SHARD
# fsdp_forward_prefetch: false
# fsdp_backward_prefetch: BACKWARD_PRE
```

</details><br>

# workspace/axolotl/dolphin-2.9.4-llama3.1-8b

This model is a fine-tuned version of [meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) on the datasets listed above.
It achieves the following results on the evaluation set:
- Loss: 0.5655

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 256
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 3

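A quick check of how the totals above follow from the per-device settings (illustrative arithmetic only):

```python
# The reported totals are products of the per-device settings listed above.
micro_batch_size = 2                 # train_batch_size per device
gradient_accumulation_steps = 16
num_devices = 8

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
assert total_train_batch_size == 256     # matches total_train_batch_size above

total_eval_batch_size = 2 * num_devices  # eval_batch_size * num_devices (no accumulation)
assert total_eval_batch_size == 16       # matches total_eval_batch_size above
```
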

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.5837        | 1.0180 | 1161 | 0.5814          |
| 0.5525        | 2.0179 | 2322 | 0.5671          |
| 0.5514        | 2.9624 | 3420 | 0.5655          |

### Framework versions

- Transformers 4.44.0.dev0
- Pytorch 2.4.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1