Files changed (1)
  1. README.md +86 -85
README.md CHANGED
@@ -18,90 +18,6 @@ pipeline_tag: text-generation
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

- [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
- <details><summary>See axolotl config</summary>
-
- axolotl version: `0.4.0`
- ```yaml
- base_model: MaziyarPanahi/Qwen1.5-8x7b
- model_type: Qwen2ForCausalLM
- tokenizer_type: Qwen2Tokenizer
-
- trust_remote_code: true
-
- hub_model_id: MaziyarPanahi/Qwen1.5-8x7b-v0.1
- hf_use_auth_token: true
-
- load_in_8bit: false
- load_in_4bit: true
- strict: false
-
-
- datasets:
-   - path: Crystalcareai/MoD-150k
-     type: sharegpt
-
-
- dataset_prepared_path:
- val_set_size: 0.05
- output_dir: ./Qwen1.5-8x7b-v0.1-lora-out
-
- model_config:
-   output_router_logits: true
-
- adapter: qlora
- lora_model_dir:
- sequence_len: 2048
- sample_packing: true
- pad_to_sequence_len: true
-
-
- lora_r: 32
- lora_alpha: 16
- lora_dropout: 0.05
- lora_target_linear: true
- lora_fan_in_fan_out:
-
-
- gradient_accumulation_steps: 2
- micro_batch_size: 2
- num_epochs: 1
- optimizer: adamw_bnb_8bit
- lr_scheduler: cosine
- learning_rate: 0.0002
-
-
- train_on_inputs: false
- group_by_length: false
- bf16: auto
- fp16:
- tf32: false
-
-
- gradient_checkpointing: true
- early_stopping_patience:
- resume_from_checkpoint:
- local_rank:
- logging_steps: 1
- xformers_attention:
- flash_attention: true
-
-
- warmup_steps: 10
- evals_per_epoch: 4
- eval_table_size:
- eval_max_new_tokens: 128
- saves_per_epoch: 1
- debug:
- deepspeed:
- weight_decay: 0.0
- fsdp:
- fsdp_config:
- special_tokens:
- ```
-
- </details><br>
-
  # Qwen1.5-8x7b-v0.1

  This model is a fine-tuned version of [MaziyarPanahi/Qwen1.5-8x7b](https://huggingface.co/MaziyarPanahi/Qwen1.5-8x7b) on the [Crystalcareai/MoD-150k](https://huggingface.co/datasets/Crystalcareai/MoD-150k) dataset.
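Not part of the original card, but a useful companion to the description above: a minimal inference sketch for the resulting checkpoint. It assumes the `MaziyarPanahi/Qwen1.5-8x7b-v0.1` repo ships merged weights and a chat template; if only the QLoRA adapter is published there, the base model would be loaded first and the adapter attached with `peft` instead. The 4-bit load mirrors the `load_in_4bit: true` setting used for training.

```python
# Minimal, hypothetical inference sketch; not taken from the model card itself.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "MaziyarPanahi/Qwen1.5-8x7b-v0.1"  # hub_model_id from the training config

# 4-bit quantization, roughly mirroring the QLoRA training setup.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,  # the training config sets trust_remote_code: true
)

# Assumes the tokenizer carries a Qwen-style chat template.
messages = [{"role": "user", "content": "Summarize what a mixture-of-experts model is."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```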
@@ -381,4 +297,89 @@ The following hyperparameters were used during training:
  - Transformers 4.39.0.dev0
  - Pytorch 2.2.0+cu121
  - Datasets 2.17.0
- - Tokenizers 0.15.0
+ - Tokenizers 0.15.0
+
+ [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.4.0`
+ ```yaml
+ base_model: MaziyarPanahi/Qwen1.5-8x7b
+ model_type: Qwen2ForCausalLM
+ tokenizer_type: Qwen2Tokenizer
+
+ trust_remote_code: true
+
+ hub_model_id: MaziyarPanahi/Qwen1.5-8x7b-v0.1
+ hf_use_auth_token: true
+
+ load_in_8bit: false
+ load_in_4bit: true
+ strict: false
+
+
+ datasets:
+   - path: Crystalcareai/MoD-150k
+     type: sharegpt
+
+
+ dataset_prepared_path:
+ val_set_size: 0.05
+ output_dir: ./Qwen1.5-8x7b-v0.1-lora-out
+
+ model_config:
+   output_router_logits: true
+
+ adapter: qlora
+ lora_model_dir:
+ sequence_len: 2048
+ sample_packing: true
+ pad_to_sequence_len: true
+
+
+ lora_r: 32
+ lora_alpha: 16
+ lora_dropout: 0.05
+ lora_target_linear: true
+ lora_fan_in_fan_out:
+
+
+ gradient_accumulation_steps: 2
+ micro_batch_size: 2
+ num_epochs: 1
+ optimizer: adamw_bnb_8bit
+ lr_scheduler: cosine
+ learning_rate: 0.0002
+
+
+ train_on_inputs: false
+ group_by_length: false
+ bf16: auto
+ fp16:
+ tf32: false
+
+
+ gradient_checkpointing: true
+ early_stopping_patience:
+ resume_from_checkpoint:
+ local_rank:
+ logging_steps: 1
+ xformers_attention:
+ flash_attention: true
+
+
+ warmup_steps: 10
+ evals_per_epoch: 4
+ eval_table_size:
+ eval_max_new_tokens: 128
+ saves_per_epoch: 1
+ debug:
+ deepspeed:
+ weight_decay: 0.0
+ fsdp:
+ fsdp_config:
+ special_tokens:
+ ```
+
+ </details><br>
+ -
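
Also an addition rather than part of the card: for readers more familiar with `peft` than with axolotl, the adapter settings in the config above map roughly onto a `LoraConfig` as sketched below. The `target_modules="all-linear"` value is an assumed stand-in for axolotl's `lora_target_linear: true` and needs a recent `peft` release; axolotl 0.4.0 builds the adapter from the YAML itself, so this is illustrative only.

```python
# Hypothetical peft-side equivalent of the QLoRA adapter settings above.
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,                         # lora_r: 32
    lora_alpha=16,                # lora_alpha: 16
    lora_dropout=0.05,            # lora_dropout: 0.05
    target_modules="all-linear",  # assumed mapping of lora_target_linear: true
    task_type="CAUSAL_LM",
)
print(lora_config)

# Effective batch size implied by the config: micro_batch_size (2) x
# gradient_accumulation_steps (2) = 4 packed sequences of up to 2048 tokens
# per device per optimizer step.
```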