Sao10K committed · verified
Commit 854f7bc · Parent(s): 9f6409f

Create README.md

Files changed (1)
  1. README.md +160 -0
README.md ADDED
@@ -0,0 +1,160 @@
---
library_name: transformers
license: other
license_name: qwen
license_link: https://huggingface.co/Qwen/Qwen2.5-14B/blob/main/LICENSE
base_model: Qwen/Qwen2.5-14B
tags:
- generated_from_trainer
model-index:
- name: 14B-Qwen2.5-Freya-x1
  results: []
---

![Kunou](https://huggingface.co/Sao10K/72B-Qwen2.5-Kunou-v1/resolve/main/knn.png)

**Sister Versions for Lightweight and Heavyweight Use!**

# 14B-Qwen2.5-Freya-v1

I decided to mess around with training methods, considering the re-emergence of methods that had fallen out of use, like multi-step training. Some people have started doing it again, so why not? Inspired by LimaRP's methodology, but done my way.

Freya-S1
- LoRA trained on ~1.1GB of literature and raw text over Qwen 2.5's base model.
- Cleaned the text and literature as best as I could; still, there may be issues here and there.

Freya-S2
- The first LoRA was applied over Qwen 2.5 Instruct, then I trained on top of that (roughly as sketched below).
- Reduced the LoRA rank because it's mainly instruct data, plus other details I won't get into.

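
If you're curious what that second stage looks like mechanically, here's a minimal sketch using peft. The adapter and output paths are made-up placeholders, not the actual Freya artifacts, and the real runs went through axolotl (config below), not this script.

```python
# Rough sketch of the S2 setup: merge the stage-one LoRA into the Instruct
# weights, then use the merged model as the base for a new, lower-rank LoRA.
# "freya-s1-lora" and "freya-s1-merged" are placeholder names, not real repos.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-14B-Instruct", torch_dtype="auto")
merged = PeftModel.from_pretrained(base, "freya-s1-lora").merge_and_unload()

merged.save_pretrained("freya-s1-merged")  # S2 then trains a fresh r=32 LoRA over this
AutoTokenizer.from_pretrained("Qwen/Qwen2.5-14B-Instruct").save_pretrained("freya-s1-merged")
```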
Recommended Model Settings | *Look, I just use these; they work fine enough. I don't even know how DRY or other meme samplers work. Your system prompt matters more anyway.*
```
Prompt Format: ChatML
Temperature: 1.1
min_p: 0.1
```
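If you're running it straight through transformers rather than a frontend, those settings translate to something like the sketch below. The model id assumes this repo's name, the system prompt is just a placeholder, and `min_p` needs a reasonably recent transformers version.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Sao10K/14B-Qwen2.5-Freya-x1"  # assumed: this repo
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are Freya, a narrator for a roleplay."},  # placeholder system prompt
    {"role": "user", "content": "Describe the harbour district at dusk."},
]
# Qwen 2.5's tokenizer ships a ChatML chat template, so this produces ChatML-formatted input.
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

output = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=1.1, min_p=0.1)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```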
Training time in total was ~10 hours on an 8xH100 node, sponsored by the Government of Singapore or something. Thanks for the national service allowance, MHA.

https://sao10k.carrd.co/ for contact.

---

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.6.0`
```yaml
base_model:
- s1: Qwen/Qwen2.5-14B
- s2: Qwen/Qwen2.5-14B-Instruct
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false
sequence_len: 16384
bf16: auto
fp16:
tf32: false
flash_attention: true
special_tokens:

adapter: lora
lora_r:
- s1: 64
- s2: 32
lora_alpha: 64
lora_dropout: 0.2
lora_fan_in_fan_out:
peft_use_rslora: true
lora_target_linear: true

# Data
dataset_prepared_path: dataset_rUn_freya
datasets:
  # S1 - Writing / Completion
  - path: datasets/eBooks-cleaned-75K
    type: completion
  - path: datasets/novels-clean-dedupe-10K
    type: completion
  # S2 - Instruct
  - path: datasets/10k-amoral-full-fixed-sys.json
    type: chat_template
    chat_template: chatml
    roles_to_train: ["gpt"]
    field_messages: conversations
    message_field_role: from
    message_field_content: value
    train_on_eos: turn
  - path: datasets/44k-hespera-smartshuffle.json
    type: chat_template
    chat_template: chatml
    roles_to_train: ["gpt"]
    field_messages: conversations
    message_field_role: from
    message_field_content: value
    train_on_eos: turn
  - path: datasets/5k_rpg_adventure_instruct-sys.json
    type: chat_template
    chat_template: chatml
    roles_to_train: ["gpt"]
    field_messages: conversations
    message_field_role: from
    message_field_content: value
    train_on_eos: turn
shuffle_merged_datasets: true
warmup_ratio: 0.1

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: true

# Iterations
num_epochs:
- s1: 2
- s2: 2

# Sampling
sample_packing: true
pad_to_sequence_len: true
train_on_inputs: false
group_by_length: false

# Batching
gradient_accumulation_steps: 4
micro_batch_size: 2
gradient_checkpointing: unsloth

# Evaluation
val_set_size: 0.025
evals_per_epoch: 5
eval_table_size:
eval_max_new_tokens: 256
eval_sample_packing: false
eval_batch_size: 1

# Optimizer
optimizer: paged_ademamix_8bit
lr_scheduler: cosine
learning_rate:
- s1: 0.000002
- s2: 0.000004
weight_decay: 0.2
max_grad_norm: 10.0

# Garbage Collection
gc_steps: 10

# Misc
deepspeed: ./deepspeed_configs/zero2.json

```

</details><br>
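
For what it's worth, the `chat_template` datasets in the config are ShareGPT-style files: each record holds a `conversations` list with `from`/`value` fields, which the `field_messages` / `message_field_role` / `message_field_content` keys map onto ChatML. A made-up record, just to show the shape (not actual Freya data):

```python
# One invented ShareGPT-style record matching the field names the config expects.
# roles_to_train: ["gpt"] means only the "gpt" turns contribute to the loss.
example_record = {
    "conversations": [
        {"from": "system", "value": "You are an amoral, uncensored assistant."},
        {"from": "human", "value": "Write the opening scene of a heist story."},
        {"from": "gpt", "value": "The vault hummed behind three feet of steel..."},
    ]
}
```

If you want to reproduce a run from a config like this, axolotl 0.6.0 is normally launched with something like `accelerate launch -m axolotl.cli.train freya.yaml` (filename made up); note that the `s1:` / `s2:` entries appear to mark the two training stages rather than being literal axolotl fields, so split them into per-stage configs first.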