NeuralNovel committed on
Commit
6cb6003
1 Parent(s): ba2ba11

Delete .ipynb_checkpoints/README-checkpoint.md

.ipynb_checkpoints/README-checkpoint.md DELETED
@@ -1,169 +0,0 @@
---
license: apache-2.0
library_name: peft
tags:
- generated_from_trainer
base_model: alnrg2arg/blockchainlabs_7B_merged_test2_4
model-index:
- name: qlora-out
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.3.0`
```yaml
base_model: alnrg2arg/blockchainlabs_7B_merged_test2_4
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
is_mistral_derived_model: true

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: NeuralNovel/Neural-Story-v1
    type: completion
dataset_prepared_path: last_run_prepared
val_set_size: 0.1
output_dir: ./qlora-out

adapter: qlora
lora_model_dir:

sequence_len: 8192
sample_packing: false
pad_to_sequence_len: true

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

loss_watchdog_threshold: 5.0
loss_watchdog_patience: 3

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_table_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"

```

</details><br>

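For reference, the LoRA section of the config above corresponds roughly to the following `peft` `LoraConfig`. This is an illustrative sketch rather than the training code itself: axolotl constructs its own configuration internally, and the `task_type` value here is an assumption for a causal language model.

```python
# Illustrative sketch of the LoRA settings above as a peft LoraConfig.
# Not taken from the training code; task_type is assumed for a causal LM.
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,             # lora_r
    lora_alpha=16,    # lora_alpha
    lora_dropout=0.05,
    target_modules=[
        "gate_proj", "down_proj", "up_proj",
        "q_proj", "v_proj", "k_proj", "o_proj",
    ],
    task_type="CAUSAL_LM",
)
```
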
# qlora-out

This model is a fine-tuned version of [alnrg2arg/blockchainlabs_7B_merged_test2_4](https://huggingface.co/alnrg2arg/blockchainlabs_7B_merged_test2_4) on the NeuralNovel/Neural-Story-v1 dataset.
It achieves the following results on the evaluation set:
- Loss: 2.1411

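This repository holds a PEFT adapter rather than merged weights, so one way to try it is to load the adapter on top of its base model. The snippet below is a minimal sketch: the adapter repository id is a placeholder, and the prompt and generation settings are purely illustrative.

```python
# Minimal inference sketch. "NeuralNovel/qlora-out" is a placeholder for the
# adapter repository id; point it at wherever this adapter is actually hosted.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "NeuralNovel/qlora-out"  # placeholder
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("alnrg2arg/blockchainlabs_7B_merged_test2_4")

prompt = "The lighthouse keeper had never seen a storm like this one."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
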
## Model description

This repository contains a QLoRA (PEFT) adapter trained with axolotl on top of [alnrg2arg/blockchainlabs_7B_merged_test2_4](https://huggingface.co/alnrg2arg/blockchainlabs_7B_merged_test2_4), a Mistral-derived 7B model, using the NeuralNovel/Neural-Story-v1 dataset in completion format.

## Intended uses & limitations

More information needed

## Training and evaluation data

The adapter was trained on NeuralNovel/Neural-Story-v1, with 10% of the data held out as the evaluation set (`val_set_size: 0.1` in the config above).

## Training procedure

The following `bitsandbytes` quantization config was used during training:
- quant_method: bitsandbytes
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: bfloat16

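For reference, the same 4-bit settings can be expressed with the `BitsAndBytesConfig` class from `transformers` when loading the base model. This is an illustrative sketch, not the exact code the trainer ran:

```python
# Sketch of the quantization settings above, applied when loading the base model.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "alnrg2arg/blockchainlabs_7B_merged_test2_4",
    quantization_config=bnb_config,
    device_map="auto",
)
```
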
### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 1

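The effective batch size is train_batch_size × gradient_accumulation_steps = 2 × 4 = 8, which matches total_train_batch_size above. The sketch below restates these settings as Hugging Face `TrainingArguments`; it is an approximation of the axolotl run, not the exact object axolotl builds:

```python
# Approximate TrainingArguments mirroring the hyperparameters above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./qlora-out",
    learning_rate=2e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,  # 2 * 4 = 8 effective batch size
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    optim="adamw_bnb_8bit",
    weight_decay=0.0,
    bf16=True,
    gradient_checkpointing=True,
    logging_steps=1,
    seed=42,
)
```
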
### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 2.3251        | 0.06  | 1    | 2.8409          |
| 2.5318        | 0.25  | 4    | 2.7634          |
| 1.7316        | 0.51  | 8    | 2.3662          |
| 1.5196        | 0.76  | 12   | 2.1411          |


### Framework versions

- PEFT 0.7.0
- Transformers 4.37.0.dev0
- Pytorch 2.0.1+cu117
- Datasets 2.16.1
- Tokenizers 0.15.0