dcbv / PEFT

rondlite committed
Commit 9d46fa2
1 Parent(s): 4bb2fe0

update qlora
README.md CHANGED
@@ -1,10 +1,342 @@
 ---
 library_name: peft
+base_model: models\LLaMA2-13B-Tiefighter
 ---
+
+# Model Card for Model ID
+
+<!-- Provide a quick summary of what the model is/does. -->
+
+## Model Details
+
+### Model Description
+
+<!-- Provide a longer summary of what this model is. -->
+
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+
+### Model Sources [optional]
+
+<!-- Provide the basic links for the model. -->
+
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+
+## Uses
+
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+### Direct Use
+
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+[More Information Needed]
+
+### Downstream Use [optional]
+
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+[More Information Needed]
+
+### Out-of-Scope Use
+
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+[More Information Needed]
+
+## Bias, Risks, and Limitations
+
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+[More Information Needed]
+
+### Recommendations
+
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+## How to Get Started with the Model
+
+Use the code below to get started with the model.
+
+[More Information Needed]
+
+## Training Details
+
+### Training Data
+
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+[More Information Needed]
+
+### Training Procedure
+
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+#### Preprocessing [optional]
+
+[More Information Needed]
+
+#### Training Hyperparameters
+
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+#### Speeds, Sizes, Times [optional]
+
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+[More Information Needed]
+
+## Evaluation
+
+<!-- This section describes the evaluation protocols and provides the results. -->
+
+### Testing Data, Factors & Metrics
+
+#### Testing Data
+
+<!-- This should link to a Dataset Card if possible. -->
+
+[More Information Needed]
+
+#### Factors
+
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+[More Information Needed]
+
+#### Metrics
+
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+[More Information Needed]
+
+### Results
+
+[More Information Needed]
+
+#### Summary
+
+## Model Examination [optional]
+
+<!-- Relevant interpretability work for the model goes here -->
+
+[More Information Needed]
+
+## Environmental Impact
+
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+
+## Technical Specifications [optional]
+
+### Model Architecture and Objective
+
+[More Information Needed]
+
+### Compute Infrastructure
+
+[More Information Needed]
+
+#### Hardware
+
+[More Information Needed]
+
+#### Software
+
+[More Information Needed]
+
+## Citation [optional]
+
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+**BibTeX:**
+
+[More Information Needed]
+
+**APA:**
+
+[More Information Needed]
+
+## Glossary [optional]
+
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+[More Information Needed]
+
+## More Information [optional]
+
+[More Information Needed]
+
+## Model Card Authors [optional]
+
+[More Information Needed]
+
+## Model Card Contact
+
+[More Information Needed]
+
+## Training procedure
+
+The following `bitsandbytes` quantization config was used during training:
+- quant_method: QuantizationMethod.BITS_AND_BYTES
+- load_in_8bit: False
+- load_in_4bit: True
+- llm_int8_threshold: 6.0
+- llm_int8_skip_modules: None
+- llm_int8_enable_fp32_cpu_offload: False
+- llm_int8_has_fp16_weight: False
+- bnb_4bit_quant_type: nf4
+- bnb_4bit_use_double_quant: False
+- bnb_4bit_compute_dtype: float16
+
+### Framework versions
+
+- PEFT 0.6.2

[… the nf4 "Training procedure"/"Framework versions" block above is repeated verbatim ten times in the added text; one further copy differs only in its config (load_in_8bit: True, load_in_4bit: False, llm_int8_enable_fp32_cpu_offload: True, bnb_4bit_quant_type: fp4, bnb_4bit_compute_dtype: float32) …]

 ## Training procedure
 
 
 The following `bitsandbytes` quantization config was used during training:
+- quant_method: QuantizationMethod.BITS_AND_BYTES
 - load_in_8bit: True
 - load_in_4bit: False
 - llm_int8_threshold: 6.0
@@ -15,7 +347,15 @@ The following `bitsandbytes` quantization config was used during training:
 - bnb_4bit_use_double_quant: False
 - bnb_4bit_compute_dtype: float32
+
+### Framework versions
+
+- PEFT 0.6.2
+## Training procedure
+
+
 The following `bitsandbytes` quantization config was used during training:
+- quant_method: QuantizationMethod.BITS_AND_BYTES
 - load_in_8bit: True
 - load_in_4bit: False
 - llm_int8_threshold: 6.0
@@ -25,8 +365,84 @@ The following `bitsandbytes` quantization config was used during training:
 - bnb_4bit_quant_type: fp4
 - bnb_4bit_use_double_quant: False
 - bnb_4bit_compute_dtype: float32
+
+### Framework versions
+
+- PEFT 0.6.2

[… four more copies of the nf4 "Training procedure" block follow here, then the closing lines of the old file …]

 ### Framework versions
 
-- PEFT 0.5.0.dev0
 
-- PEFT 0.5.0.dev0
+- PEFT 0.6.2
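The quantization fields listed in the README map one-to-one onto keyword arguments of `transformers.BitsAndBytesConfig`. A minimal sketch of the 4-bit (QLoRA-style) config, kept as a plain dict so it runs without GPU libraries; with `transformers` installed you would pass these via `BitsAndBytesConfig(**cfg)` after replacing the dtype string with `torch.float16`:

```python
# The 4-bit settings recorded in the new README, collected as the keyword
# arguments one would pass to transformers.BitsAndBytesConfig.
# "float16" stands in for torch.float16 so this sketch has no dependencies.
qlora_4bit_cfg = {
    "load_in_8bit": False,
    "load_in_4bit": True,
    "llm_int8_threshold": 6.0,
    "llm_int8_skip_modules": None,
    "llm_int8_enable_fp32_cpu_offload": False,
    "llm_int8_has_fp16_weight": False,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": False,
    "bnb_4bit_compute_dtype": "float16",
}

# Sanity check: exactly one of the two load flags should be set.
assert qlora_4bit_cfg["load_in_4bit"] != qlora_4bit_cfg["load_in_8bit"]
```

The earlier 8-bit blocks in the same README differ only in flipping `load_in_8bit`/`load_in_4bit` and using `fp4`/`float32`.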
adapter_config.json CHANGED
@@ -1,6 +1,7 @@
 {
+  "alpha_pattern": {},
   "auto_mapping": null,
-  "base_model_name_or_path": "models/mythalion-13b",
+  "base_model_name_or_path": "models\\LLaMA2-13B-Tiefighter",
   "bias": "none",
   "fan_in_fan_out": false,
   "inference_mode": true,
@@ -12,10 +13,16 @@
   "modules_to_save": null,
   "peft_type": "LORA",
   "r": 128,
+  "rank_pattern": {},
   "revision": null,
   "target_modules": [
+    "gate_proj",
+    "o_proj",
+    "down_proj",
     "q_proj",
-    "v_proj"
+    "up_proj",
+    "v_proj",
+    "k_proj"
   ],
   "task_type": "CAUSAL_LM"
 }
adapter_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9e9252b498460da1e610f4727d892cf5f800e1480964b3df994baabcdf266679
-size 419488077
+oid sha256:a28bb35807a1e0701ca619f722aba6f153335a231ca8fb59ee8c3af441f4bf98
+size 2002982666
training_log.json DELETED
@@ -1,16 +0,0 @@
-{
-  "base_model_name": "mythalion-13b",
-  "base_model_class": "LlamaForCausalLM",
-  "base_loaded_in_4bit": false,
-  "base_loaded_in_8bit": true,
-  "projections": "q, v",
-  "loss": 0.9646,
-  "learning_rate": 0.00015,
-  "epoch": 1.88,
-  "current_steps": 46,
-  "train_runtime": 87.3735,
-  "train_samples_per_second": 2.232,
-  "train_steps_per_second": 0.034,
-  "total_flos": 2487721328640000.0,
-  "train_loss": 0.9646244049072266
-}
training_parameters.json CHANGED
@@ -1,30 +1,37 @@
 {
   "lora_name": "charluv-lora",
-  "always_override": false,
-  "save_steps": 0.0,
+  "always_override": true,
+  "save_steps": 1000.0,
   "micro_batch_size": 4,
-  "batch_size": 128,
-  "epochs": 3.0,
+  "batch_size": 0,
+  "epochs": 1.0,
   "learning_rate": "3e-4",
-  "lr_scheduler_type": "cosine_with_restarts",
+  "lr_scheduler_type": "linear",
   "lora_rank": 128,
   "lora_alpha": 256,
   "lora_dropout": 0.05,
   "cutoff_len": 256,
-  "dataset": "None",
+  "dataset": "training",
   "eval_dataset": "None",
-  "format": "None",
+  "format": "alpaca-format",
   "eval_steps": 100.0,
-  "raw_text_file": "aivo",
-  "overlap_len": 128,
-  "newline_favor_len": 128,
+  "raw_text_file": "None",
   "higher_rank_limit": false,
   "warmup_steps": 100.0,
   "optimizer": "adamw_torch",
   "hard_cut_string": "\\n\\n\\n",
   "train_only_after": "",
-  "stop_at_loss": 1.5,
+  "stop_at_loss": 0.1,
   "add_eos_token": false,
   "min_chars": 0.0,
-  "report_to": "wandb"
+  "report_to": "None",
+  "precize_slicing_overlap": true,
+  "add_eos_token_type": "Every Block",
+  "save_steps_under_loss": 1.8,
+  "add_bos_token": true,
+  "training_projection": "all",
+  "sliding_window": false,
+  "warmup_ratio": 0,
+  "grad_accumulation": 1,
+  "neft_noise_alpha": 0
 }
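The batch fields in the two versions express the same thing differently: the old file gives a total `batch_size` (128) from which accumulation steps are derived against `micro_batch_size`, while the new file sets `batch_size` to 0 and specifies `grad_accumulation` directly. A sketch of that relationship; the `effective_batch` helper is hypothetical, not part of any trainer's API:

```python
# Hypothetical helper relating the old and new batch fields in
# training_parameters.json (an assumption based on the fields shown).
def effective_batch(micro_batch_size, batch_size=0, grad_accumulation=1):
    # Old-style configs give a total batch_size; accumulation is derived from it.
    if batch_size:
        grad_accumulation = max(batch_size // micro_batch_size, 1)
    return micro_batch_size * grad_accumulation

old_effective = effective_batch(4, batch_size=128)        # 32 accumulation steps
new_effective = effective_batch(4, grad_accumulation=1)   # no accumulation

print(old_effective, new_effective)  # → 128 4
```

So the new run trades the old effective batch of 128 for per-step updates on micro-batches of 4, alongside the switch from 3 cosine-restart epochs to 1 linear epoch.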
training_prompt.json CHANGED
@@ -1,3 +1,5 @@
 {
-  "template_type": "raw_text"
+  "template_type": "dataset",
+  "template_1": "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n%instruction%\n\n### Response:\n%output%",
+  "template_2": "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n%instruction%\n\n### Input:\n%input%\n\n### Response:\n%output%"
 }
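The `%placeholder%` slots in these Alpaca-style templates are presumably filled by plain string substitution per dataset row. A minimal sketch; the `render` helper is hypothetical:

```python
# template_1 from training_prompt.json, with %instruction% / %output% slots.
template_1 = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
    "\n\n### Instruction:\n%instruction%\n\n### Response:\n%output%"
)

def render(template, **fields):
    # Hypothetical: replace each %name% placeholder with the given value.
    for name, value in fields.items():
        template = template.replace(f"%{name}%", value)
    return template

# At inference time the output slot is left empty for the model to complete.
prompt = render(template_1,
                instruction="Summarize QLoRA in one sentence.",
                output="")

print(prompt.endswith("### Response:\n"))  # → True
```

`template_2` works the same way with an extra `%input%` slot for rows that carry additional context.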