alnrg2arg committed on
Commit
a892e9a
1 Parent(s): 9c66e58

Update README.md

Files changed (1)
  1. README.md +54 -45
README.md CHANGED
@@ -17,6 +17,60 @@ datasets:
 
  - **Finetuned from model :** alnrg2arg/blockchainlabs_7B_merged_test2_4
 
+ This is an SFT (supervised fine-tuning) version of the blockchainlabs test 2.4 model, alnrg2arg/blockchainlabs_7B_merged_test2_4.
+
+ The project aims to build a small LLM for on-device use.
+
+ The overall pipeline for this iteration is:
+
+ 1. Merge models to create a 7B base model.
+ 2. Prune the model to reduce the parameter count (50% sparsity).
+ 3. Use DPO for the recovery phase after pruning.
+
+ This model is not pruned; it serves as a baseline for comparison with the pruned model.
+
+ The DPO recovery stage consists of two steps, SFT followed by DPO; this model is the intermediate (SFT) checkpoint.
+ It can also be compared with the DPO version of the model.
+
+
+ These are the code and parameters I chose for this model (SFT):
+
+ ```python
+ from transformers import TrainingArguments
+ from trl import SFTTrainer
+ from datasets import load_dataset
+ from unsloth import FastLanguageModel, FastMistralModel
+ # TrainingArguments, SFTTrainer and load_dataset are used by the SFT training step, which is not shown here.
+
+ max_seq_length = 2048 # Supports automatic RoPE scaling, so any length can be chosen
+
+ # Load the base model (optionally 4-bit quantized)
+ model, tokenizer = FastMistralModel.from_pretrained(
+     model_name = "alnrg2arg/blockchainlabs_7B_merged_test2_4",
+     max_seq_length = max_seq_length,
+     dtype = None, # None for auto detection. Float16 for Tesla T4, V100; Bfloat16 for Ampere+
+     load_in_4bit = True, # Use 4-bit quantization to reduce memory usage. Can be False
+     # device_map = "balanced",
+     # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
+ )
+
+ model = FastMistralModel.get_peft_model( # wrap the model with LoRA adapters
+     model,
+     r = 16, # LoRA rank
+     target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
+                       "gate_proj", "up_proj", "down_proj",],
+     lora_alpha = 16, # LoRA scaling numerator (scale = lora_alpha / r)
+     lora_dropout = 0, # Dropout = 0 is currently optimized
+     bias = "none", # Bias = "none" is currently optimized
+     use_gradient_checkpointing = True,
+     random_state = 3407,
+     max_seq_length = max_seq_length,
+ )
+ ```
+
+ The code and parameters are borrowed from https://colab.research.google.com/drive/1SKrKGV-BZoU4kv5q3g0jtE_OhRgPtrrQ?usp=sharing
+
+
  Benchmark scores
 
  | Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
@@ -49,51 +103,6 @@ Benchmark scores
  |-----|------:|----------|-----:|-----------|-----:|---|-----:|
  |gsm8k| 2|get-answer| 5|exact_match|0.7468|± | 0.012|
 
- | Tasks |Version|Filter|n-shot| Metric | Value | |Stderr|
- |-----------------|-------|------|-----:|-----------|------:|---|-----:|
- |truthfulqa |N/A |none | 0|bleu_max |16.3339|± |0.3451|
- | | |none | 0|bleu_acc | 0.4982|± |0.0003|
- | | |none | 0|bleu_diff | 1.2909|± |0.1919|
- | | |none | 0|rouge1_max |41.6927|± |0.5469|
- | | |none | 0|rouge1_acc | 0.5300|± |0.0003|
- | | |none | 0|rouge1_diff| 1.4267|± |0.3796|
- | | |none | 0|rouge2_max |27.3013|± |0.6213|
- | | |none | 0|rouge2_acc | 0.4272|± |0.0003|
- | | |none | 0|rouge2_diff| 1.5314|± |0.4765|
- | | |none | 0|rougeL_max |37.8174|± |0.5443|
- | | |none | 0|rougeL_acc | 0.4859|± |0.0003|
- | | |none | 0|rougeL_diff| 1.2621|± |0.3898|
- | | |none | 0|acc | 0.6613|± |0.0435|
- | - truthfulqa_gen| 3|none | 0|bleu_max |16.3339|± |0.5874|
- | | |none | 0|bleu_acc | 0.4982|± |0.0175|
- | | |none | 0|bleu_diff | 1.2909|± |0.4381|
- | | |none | 0|rouge1_max |41.6927|± |0.7396|
- | | |none | 0|rouge1_acc | 0.5300|± |0.0175|
- | | |none | 0|rouge1_diff| 1.4267|± |0.6161|
- | | |none | 0|rouge2_max |27.3013|± |0.7882|
- | | |none | 0|rouge2_acc | 0.4272|± |0.0173|
- | | |none | 0|rouge2_diff| 1.5314|± |0.6903|
- | | |none | 0|rougeL_max |37.8174|± |0.7378|
- | | |none | 0|rougeL_acc | 0.4859|± |0.0175|
- | | |none | 0|rougeL_diff| 1.2621|± |0.6243|
- | - truthfulqa_mc1| 2|none | 0|acc | 0.5753|± |0.0173|
- | - truthfulqa_mc2| 2|none | 0|acc | 0.7043|± |0.0150|
-
- | Groups |Version|Filter|n-shot| Metric | Value | |Stderr|
- |----------|-------|------|-----:|-----------|------:|---|-----:|
- |truthfulqa|N/A |none | 0|bleu_max |16.3339|± |0.3451|
- | | |none | 0|bleu_acc | 0.4982|± |0.0003|
- | | |none | 0|bleu_diff | 1.2909|± |0.1919|
- | | |none | 0|rouge1_max |41.6927|± |0.5469|
- | | |none | 0|rouge1_acc | 0.5300|± |0.0003|
- | | |none | 0|rouge1_diff| 1.4267|± |0.3796|
- | | |none | 0|rouge2_max |27.3013|± |0.6213|
- | | |none | 0|rouge2_acc | 0.4272|± |0.0003|
- | | |none | 0|rouge2_diff| 1.5314|± |0.4765|
- | | |none | 0|rougeL_max |37.8174|± |0.5443|
- | | |none | 0|rougeL_acc | 0.4859|± |0.0003|
- | | |none | 0|rougeL_diff| 1.2621|± |0.3898|
- | | |none | 0|acc | 0.6613|± |0.0435|
 
 
  Average 75.94
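Step 2 of the pipeline prunes the merged model to 50% sparsity, i.e. half of the weights are set to zero. The README does not say which pruning method is used, so the sketch below assumes plain unstructured magnitude pruning (zero the entries with the smallest absolute values); the authors' actual method may differ. `magnitude_prune` and `sparsity_of` are hypothetical helpers written for illustration, and they operate on a small nested-list matrix rather than real model weights.

```python
from typing import List

Matrix = List[List[float]]


def magnitude_prune(weights: Matrix, sparsity: float = 0.5) -> Matrix:
    """Hypothetical helper: zero out the `sparsity` fraction of entries with the
    smallest absolute value (unstructured magnitude pruning).
    Assumes a rectangular matrix (all rows the same length)."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    flat = [abs(w) for row in weights for w in row]
    n_prune = int(len(flat) * sparsity)
    if n_prune == 0:
        return [row[:] for row in weights]
    # Flat indices of the n_prune smallest-magnitude entries (ties broken by position).
    order = sorted(range(len(flat)), key=lambda i: flat[i])
    pruned = set(order[:n_prune])
    n_cols = len(weights[0])
    return [
        [0.0 if r * n_cols + c in pruned else w for c, w in enumerate(row)]
        for r, row in enumerate(weights)
    ]


def sparsity_of(weights: Matrix) -> float:
    """Fraction of entries that are exactly zero."""
    flat = [w for row in weights for w in row]
    return sum(1 for w in flat if w == 0.0) / len(flat)


if __name__ == "__main__":
    w = [[0.9, -0.1, 0.4, -0.05],
         [0.2, -0.7, 0.03, 0.6]]
    p = magnitude_prune(w, 0.5)
    print(p)               # [[0.9, 0.0, 0.4, 0.0], [0.0, -0.7, 0.0, 0.6]]
    print(sparsity_of(p))  # 0.5
```

Applying the same 50% rule to every linear layer gives the overall 50% sparsity figure; the recovery phase in step 3 then fine-tunes the pruned model to regain accuracy lost by zeroing these weights.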
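The README describes the recovery stage as SFT followed by DPO. For reference, the sketch below computes the standard DPO loss for a single preference pair, loss = -log sigmoid(beta * [(log pi(y_w) - log pi_ref(y_w)) - (log pi(y_l) - log pi_ref(y_l))]): the policy is rewarded when it raises the log-probability of the chosen response y_w, relative to a frozen reference model (in an SFT-then-DPO recipe this is usually the SFT checkpoint), more than it raises that of the rejected response y_l. The README does not state the DPO hyperparameters, so `beta = 0.1` and the log-probability values are illustrative assumptions, and `dpo_loss` is a hypothetical helper, not the TRL API.

```python
import math


def _softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) preference pair.

    Each argument is the summed log-probability of a full response under the
    trainable policy or the frozen reference model. beta = 0.1 is an
    illustrative default, not a value taken from this model card.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(z)) == softplus(-z), computed without overflow
    return _softplus(-z)


if __name__ == "__main__":
    # Policy identical to the reference: the loss is log(2).
    print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # 0.6931
    # Policy favours the chosen response more than the reference does: lower loss.
    print(round(dpo_loss(-5.0, -17.0, -10.0, -12.0), 4))   # 0.3133
```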
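The LoRA settings in the SFT snippet (r = 16 on all seven projections listed in `target_modules`) determine how many parameters are actually trained. Each adapted linear layer with shape d_in x d_out gets an r x d_in matrix A and a d_out x r matrix B, i.e. r * (d_in + d_out) extra parameters, and the low-rank update is scaled by lora_alpha / r (here 16 / 16 = 1). The sketch below counts them under the assumption that the merged base model keeps the standard Mistral-7B shapes (32 layers, hidden size 4096, MLP size 14336, 8 key/value heads of dimension 128); these dimensions are not stated in the README.

```python
# Assumed Mistral-7B shapes; not stated in the model card.
NUM_LAYERS = 32
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 8 * 128  # num_key_value_heads * head_dim (grouped-query attention)

# (in_features, out_features) for each LoRA target module in the snippet.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in one LoRA adapter: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


def total_lora_params(r: int = 16) -> int:
    """Trainable LoRA parameters across all layers and target modules."""
    per_layer = sum(lora_params(d_in, d_out, r) for d_in, d_out in TARGET_SHAPES.values())
    return NUM_LAYERS * per_layer


if __name__ == "__main__":
    r, lora_alpha = 16, 16
    print(total_lora_params(r))  # 41943040
    print(lora_alpha / r)        # 1.0 (scale applied to the LoRA update)
```

Under these assumptions roughly 42M parameters are trained, a small fraction of the 7B base weights, which is what makes the 4-bit SFT setup in the snippet practical on a single GPU.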
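The ± column in the benchmark table is a standard error. The table layout matches lm-evaluation-harness output, so the sketch below assumes that tool's definition for a mean of 0/1 exact-match scores: with accuracy p over n questions, SE = sqrt(p(1 - p) / (n - 1)). Taking n = 1319, the size of the GSM8K test split (an assumption, since the README does not state n), this reproduces the reported 0.012 for p = 0.7468. `binary_mean_stderr` is a hypothetical helper.

```python
import math


def binary_mean_stderr(p: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores with mean p,
    using the sample (n - 1) variance."""
    return math.sqrt(p * (1.0 - p) / (n - 1))


if __name__ == "__main__":
    p = 0.7468  # reported gsm8k exact_match
    n = 1319    # assumed: size of the GSM8K test split
    se = binary_mean_stderr(p, n)
    print(round(se, 3))  # 0.012
    # Approximate 95% normal interval for the accuracy.
    print(round(p - 1.96 * se, 3), round(p + 1.96 * se, 3))  # 0.723 0.77
```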