IggoOnCode committed on
Commit b44e736
1 parent: 221e30a

First version of the mamba-2.8b-slimpj-OpenOrca_1ep model and tokenizer (copy of EleutherAI/gpt-neox-20b).

README.md CHANGED
@@ -1,3 +1,136 @@
  ---
- license: apache-2.0
+ # For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
+ # Doc / guide: https://huggingface.co/docs/hub/model-cards
+ {{ card_data }}
  ---
+
+ # Model Card for mamba-2.8b-slimpj-OpenOrca_1ep
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+ This is a fine-tune of mamba-2.8b-slimpj for instruction following, using the OpenOrca dataset.
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+ This is a fine-tune of the Mamba reference model mamba-2.8b-slimpj from the paper [Mamba: Linear-Time Sequence Modeling with Selective State Spaces](https://arxiv.org/abs/2312.00752).
+
+ It was fine-tuned for instruction following on the OpenOrca dataset for 1 epoch.
+
+ - **Model type:** Mamba State Space Model (mamba_ssm)
+ - **Finetuned from model:** https://huggingface.co/state-spaces/mamba-2.8b-slimpj
+
+ ## Uses
+
+ This model is intended for evaluating fine-tuning results on Mamba models.
+
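Example usage, as a minimal sketch only: the repo id, device, and generation settings below are assumptions not documented in this commit; the prompt follows template_3 from training_prompt.json, and loading uses the reference `mamba_ssm` API with the bundled GPT-NeoX tokenizer.

```python
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

repo = "IggoOnCode/mamba-2.8b-slimpj-OpenOrca_1ep"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(repo)  # copy of EleutherAI/gpt-neox-20b
model = MambaLMHeadModel.from_pretrained(repo, device="cuda", dtype=torch.bfloat16)

# Prompt format used during training (template_3 in training_prompt.json).
prompt = "### Human:\nWhat is a state space model?\n\n### AI response:\n"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")

out = model.generate(input_ids=input_ids, max_length=256)  # greedy decoding
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
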
+ ## Training Details
+
+ ### Training Data
+
+ https://huggingface.co/datasets/Open-Orca/OpenOrca
+
+ ### Training Procedure
+
+ Trained using text-generation-webui with code from the mamba_ssm pull request.
+
+ #### Training Hyperparameters
+
+ - **Training regime:** Trained in bfloat16 with the following parameters:
+
+ ```
+ {
+   "trained_model_name": "mamba-2.8b-slimpj-OpenOrc_1ep",
+   "save_steps": 500000.0,
+   "micro_batch_size": 4,
+   "batch_size": 128,
+   "epochs": 1.0,
+   "learning_rate": "3e-4",
+   "lr_scheduler_type": "linear",
+   "cutoff_len": 256,
+   "dataset": "OpenOrca",
+   "eval_dataset": "None",
+   "format": "openorca-format",
+   "warmup_steps": 100.0,
+   "optimizer": "paged_adamw_8bit",
+   "hard_cut_string": "\\n\\n\\n",
+   "add_eos_token": false,
+   "min_chars": 0.0
+ }
+ ```
+
+ The reported train_loss was 0.6762700151924311.
+
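The `batch_size`/`micro_batch_size` pair implies gradient accumulation; a quick derivation, assuming the usual meaning of these fields in text-generation-webui's training tab:

```python
micro_batch_size = 4                 # examples per forward/backward pass
batch_size = 128                     # effective examples per optimizer update
grad_accum_steps = batch_size // micro_batch_size
print(grad_accum_steps)              # 32 micro-batches accumulated per update
```
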
+ ### Results
+
+ #### lm-evaluation-harness results for the final model
+
+ mamba_ssm (pretrained=mamba-2.8b-slimpj-OpenOrca), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (32)
+
+ | Tasks |Version|Filter|n-shot| Metric | Value | |Stderr|
+ |--------------|------:|------|-----:|----------|------:|---|-----:|
+ |arc_challenge | 1|none | 0|acc | 0.2594|± |0.0128|
+ | | |none | 0|acc_norm | 0.2935|± |0.0133|
+ |arc_easy | 1|none | 0|acc | 0.4390|± |0.0102|
+ | | |none | 0|acc_norm | 0.4032|± |0.0101|
+ |boolq | 2|none | 0|acc | 0.5801|± |0.0086|
+ |lambada_openai| 1|none | 0|perplexity|27.8582|± |1.1183|
+ | | |none | 0|acc | 0.3683|± |0.0067|
+ |openbookqa | 1|none | 0|acc | 0.2500|± |0.0194|
+ | | |none | 0|acc_norm | 0.3700|± |0.0216|
+ |piqa | 1|none | 0|acc | 0.6817|± |0.0109|
+ | | |none | 0|acc_norm | 0.6839|± |0.0108|
+ |winogrande | 1|none | 0|acc | 0.5770|± |0.0139|
+
+ #### lm-evaluation-harness results after half an epoch
+
+ mamba_ssm (pretrained=mamba-2.8b-slimpj-OpenOrca_1ep-checkpoints/checkpoint-500000), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (32)
+
+ | Tasks |Version|Filter|n-shot| Metric | Value | |Stderr|
+ |--------------|------:|------|-----:|----------|------:|---|-----:|
+ |arc_challenge | 1|none | 0|acc | 0.2602|± |0.0128|
+ | | |none | 0|acc_norm | 0.2833|± |0.0132|
+ |arc_easy | 1|none | 0|acc | 0.4533|± |0.0102|
+ | | |none | 0|acc_norm | 0.4125|± |0.0101|
+ |boolq | 2|none | 0|acc | 0.4095|± |0.0086|
+ |lambada_openai| 1|none | 0|perplexity|30.4832|± |1.2403|
+ | | |none | 0|acc | 0.3551|± |0.0067|
+ |openbookqa | 1|none | 0|acc | 0.2420|± |0.0192|
+ | | |none | 0|acc_norm | 0.3640|± |0.0215|
+ |piqa | 1|none | 0|acc | 0.6812|± |0.0109|
+ | | |none | 0|acc_norm | 0.6730|± |0.0109|
+ |winogrande | 1|none | 0|acc | 0.5588|± |0.0140|
+
+ #### Reference lm-evaluation-harness results for the base model mamba-2.8b-slimpj without fine-tuning
+
+ mamba_ssm (pretrained=mamba-2.8b-slimpj), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (32)
+
+ | Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
+ |--------------|------:|------|-----:|----------|-----:|---|-----:|
+ |arc_challenge | 1|none | 0|acc |0.3882|± |0.0142|
+ | | |none | 0|acc_norm |0.4155|± |0.0144|
+ |arc_easy | 1|none | 0|acc |0.7264|± |0.0091|
+ | | |none | 0|acc_norm |0.6814|± |0.0096|
+ |boolq | 2|none | 0|acc |0.7107|± |0.0079|
+ |lambada_openai| 1|none | 0|perplexity|5.8770|± |0.1881|
+ | | |none | 0|acc |0.6427|± |0.0067|
+ |openbookqa | 1|none | 0|acc |0.2860|± |0.0202|
+ | | |none | 0|acc_norm |0.3980|± |0.0219|
+ |piqa | 1|none | 0|acc |0.7709|± |0.0098|
+ | | |none | 0|acc_norm |0.7813|± |0.0096|
+ |winogrande | 1|none | 0|acc |0.6614|± |0.0133|
+
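The tables above are in lm-evaluation-harness's markdown output format. A sketch of how such a run can be reproduced (harness version, exact invocation, and repo id are assumptions; only the task list and batch_size: auto come from the headers above):

```python
import lm_eval  # lm-evaluation-harness >= 0.4 with mamba_ssm installed

results = lm_eval.simple_evaluate(
    model="mamba_ssm",
    model_args="pretrained=IggoOnCode/mamba-2.8b-slimpj-OpenOrca_1ep",  # assumed repo id
    tasks=["arc_challenge", "arc_easy", "boolq", "lambada_openai",
           "openbookqa", "piqa", "winogrande"],
    batch_size="auto",
)
print(results["results"])
```
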
+ #### Summary
+
+ Compared with the base model, the fine-tuned model's measured perplexity and accuracy got worse, but this is a known possible side effect of fine-tuning. Perplexity and accuracy improved between the half-epoch checkpoint and the final model, so the initial degradation was likely caused by forcing a prompt structure onto the base model, which was trained only on unstructured text.
+
+ The answer quality as perceived by users has yet to be evaluated.
+
+ ## Environmental Impact
+
+ - **Hardware Type:** RTX 3090
+ - **Hours used:** 118
+
config.json ADDED
@@ -0,0 +1 @@
+ {"d_model": 2560, "n_layer": 64, "vocab_size": 50277, "ssm_cfg": {}, "rms_norm": true, "residual_in_fp32": true, "fused_add_norm": true, "pad_vocab_size_multiple": 8}
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:630951f04627b75b525ca5fc90d189154f8d971d504cedd140c52de096cbc6c8
+ size 5548078554
special_tokens_map.json ADDED
@@ -0,0 +1,18 @@
+ {
+   "bos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": "<|endoftext|>",
+   "pad_token": "<|endoftext|>",
+   "unk_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,212 @@
+ {
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "0": {"content": "<|endoftext|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
+     "1": {"content": "<|padding|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
+     "50254": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50255": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50256": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50257": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50258": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50259": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50260": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50261": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50262": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50263": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50264": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50265": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50266": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50267": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50268": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50269": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50270": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50271": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50272": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50273": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50274": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50275": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false},
+     "50276": {"content": " ", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false, "special": false}
+   },
+   "bos_token": "<|endoftext|>",
+   "clean_up_tokenization_spaces": true,
+   "eos_token": "<|endoftext|>",
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "<|endoftext|>",
+   "tokenizer_class": "GPTNeoXTokenizer",
+   "unk_token": "<|endoftext|>"
+ }
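In this configuration `bos_token`, `eos_token`, `pad_token`, and `unk_token` all resolve to `<|endoftext|>` (token id 0), which matters when padding batches for further fine-tuning. A quick check (repo id assumed):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("IggoOnCode/mamba-2.8b-slimpj-OpenOrca_1ep")  # assumed repo id
print(tok.bos_token, tok.eos_token, tok.pad_token, tok.unk_token)  # all "<|endoftext|>"
print(tok.eos_token_id)                                            # 0
```
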
training_log.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "base_model_name": "UNTRAINED/mamba-2.8b-slimpj",
+   "base_model_class": "MambaSsmModel",
+   "loss": 0.4871,
+   "learning_rate": 1.814168657212832e-08,
+   "epoch": 1.0,
+   "current_steps": 1058463,
+   "train_runtime": 423405.7021,
+   "train_samples_per_second": 10.0,
+   "train_steps_per_second": 0.078,
+   "total_flos": 0.0,
+   "train_loss": 0.6762700151924311
+ }
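These logged values are consistent with the hyperparameters and the "Hours used: 118" figure in the model card; a quick check:

```python
train_runtime = 423405.7021   # seconds, from training_log.json
current_steps = 1058463       # micro-batch steps, from training_log.json
micro_batch_size = 4          # from training_parameters.json

print(train_runtime / 3600)                               # ~117.6 h -> "Hours used: 118"
print(current_steps * micro_batch_size / train_runtime)   # ~10.0, matches train_samples_per_second
```
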
training_parameters.json ADDED
@@ -0,0 +1,18 @@
+ {
+   "trained_model_name": "mamba-2.8b-slimpj-OpenOrc_1ep",
+   "save_steps": 500000.0,
+   "micro_batch_size": 4,
+   "batch_size": 128,
+   "epochs": 1.0,
+   "learning_rate": "3e-4",
+   "lr_scheduler_type": "linear",
+   "cutoff_len": 256,
+   "dataset": "OpenOrca",
+   "eval_dataset": "None",
+   "format": "openorca-format",
+   "warmup_steps": 100.0,
+   "optimizer": "paged_adamw_8bit",
+   "hard_cut_string": "\\n\\n\\n",
+   "add_eos_token": false,
+   "min_chars": 0.0
+ }
training_prompt.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "template_type": "dataset",
+   "template_1": "### Human:\n%question%\n\n### AI response:\n%response%",
+   "template_2": "### System instructions:\n%system_prompt%\n\n### Human:\n%question%\n\n### AI response:\n%response%",
+   "template_3": "### Human:\n%question%\n\n### AI response:\n",
+   "template_4": "### System instructions:\n%system_prompt%\n\n### Human:\n%question%\n\n### AI response:\n",
+   "template_5": "### AI response:\n%response%",
+   "template_6": "### System instructions:\n%system_prompt%\n\n### AI response:\n%response%"
+ }
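These are the prompt templates used to format OpenOrca records during training. A hedged sketch of the %placeholder% substitution they imply (the helper below is hypothetical, not code from this repo):

```python
def format_prompt(template: str, **fields: str) -> str:
    """Hypothetical helper: fill the %name% placeholders of a training_prompt.json template."""
    for name, value in fields.items():
        template = template.replace(f"%{name}%", value)
    return template

template_2 = ("### System instructions:\n%system_prompt%\n\n"
              "### Human:\n%question%\n\n### AI response:\n%response%")

print(format_prompt(
    template_2,
    system_prompt="You are a helpful assistant.",
    question="Summarise the Mamba architecture in one sentence.",
    response="Mamba is a selective state space model for sequence modelling.",
))
```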