Commit 204530b by chansung
1 parent: 5215b73

Model save
README.md CHANGED
@@ -1,57 +1,73 @@
  ---
  base_model: google/gemma-7b
- library_name: transformers
- model_name: gemma7b-lora-alpaca-11-v1
  tags:
- - generated_from_trainer
  - trl
  - sft
- licence: license
  ---

- # Model Card for gemma7b-lora-alpaca-11-v1

- This model is a fine-tuned version of [google/gemma-7b](https://huggingface.co/google/gemma-7b).
- It has been trained using [TRL](https://github.com/huggingface/trl).

- ## Quick start

- ```python
- from transformers import pipeline

- question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
- generator = pipeline("text-generation", model="klcsp/gemma7b-lora-alpaca-11-v1", device="cuda")
- output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
- print(output["generated_text"])
- ```

- ## Training procedure

- This model was trained with SFT.

- ### Framework versions

- - TRL: 0.12.1
- - Transformers: 4.46.2
- - Pytorch: 2.3.1+cu121
- - Datasets: 3.1.0
- - Tokenizers: 0.20.3

- ## Citations

- Cite TRL as:
-
- ```bibtex
- @misc{vonwerra2022trl,
- title = {{TRL: Transformer Reinforcement Learning}},
- author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
- year = 2020,
- journal = {GitHub repository},
- publisher = {GitHub},
- howpublished = {\url{https://github.com/huggingface/trl}}
- }
- ```
  ---
+ library_name: peft
+ license: gemma
  base_model: google/gemma-7b
  tags:
  - trl
  - sft
+ - generated_from_trainer
+ datasets:
+ - generator
+ model-index:
+ - name: gemma7b-lora-alpaca-11-v1
+   results: []
  ---

+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->

+ # gemma7b-lora-alpaca-11-v1

+ This model is a fine-tuned version of [google/gemma-7b](https://huggingface.co/google/gemma-7b) on the generator dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 1.6643

+ ## Model description

+ More information needed

+ ## Intended uses & limitations

+ More information needed

+ ## Training and evaluation data

+ More information needed

+ ## Training procedure
+
+ ### Training hyperparameters

+ The following hyperparameters were used during training:
+ - learning_rate: 0.0002
+ - train_batch_size: 8
+ - eval_batch_size: 8
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 8
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 128
+ - total_eval_batch_size: 64
+ - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 5

+ ### Training results

+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:------:|:----:|:---------------:|
+ | 2.9056 | 0.9924 | 65 | 2.6113 |
+ | 1.8271 | 2.0 | 131 | 1.8230 |
+ | 1.7019 | 2.9924 | 196 | 1.7041 |
+ | 1.7024 | 4.0 | 262 | 1.6962 |
+ | 1.6463 | 4.9618 | 325 | 1.6643 |

+ ### Framework versions
+
+ - PEFT 0.13.2
+ - Transformers 4.46.2
+ - Pytorch 2.3.1+cu121
+ - Datasets 3.1.0
+ - Tokenizers 0.20.3
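The updated card's hyperparameter list implies the effective batch sizes it reports. A minimal sketch of that arithmetic, assuming the usual Trainer convention of per-device batch size × device count × gradient-accumulation steps (the function name `effective_batch_size` is illustrative, not from the repo):

```python
def effective_batch_size(per_device: int, num_devices: int, grad_accum: int = 1) -> int:
    """Total examples consumed per optimizer step across all devices."""
    return per_device * num_devices * grad_accum

# Values from the hyperparameter list above.
train_total = effective_batch_size(per_device=8, num_devices=8, grad_accum=2)
eval_total = effective_batch_size(per_device=8, num_devices=8)  # no accumulation at eval
print(train_total, eval_total)  # → 128 64
```

This matches the card's total_train_batch_size of 128 and total_eval_batch_size of 64.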
all_results.json CHANGED
@@ -1,14 +1,9 @@
  {
- "epoch": 1.0,
- "eval_loss": 2.035097599029541,
- "eval_runtime": 20.9523,
- "eval_samples": 5201,
- "eval_samples_per_second": 43.957,
- "eval_steps_per_second": 0.955,
- "total_flos": 1.997211509414953e+17,
- "train_loss": 9.505996913400315,
- "train_runtime": 975.3365,
  "train_samples": 46801,
- "train_samples_per_second": 8.584,
- "train_steps_per_second": 0.134
  }

  {
+ "epoch": 4.961832061068702,
+ "total_flos": 9.909828121379471e+17,
+ "train_loss": 5.476599056537335,
+ "train_runtime": 4095.1846,
  "train_samples": 46801,
+ "train_samples_per_second": 10.222,
+ "train_steps_per_second": 0.079
  }
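The updated aggregates are internally consistent. A quick sanity-check sketch, assuming train_steps_per_second is simply the final global step count (325, per the trainer state) divided by train_runtime, rounded to three decimals:

```python
train_runtime = 4095.1846  # seconds, from the new all_results.json
global_steps = 325         # final global_step in the new trainer_state.json

steps_per_second = round(global_steps / train_runtime, 3)
print(steps_per_second)  # → 0.079, matching "train_steps_per_second"
```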
runs/Nov15_11-44-05_main-lora-gemma7b-alpaca-0-0/events.out.tfevents.1731689602.main-lora-gemma7b-alpaca-0-0.456.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:8844cbc996531590a77871d77967251e6123852a5430c5f0dd09d09de470f966
- size 21563

  version https://git-lfs.github.com/spec/v1
+ oid sha256:260fbb5a0d7c1f1237912d519bec44b4033367356963a4f7b20a8fa648872c33
+ size 22188
train_results.json CHANGED
@@ -1,9 +1,9 @@
  {
- "epoch": 1.0,
- "total_flos": 1.997211509414953e+17,
- "train_loss": 9.505996913400315,
- "train_runtime": 975.3365,
  "train_samples": 46801,
- "train_samples_per_second": 8.584,
- "train_steps_per_second": 0.134
  }

  {
+ "epoch": 4.961832061068702,
+ "total_flos": 9.909828121379471e+17,
+ "train_loss": 5.476599056537335,
+ "train_runtime": 4095.1846,
  "train_samples": 46801,
+ "train_samples_per_second": 10.222,
+ "train_steps_per_second": 0.079
  }
trainer_state.json CHANGED
@@ -1,224 +1,529 @@
  {
  "best_metric": null,
  "best_model_checkpoint": null,
- "epoch": 1.0,
  "eval_steps": 500,
- "global_step": 131,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
  {
- "epoch": 0.007633587786259542,
- "grad_norm": 177.04722595214844,
- "learning_rate": 1.4285714285714285e-05,
- "loss": 47.6977,
  "step": 1
  },
  {
- "epoch": 0.03816793893129771,
- "grad_norm": 85.62853240966797,
- "learning_rate": 7.142857142857143e-05,
- "loss": 43.3882,
  "step": 5
  },
  {
- "epoch": 0.07633587786259542,
- "grad_norm": 22.000961303710938,
- "learning_rate": 0.00014285714285714287,
- "loss": 32.4726,
  "step": 10
  },
  {
- "epoch": 0.11450381679389313,
- "grad_norm": 12.63040828704834,
- "learning_rate": 0.00019996395276708856,
- "loss": 26.3526,
  "step": 15
  },
  {
- "epoch": 0.15267175572519084,
- "grad_norm": 6.1236572265625,
- "learning_rate": 0.00019870502626379127,
- "loss": 23.8025,
  "step": 20
  },
  {
- "epoch": 0.19083969465648856,
- "grad_norm": 9.36489200592041,
- "learning_rate": 0.00019566964208274254,
- "loss": 22.3186,
  "step": 25
  },
  {
- "epoch": 0.22900763358778625,
- "grad_norm": 17.107019424438477,
- "learning_rate": 0.0001909124299802724,
- "loss": 20.77,
  "step": 30
  },
  {
- "epoch": 0.26717557251908397,
- "grad_norm": 27.212358474731445,
- "learning_rate": 0.0001845190085543795,
- "loss": 17.8624,
  "step": 35
  },
  {
- "epoch": 0.3053435114503817,
- "grad_norm": 36.82498550415039,
- "learning_rate": 0.0001766044443118978,
- "loss": 12.7998,
  "step": 40
  },
  {
- "epoch": 0.3435114503816794,
- "grad_norm": 30.142446517944336,
- "learning_rate": 0.00016731118074275704,
- "loss": 7.393,
  "step": 45
  },
  {
- "epoch": 0.3816793893129771,
- "grad_norm": 15.062440872192383,
- "learning_rate": 0.00015680647467311557,
- "loss": 4.1547,
  "step": 50
  },
  {
- "epoch": 0.4198473282442748,
- "grad_norm": 9.832117080688477,
- "learning_rate": 0.00014527938603696376,
- "loss": 3.463,
  "step": 55
  },
  {
- "epoch": 0.4580152671755725,
- "grad_norm": 4.392879009246826,
- "learning_rate": 0.00013293737524320797,
- "loss": 2.8845,
  "step": 60
  },
  {
- "epoch": 0.4961832061068702,
- "grad_norm": 2.260551929473877,
- "learning_rate": 0.00012000256937760445,
- "loss": 2.5922,
  "step": 65
  },
  {
- "epoch": 0.5343511450381679,
- "grad_norm": 3.587684154510498,
- "learning_rate": 0.00010670776443910024,
- "loss": 2.3901,
  "step": 70
  },
  {
- "epoch": 0.5725190839694656,
- "grad_norm": 2.6524131298065186,
- "learning_rate": 9.329223556089975e-05,
- "loss": 2.3052,
  "step": 75
  },
  {
- "epoch": 0.6106870229007634,
- "grad_norm": 0.9529216885566711,
- "learning_rate": 7.999743062239557e-05,
- "loss": 2.2007,
  "step": 80
  },
  {
- "epoch": 0.648854961832061,
- "grad_norm": 1.3604825735092163,
- "learning_rate": 6.706262475679205e-05,
- "loss": 2.1535,
  "step": 85
  },
  {
- "epoch": 0.6870229007633588,
- "grad_norm": 1.0362510681152344,
- "learning_rate": 5.472061396303629e-05,
- "loss": 2.1222,
  "step": 90
  },
  {
- "epoch": 0.7251908396946565,
- "grad_norm": 1.1193585395812988,
- "learning_rate": 4.3193525326884435e-05,
- "loss": 2.0834,
  "step": 95
  },
  {
- "epoch": 0.7633587786259542,
- "grad_norm": 1.9800268411636353,
- "learning_rate": 3.268881925724297e-05,
- "loss": 2.0757,
  "step": 100
  },
  {
- "epoch": 0.8015267175572519,
- "grad_norm": 1.1855801343917847,
- "learning_rate": 2.339555568810221e-05,
- "loss": 2.0367,
  "step": 105
  },
  {
- "epoch": 0.8396946564885496,
- "grad_norm": 1.8571408987045288,
- "learning_rate": 1.5480991445620542e-05,
- "loss": 2.0068,
  "step": 110
  },
  {
- "epoch": 0.8778625954198473,
- "grad_norm": 0.9461548924446106,
- "learning_rate": 9.08757001972762e-06,
- "loss": 2.0445,
  "step": 115
  },
  {
- "epoch": 0.916030534351145,
- "grad_norm": 2.1497249603271484,
- "learning_rate": 4.3303579172574885e-06,
- "loss": 2.0147,
  "step": 120
  },
  {
- "epoch": 0.9541984732824428,
- "grad_norm": 1.3707141876220703,
- "learning_rate": 1.2949737362087156e-06,
- "loss": 2.0338,
  "step": 125
  },
  {
- "epoch": 0.9923664122137404,
- "grad_norm": 1.4225437641143799,
- "learning_rate": 3.60472329114625e-08,
- "loss": 2.0402,
  "step": 130
  },
  {
- "epoch": 1.0,
- "eval_loss": 2.035097599029541,
- "eval_runtime": 20.9668,
- "eval_samples_per_second": 43.927,
- "eval_steps_per_second": 0.954,
  "step": 131
  },
  {
- "epoch": 1.0,
- "step": 131,
- "total_flos": 1.997211509414953e+17,
- "train_loss": 9.505996913400315,
- "train_runtime": 975.3365,
- "train_samples_per_second": 8.584,
- "train_steps_per_second": 0.134
  }
  ],
  "logging_steps": 5,
- "max_steps": 131,
  "num_input_tokens_seen": 0,
- "num_train_epochs": 1,
  "save_steps": 100,
  "stateful_callbacks": {
  "TrainerControl": {
@@ -232,8 +537,8 @@
  "attributes": {}
  }
  },
- "total_flos": 1.997211509414953e+17,
- "train_batch_size": 4,
  "trial_name": null,
  "trial_params": null
  }
  {
  "best_metric": null,
  "best_model_checkpoint": null,
+ "epoch": 4.961832061068702,
  "eval_steps": 500,
+ "global_step": 325,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
  {
+ "epoch": 0.015267175572519083,
+ "grad_norm": 183.11753845214844,
+ "learning_rate": 6.060606060606061e-06,
+ "loss": 46.1063,
  "step": 1
  },
  {
+ "epoch": 0.07633587786259542,
+ "grad_norm": 136.03738403320312,
+ "learning_rate": 3.0303030303030306e-05,
+ "loss": 44.0302,
  "step": 5
  },
  {
+ "epoch": 0.15267175572519084,
+ "grad_norm": 69.2432632446289,
+ "learning_rate": 6.060606060606061e-05,
+ "loss": 38.4659,
  "step": 10
  },
  {
+ "epoch": 0.22900763358778625,
+ "grad_norm": 17.486797332763672,
+ "learning_rate": 9.090909090909092e-05,
+ "loss": 30.3029,
  "step": 15
  },
  {
+ "epoch": 0.3053435114503817,
+ "grad_norm": 13.530756950378418,
+ "learning_rate": 0.00012121212121212122,
+ "loss": 26.6709,
  "step": 20
  },
  {
+ "epoch": 0.3816793893129771,
+ "grad_norm": 7.521498680114746,
+ "learning_rate": 0.00015151515151515152,
+ "loss": 24.4319,
  "step": 25
  },
  {
+ "epoch": 0.4580152671755725,
+ "grad_norm": 5.912084102630615,
+ "learning_rate": 0.00018181818181818183,
+ "loss": 22.862,
  "step": 30
  },
  {
+ "epoch": 0.5343511450381679,
+ "grad_norm": 10.610209465026855,
+ "learning_rate": 0.00019997685019798912,
+ "loss": 21.5999,
  "step": 35
  },
  {
+ "epoch": 0.6106870229007634,
+ "grad_norm": 20.944725036621094,
+ "learning_rate": 0.0001997165380022878,
+ "loss": 19.4719,
  "step": 40
  },
  {
+ "epoch": 0.6870229007633588,
+ "grad_norm": 34.12383270263672,
+ "learning_rate": 0.000199167731989929,
+ "loss": 14.6832,
  "step": 45
  },
  {
+ "epoch": 0.7633587786259542,
+ "grad_norm": 42.86738204956055,
+ "learning_rate": 0.0001983320199330545,
+ "loss": 8.7569,
  "step": 50
  },
  {
+ "epoch": 0.8396946564885496,
+ "grad_norm": 12.474686622619629,
+ "learning_rate": 0.00019721181966290613,
+ "loss": 4.3457,
  "step": 55
  },
  {
+ "epoch": 0.916030534351145,
+ "grad_norm": 9.623456954956055,
+ "learning_rate": 0.00019581037207470382,
+ "loss": 3.4309,
  "step": 60
  },
  {
+ "epoch": 0.9923664122137404,
+ "grad_norm": 3.5216312408447266,
+ "learning_rate": 0.00019413173175128473,
+ "loss": 2.9056,
  "step": 65
  },
  {
+ "epoch": 0.9923664122137404,
+ "eval_loss": 2.611328125,
+ "eval_runtime": 19.2134,
+ "eval_samples_per_second": 47.935,
+ "eval_steps_per_second": 0.781,
+ "step": 65
+ },
+ {
+ "epoch": 1.0687022900763359,
+ "grad_norm": 2.9582359790802,
+ "learning_rate": 0.00019218075523263104,
+ "loss": 2.7809,
  "step": 70
  },
  {
+ "epoch": 1.1450381679389312,
+ "grad_norm": 2.319239616394043,
+ "learning_rate": 0.00018996308696522433,
+ "loss": 2.3224,
  "step": 75
  },
  {
+ "epoch": 1.2213740458015268,
+ "grad_norm": 1.3839267492294312,
+ "learning_rate": 0.00018748514297187648,
+ "loss": 2.2039,
  "step": 80
  },
  {
+ "epoch": 1.297709923664122,
+ "grad_norm": 0.5840837955474854,
+ "learning_rate": 0.00018475409228928312,
+ "loss": 2.1174,
  "step": 85
  },
  {
+ "epoch": 1.3740458015267176,
+ "grad_norm": 1.5493711233139038,
+ "learning_rate": 0.00018177783622700327,
+ "loss": 2.0565,
  "step": 90
  },
  {
+ "epoch": 1.450381679389313,
+ "grad_norm": 0.7415986657142639,
+ "learning_rate": 0.00017856498550787144,
+ "loss": 2.003,
  "step": 95
  },
  {
+ "epoch": 1.5267175572519083,
+ "grad_norm": 0.6342356204986572,
+ "learning_rate": 0.00017512483535597867,
+ "loss": 1.9686,
  "step": 100
  },
  {
+ "epoch": 1.6030534351145038,
+ "grad_norm": 1.0893248319625854,
+ "learning_rate": 0.00017146733860429612,
+ "loss": 1.9499,
  "step": 105
  },
  {
+ "epoch": 1.6793893129770994,
+ "grad_norm": 1.233128547668457,
+ "learning_rate": 0.0001676030768997445,
+ "loss": 1.9192,
  "step": 110
  },
  {
+ "epoch": 1.7557251908396947,
+ "grad_norm": 0.7829602360725403,
+ "learning_rate": 0.00016354323008901776,
+ "loss": 1.8934,
  "step": 115
  },
  {
+ "epoch": 1.83206106870229,
+ "grad_norm": 1.0393383502960205,
+ "learning_rate": 0.00015929954387373103,
+ "loss": 1.8579,
  "step": 120
  },
  {
+ "epoch": 1.9083969465648853,
+ "grad_norm": 2.433302879333496,
+ "learning_rate": 0.00015488429582847192,
+ "loss": 1.8576,
  "step": 125
  },
  {
+ "epoch": 1.984732824427481,
+ "grad_norm": 1.2537367343902588,
+ "learning_rate": 0.00015031025988006936,
+ "loss": 1.8271,
  "step": 130
  },
  {
+ "epoch": 2.0,
+ "eval_loss": 1.8229883909225464,
+ "eval_runtime": 19.0953,
+ "eval_samples_per_second": 48.232,
+ "eval_steps_per_second": 0.786,
  "step": 131
  },
  {
+ "epoch": 2.0610687022900764,
+ "grad_norm": 1.04417085647583,
+ "learning_rate": 0.00014559066935084588,
+ "loss": 1.975,
+ "step": 135
+ },
+ {
+ "epoch": 2.1374045801526718,
+ "grad_norm": 0.9754623174667358,
+ "learning_rate": 0.00014073917867277557,
+ "loss": 1.7901,
+ "step": 140
+ },
+ {
+ "epoch": 2.213740458015267,
+ "grad_norm": 0.6031882762908936,
+ "learning_rate": 0.0001357698238833126,
+ "loss": 1.7584,
+ "step": 145
+ },
+ {
+ "epoch": 2.2900763358778624,
+ "grad_norm": 1.7654844522476196,
+ "learning_rate": 0.000130696982017182,
+ "loss": 1.7665,
+ "step": 150
+ },
+ {
+ "epoch": 2.366412213740458,
+ "grad_norm": 1.8184305429458618,
+ "learning_rate": 0.0001255353295116187,
+ "loss": 1.7496,
+ "step": 155
+ },
+ {
+ "epoch": 2.4427480916030535,
+ "grad_norm": 2.4291305541992188,
+ "learning_rate": 0.00012029979974539234,
+ "loss": 1.7389,
+ "step": 160
+ },
+ {
+ "epoch": 2.519083969465649,
+ "grad_norm": 0.7844381928443909,
+ "learning_rate": 0.00011500553983446527,
+ "loss": 1.7327,
+ "step": 165
+ },
+ {
+ "epoch": 2.595419847328244,
+ "grad_norm": 1.0221455097198486,
+ "learning_rate": 0.00010966786680927874,
+ "loss": 1.7365,
+ "step": 170
+ },
+ {
+ "epoch": 2.67175572519084,
+ "grad_norm": 1.1956524848937988,
+ "learning_rate": 0.00010430222330045304,
+ "loss": 1.7204,
+ "step": 175
+ },
+ {
+ "epoch": 2.7480916030534353,
+ "grad_norm": 0.7325518131256104,
+ "learning_rate": 9.892413286110886e-05,
+ "loss": 1.7177,
+ "step": 180
+ },
+ {
+ "epoch": 2.8244274809160306,
+ "grad_norm": 0.8538561463356018,
+ "learning_rate": 9.354915505506839e-05,
+ "loss": 1.7193,
+ "step": 185
+ },
+ {
+ "epoch": 2.900763358778626,
+ "grad_norm": 1.252325415611267,
+ "learning_rate": 8.81928404408726e-05,
+ "loss": 1.7058,
+ "step": 190
+ },
+ {
+ "epoch": 2.9770992366412212,
+ "grad_norm": 0.7734937071800232,
+ "learning_rate": 8.287068558185225e-05,
+ "loss": 1.7019,
+ "step": 195
+ },
+ {
+ "epoch": 2.9923664122137406,
+ "eval_loss": 1.7041354179382324,
+ "eval_runtime": 19.3108,
+ "eval_samples_per_second": 47.694,
+ "eval_steps_per_second": 0.777,
+ "step": 196
+ },
+ {
+ "epoch": 3.053435114503817,
+ "grad_norm": 0.6631619334220886,
+ "learning_rate": 7.759808821241406e-05,
+ "loss": 1.8697,
+ "step": 200
+ },
+ {
+ "epoch": 3.1297709923664123,
+ "grad_norm": 0.7187236547470093,
+ "learning_rate": 7.239030269025311e-05,
+ "loss": 1.7181,
+ "step": 205
+ },
+ {
+ "epoch": 3.2061068702290076,
+ "grad_norm": 0.5320985913276672,
+ "learning_rate": 6.726239586337408e-05,
+ "loss": 1.7351,
+ "step": 210
+ },
+ {
+ "epoch": 3.282442748091603,
+ "grad_norm": 0.43638336658477783,
+ "learning_rate": 6.22292034796035e-05,
+ "loss": 1.7156,
+ "step": 215
+ },
+ {
+ "epoch": 3.3587786259541983,
+ "grad_norm": 0.3966742753982544,
+ "learning_rate": 5.730528726470792e-05,
+ "loss": 1.7158,
+ "step": 220
+ },
+ {
+ "epoch": 3.435114503816794,
+ "grad_norm": 0.326159805059433,
+ "learning_rate": 5.2504892793295e-05,
+ "loss": 1.7055,
+ "step": 225
+ },
+ {
+ "epoch": 3.5114503816793894,
+ "grad_norm": 0.4766685664653778,
+ "learning_rate": 4.7841908274384616e-05,
+ "loss": 1.7006,
+ "step": 230
+ },
+ {
+ "epoch": 3.5877862595419847,
+ "grad_norm": 0.41363418102264404,
+ "learning_rate": 4.332982437088825e-05,
+ "loss": 1.7106,
+ "step": 235
+ },
+ {
+ "epoch": 3.66412213740458,
+ "grad_norm": 0.5006980299949646,
+ "learning_rate": 3.898169516924398e-05,
+ "loss": 1.6938,
+ "step": 240
+ },
+ {
+ "epoch": 3.7404580152671754,
+ "grad_norm": 0.4720315933227539,
+ "learning_rate": 3.4810100412128747e-05,
+ "loss": 1.6886,
+ "step": 245
+ },
+ {
+ "epoch": 3.816793893129771,
+ "grad_norm": 0.5057269334793091,
+ "learning_rate": 3.0827109103512643e-05,
+ "loss": 1.6912,
+ "step": 250
+ },
+ {
+ "epoch": 3.8931297709923665,
+ "grad_norm": 0.38378995656967163,
+ "learning_rate": 2.7044244591351232e-05,
+ "loss": 1.7001,
+ "step": 255
+ },
+ {
+ "epoch": 3.969465648854962,
+ "grad_norm": 0.3008043169975281,
+ "learning_rate": 2.3472451228937253e-05,
+ "loss": 1.7024,
+ "step": 260
+ },
+ {
+ "epoch": 4.0,
+ "eval_loss": 1.6962379217147827,
+ "eval_runtime": 18.9852,
+ "eval_samples_per_second": 48.512,
+ "eval_steps_per_second": 0.79,
+ "step": 262
+ },
+ {
+ "epoch": 4.0458015267175576,
+ "grad_norm": 0.9348434805870056,
+ "learning_rate": 2.0122062711363532e-05,
+ "loss": 1.8574,
+ "step": 265
+ },
+ {
+ "epoch": 4.122137404580153,
+ "grad_norm": 0.7455368638038635,
+ "learning_rate": 1.7002772178705716e-05,
+ "loss": 1.6594,
+ "step": 270
+ },
+ {
+ "epoch": 4.198473282442748,
+ "grad_norm": 0.5774383544921875,
+ "learning_rate": 1.4123604172419713e-05,
+ "loss": 1.6527,
+ "step": 275
+ },
+ {
+ "epoch": 4.2748091603053435,
+ "grad_norm": 0.5370898842811584,
+ "learning_rate": 1.149288852608743e-05,
+ "loss": 1.6587,
+ "step": 280
+ },
+ {
+ "epoch": 4.351145038167939,
+ "grad_norm": 0.7321135997772217,
+ "learning_rate": 9.118236266049707e-06,
+ "loss": 1.6676,
+ "step": 285
+ },
+ {
+ "epoch": 4.427480916030534,
+ "grad_norm": 0.5155964493751526,
+ "learning_rate": 7.0065175916482095e-06,
+ "loss": 1.6579,
+ "step": 290
+ },
+ {
+ "epoch": 4.5038167938931295,
+ "grad_norm": 0.6737932562828064,
+ "learning_rate": 5.163841998782837e-06,
+ "loss": 1.6508,
+ "step": 295
+ },
+ {
+ "epoch": 4.580152671755725,
+ "grad_norm": 0.9017395377159119,
+ "learning_rate": 3.595540604290437e-06,
+ "loss": 1.6375,
+ "step": 300
+ },
+ {
+ "epoch": 4.65648854961832,
+ "grad_norm": 0.5460083484649658,
+ "learning_rate": 2.30615072228183e-06,
+ "loss": 1.6522,
+ "step": 305
+ },
+ {
+ "epoch": 4.732824427480916,
+ "grad_norm": 0.5443113446235657,
+ "learning_rate": 1.2994027370611173e-06,
+ "loss": 1.648,
+ "step": 310
+ },
+ {
+ "epoch": 4.809160305343512,
+ "grad_norm": 0.6177972555160522,
+ "learning_rate": 5.782093106048159e-07,
+ "loss": 1.6559,
+ "step": 315
+ },
+ {
+ "epoch": 4.885496183206107,
+ "grad_norm": 0.4734289050102234,
+ "learning_rate": 1.446569558255395e-07,
+ "loss": 1.6443,
+ "step": 320
+ },
+ {
+ "epoch": 4.961832061068702,
+ "grad_norm": 0.6619871854782104,
+ "learning_rate": 0.0,
+ "loss": 1.6463,
+ "step": 325
+ },
+ {
+ "epoch": 4.961832061068702,
+ "eval_loss": 1.664337158203125,
+ "eval_runtime": 18.9808,
+ "eval_samples_per_second": 48.523,
+ "eval_steps_per_second": 0.79,
+ "step": 325
+ },
+ {
+ "epoch": 4.961832061068702,
+ "step": 325,
+ "total_flos": 9.909828121379471e+17,
+ "train_loss": 5.476599056537335,
+ "train_runtime": 4095.1846,
+ "train_samples_per_second": 10.222,
+ "train_steps_per_second": 0.079
  }
  ],
  "logging_steps": 5,
+ "max_steps": 325,
  "num_input_tokens_seen": 0,
+ "num_train_epochs": 5,
  "save_steps": 100,
  "stateful_callbacks": {
  "TrainerControl": {
@@ -232,8 +537,8 @@
  "attributes": {}
  }
  },
+ "total_flos": 9.909828121379471e+17,
+ "train_batch_size": 8,
  "trial_name": null,
  "trial_params": null
  }
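The learning_rate trace in the new trainer state follows linear warmup and then cosine decay. A minimal sketch of that shape, assuming 33 warmup steps (warmup_ratio 0.1 of 325 steps, rounded up); this mirrors the form of a cosine-with-warmup schedule, not the library's actual implementation, and `lr_at` is an illustrative name:

```python
import math

def lr_at(step: int, peak: float = 2e-4, warmup: int = 33, total: int = 325) -> float:
    """Linear warmup to `peak`, then cosine decay to zero at `total`."""
    if step < warmup:
        return peak * step / warmup
    progress = (step - warmup) / (total - warmup)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(1))    # ≈ 6.06e-06, matching the first logged learning_rate
print(lr_at(325))  # 0.0, matching the last logged value
```

The intermediate logged values (e.g. 0.00019997685 at step 35) fall on the same curve.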