vishalkatheriya18 committed
Commit 96712f3
1 Parent(s): 8ff07a9

End of training

README.md ADDED
@@ -0,0 +1,195 @@
+ ---
+ license: apache-2.0
+ base_model: facebook/convnextv2-tiny-1k-224
+ tags:
+ - generated_from_trainer
+ datasets:
+ - imagefolder
+ metrics:
+ - accuracy
+ model-index:
+ - name: convnextv2-tiny-1k-224-finetuned-fullwear
+   results:
+   - task:
+       name: Image Classification
+       type: image-classification
+     dataset:
+       name: imagefolder
+       type: imagefolder
+       config: default
+       split: train
+       args: default
+     metrics:
+     - name: Accuracy
+       type: accuracy
+       value: 0.8402777777777778
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # convnextv2-tiny-1k-224-finetuned-fullwear
+
+ This model is a fine-tuned version of [facebook/convnextv2-tiny-1k-224](https://huggingface.co/facebook/convnextv2-tiny-1k-224) on the imagefolder dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.5203
+ - Accuracy: 0.8403
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-05
+ - train_batch_size: 32
+ - eval_batch_size: 32
+ - seed: 42
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 128
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 120
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Accuracy |
+ |:-------------:|:--------:|:----:|:---------------:|:--------:|
+ | 2.4871 | 0.9756 | 10 | 2.4771 | 0.0694 |
+ | 2.4464 | 1.9512 | 20 | 2.4333 | 0.1528 |
+ | 2.3911 | 2.9268 | 30 | 2.3670 | 0.2778 |
+ | 2.3204 | 4.0 | 41 | 2.2617 | 0.3681 |
+ | 2.206 | 4.9756 | 51 | 2.1445 | 0.3958 |
+ | 2.0869 | 5.9512 | 61 | 2.0146 | 0.4444 |
+ | 1.9756 | 6.9268 | 71 | 1.8763 | 0.5139 |
+ | 1.8124 | 8.0 | 82 | 1.7422 | 0.5486 |
+ | 1.6624 | 8.9756 | 92 | 1.6629 | 0.5903 |
+ | 1.587 | 9.9512 | 102 | 1.5474 | 0.6111 |
+ | 1.4746 | 10.9268 | 112 | 1.4577 | 0.625 |
+ | 1.359 | 12.0 | 123 | 1.3055 | 0.6736 |
+ | 1.2412 | 12.9756 | 133 | 1.2241 | 0.6736 |
+ | 1.1374 | 13.9512 | 143 | 1.2003 | 0.6736 |
+ | 1.0194 | 14.9268 | 153 | 1.0233 | 0.7569 |
+ | 0.9705 | 16.0 | 164 | 0.9492 | 0.7847 |
+ | 0.8949 | 16.9756 | 174 | 0.9246 | 0.75 |
+ | 0.7959 | 17.9512 | 184 | 0.8148 | 0.7639 |
+ | 0.7491 | 18.9268 | 194 | 0.7858 | 0.7569 |
+ | 0.6783 | 20.0 | 205 | 0.8010 | 0.7569 |
+ | 0.6257 | 20.9756 | 215 | 0.7295 | 0.7847 |
+ | 0.5999 | 21.9512 | 225 | 0.6219 | 0.8333 |
+ | 0.5701 | 22.9268 | 235 | 0.5932 | 0.8403 |
+ | 0.4926 | 24.0 | 246 | 0.5970 | 0.8056 |
+ | 0.4692 | 24.9756 | 256 | 0.6298 | 0.8194 |
+ | 0.4393 | 25.9512 | 266 | 0.5857 | 0.8056 |
+ | 0.419 | 26.9268 | 276 | 0.5203 | 0.8542 |
+ | 0.3454 | 28.0 | 287 | 0.6084 | 0.8264 |
+ | 0.36 | 28.9756 | 297 | 0.5928 | 0.8264 |
+ | 0.3265 | 29.9512 | 307 | 0.5303 | 0.8403 |
+ | 0.3278 | 30.9268 | 317 | 0.6049 | 0.8194 |
+ | 0.2766 | 32.0 | 328 | 0.5656 | 0.8264 |
+ | 0.2805 | 32.9756 | 338 | 0.5003 | 0.8681 |
+ | 0.2505 | 33.9512 | 348 | 0.5412 | 0.8403 |
+ | 0.2464 | 34.9268 | 358 | 0.5410 | 0.8333 |
+ | 0.2166 | 36.0 | 369 | 0.5000 | 0.8472 |
+ | 0.2 | 36.9756 | 379 | 0.5053 | 0.8056 |
+ | 0.1914 | 37.9512 | 389 | 0.5161 | 0.8403 |
+ | 0.186 | 38.9268 | 399 | 0.4242 | 0.8681 |
+ | 0.1592 | 40.0 | 410 | 0.5059 | 0.8472 |
+ | 0.1598 | 40.9756 | 420 | 0.5143 | 0.8264 |
+ | 0.1565 | 41.9512 | 430 | 0.4703 | 0.8542 |
+ | 0.1598 | 42.9268 | 440 | 0.4384 | 0.8542 |
+ | 0.139 | 44.0 | 451 | 0.4850 | 0.8403 |
+ | 0.1137 | 44.9756 | 461 | 0.4405 | 0.8542 |
+ | 0.1158 | 45.9512 | 471 | 0.5250 | 0.8333 |
+ | 0.1192 | 46.9268 | 481 | 0.5843 | 0.8194 |
+ | 0.1271 | 48.0 | 492 | 0.4498 | 0.8611 |
+ | 0.0914 | 48.9756 | 502 | 0.5167 | 0.8264 |
+ | 0.1079 | 49.9512 | 512 | 0.4648 | 0.8681 |
+ | 0.091 | 50.9268 | 522 | 0.5321 | 0.8194 |
+ | 0.1053 | 52.0 | 533 | 0.4402 | 0.8611 |
+ | 0.0842 | 52.9756 | 543 | 0.4776 | 0.8542 |
+ | 0.0961 | 53.9512 | 553 | 0.4762 | 0.8681 |
+ | 0.0896 | 54.9268 | 563 | 0.4477 | 0.8681 |
+ | 0.0876 | 56.0 | 574 | 0.4951 | 0.8472 |
+ | 0.0855 | 56.9756 | 584 | 0.5653 | 0.8125 |
+ | 0.073 | 57.9512 | 594 | 0.5315 | 0.8472 |
+ | 0.0804 | 58.9268 | 604 | 0.5064 | 0.8681 |
+ | 0.0765 | 60.0 | 615 | 0.6316 | 0.8264 |
+ | 0.0782 | 60.9756 | 625 | 0.5733 | 0.8056 |
+ | 0.069 | 61.9512 | 635 | 0.6994 | 0.8056 |
+ | 0.0809 | 62.9268 | 645 | 0.4898 | 0.8611 |
+ | 0.0829 | 64.0 | 656 | 0.6042 | 0.8194 |
+ | 0.0735 | 64.9756 | 666 | 0.4758 | 0.8611 |
+ | 0.0763 | 65.9512 | 676 | 0.4921 | 0.8542 |
+ | 0.0565 | 66.9268 | 686 | 0.4700 | 0.8681 |
+ | 0.062 | 68.0 | 697 | 0.4944 | 0.8819 |
+ | 0.0644 | 68.9756 | 707 | 0.4733 | 0.8681 |
+ | 0.0659 | 69.9512 | 717 | 0.4703 | 0.8819 |
+ | 0.0625 | 70.9268 | 727 | 0.5075 | 0.8542 |
+ | 0.042 | 72.0 | 738 | 0.5464 | 0.8264 |
+ | 0.056 | 72.9756 | 748 | 0.5186 | 0.8333 |
+ | 0.0858 | 73.9512 | 758 | 0.5403 | 0.8264 |
+ | 0.0616 | 74.9268 | 768 | 0.5104 | 0.8472 |
+ | 0.0777 | 76.0 | 779 | 0.5516 | 0.8403 |
+ | 0.0668 | 76.9756 | 789 | 0.4918 | 0.8611 |
+ | 0.0585 | 77.9512 | 799 | 0.5692 | 0.8403 |
+ | 0.0562 | 78.9268 | 809 | 0.5734 | 0.8403 |
+ | 0.0653 | 80.0 | 820 | 0.5403 | 0.8264 |
+ | 0.0434 | 80.9756 | 830 | 0.5108 | 0.8333 |
+ | 0.0483 | 81.9512 | 840 | 0.5699 | 0.8125 |
+ | 0.0329 | 82.9268 | 850 | 0.6028 | 0.8056 |
+ | 0.0431 | 84.0 | 861 | 0.5230 | 0.8333 |
+ | 0.042 | 84.9756 | 871 | 0.5875 | 0.8194 |
+ | 0.0449 | 85.9512 | 881 | 0.5180 | 0.8611 |
+ | 0.0512 | 86.9268 | 891 | 0.5425 | 0.8194 |
+ | 0.0545 | 88.0 | 902 | 0.5690 | 0.8264 |
+ | 0.0496 | 88.9756 | 912 | 0.5619 | 0.8611 |
+ | 0.0449 | 89.9512 | 922 | 0.5626 | 0.8333 |
+ | 0.0405 | 90.9268 | 932 | 0.5267 | 0.8403 |
+ | 0.0344 | 92.0 | 943 | 0.5617 | 0.8403 |
+ | 0.0421 | 92.9756 | 953 | 0.5400 | 0.8611 |
+ | 0.0341 | 93.9512 | 963 | 0.5729 | 0.8333 |
+ | 0.0492 | 94.9268 | 973 | 0.5855 | 0.8056 |
+ | 0.0374 | 96.0 | 984 | 0.6113 | 0.8125 |
+ | 0.0375 | 96.9756 | 994 | 0.5511 | 0.8403 |
+ | 0.0373 | 97.9512 | 1004 | 0.4942 | 0.8542 |
+ | 0.0447 | 98.9268 | 1014 | 0.5031 | 0.8542 |
+ | 0.0519 | 100.0 | 1025 | 0.5349 | 0.8542 |
+ | 0.0387 | 100.9756 | 1035 | 0.5511 | 0.8542 |
+ | 0.0256 | 101.9512 | 1045 | 0.5319 | 0.8403 |
+ | 0.043 | 102.9268 | 1055 | 0.5605 | 0.8264 |
+ | 0.029 | 104.0 | 1066 | 0.5776 | 0.8403 |
+ | 0.0379 | 104.9756 | 1076 | 0.5697 | 0.8472 |
+ | 0.0445 | 105.9512 | 1086 | 0.5133 | 0.8681 |
+ | 0.0267 | 106.9268 | 1096 | 0.5076 | 0.8681 |
+ | 0.044 | 108.0 | 1107 | 0.5260 | 0.8403 |
+ | 0.0263 | 108.9756 | 1117 | 0.5101 | 0.8542 |
+ | 0.0247 | 109.9512 | 1127 | 0.4972 | 0.8542 |
+ | 0.0441 | 110.9268 | 1137 | 0.5094 | 0.8472 |
+ | 0.0263 | 112.0 | 1148 | 0.5259 | 0.8333 |
+ | 0.0247 | 112.9756 | 1158 | 0.5323 | 0.8403 |
+ | 0.0356 | 113.9512 | 1168 | 0.5275 | 0.8403 |
+ | 0.0297 | 114.9268 | 1178 | 0.5240 | 0.8333 |
+ | 0.044 | 116.0 | 1189 | 0.5201 | 0.8472 |
+ | 0.031 | 116.9756 | 1199 | 0.5203 | 0.8403 |
+ | 0.0369 | 117.0732 | 1200 | 0.5203 | 0.8403 |
+
+
+ ### Framework versions
+
+ - Transformers 4.44.0
+ - Pytorch 2.4.0
+ - Datasets 2.21.0
+ - Tokenizers 0.19.1
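
The scheduler settings in the model card (learning_rate 5e-05, linear scheduler, warmup ratio 0.1) can be checked against the `learning_rate` values logged in trainer_state.json. The following is a minimal sketch, not the Trainer's own code: it assumes 1200 total optimizer steps (the `global_step` recorded in trainer_state.json) and a linear warmup followed by a linear decay to zero. The function name `linear_schedule_lr` is hypothetical.

```python
def linear_schedule_lr(step, base_lr=5e-05, total_steps=1200, warmup_ratio=0.1):
    """Linear warmup to base_lr, then linear decay to zero (assumed schedule shape)."""
    # Assumption: warmup length is the warmup ratio applied to the total step count.
    warmup_steps = round(total_steps * warmup_ratio)  # 120 for this run
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = (total_steps - step) / (total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


for step in (10, 120, 130, 1200):
    print(step, linear_schedule_lr(step))
```

Under these assumptions, steps 10, 120, and 130 give roughly 4.17e-06, 5e-05, and 4.95e-05, which agrees with the values logged for those steps.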
all_results.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "epoch": 117.07317073170732,
+   "eval_accuracy": 0.8402777777777778,
+   "eval_loss": 0.5202847123146057,
+   "eval_runtime": 3.1405,
+   "eval_samples_per_second": 45.852,
+   "eval_steps_per_second": 1.592,
+   "total_flos": 3.819974210196996e+18,
+   "train_loss": 0.3624983422954877,
+   "train_runtime": 4158.3882,
+   "train_samples_per_second": 37.399,
+   "train_steps_per_second": 0.289
+ }
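
The final `epoch` of 117.07 is lower than the configured `num_epochs: 120`. One plausible reading is that the step budget was planned as 120 epochs of whole accumulation groups, while each optimizer step consumes 4 batches regardless of epoch boundaries. The sketch below assumes 41 batches per epoch, a figure inferred from the logged epoch fractions (step 10 is logged at epoch 0.9756, which is 40/41) and not stated anywhere in the commit. The names are hypothetical.

```python
BATCHES_PER_EPOCH = 41  # assumption: inferred from logged epoch fractions, not stated in the commit
GRAD_ACCUM_STEPS = 4    # gradient_accumulation_steps from the model card
NUM_EPOCHS = 120        # num_epochs from the model card


def epochs_completed(optimizer_steps, grad_accum_steps, batches_per_epoch):
    """Passes over the data covered by a given number of optimizer steps."""
    return optimizer_steps * grad_accum_steps / batches_per_epoch


# Planned optimizer steps: whole accumulation groups per epoch, times the epoch count.
planned_steps = NUM_EPOCHS * (BATCHES_PER_EPOCH // GRAD_ACCUM_STEPS)
final_epoch = epochs_completed(planned_steps, GRAD_ACCUM_STEPS, BATCHES_PER_EPOCH)
print(planned_steps, final_epoch)
```

With these inputs the planned step count is 1200 and the epoch value is about 117.07, matching `global_step` and `epoch` in the results files.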
config.json ADDED
@@ -0,0 +1,71 @@
+ {
+   "_name_or_path": "facebook/convnextv2-tiny-1k-224",
+   "architectures": [
+     "ConvNextV2ForImageClassification"
+   ],
+   "depths": [
+     3,
+     3,
+     9,
+     3
+   ],
+   "drop_path_rate": 0.0,
+   "hidden_act": "gelu",
+   "hidden_sizes": [
+     96,
+     192,
+     384,
+     768
+   ],
+   "id2label": {
+     "0": "Co_ords",
+     "1": "Kaftan",
+     "2": "anarkali",
+     "3": "cloaks_abaya",
+     "4": "dress",
+     "5": "dungaree",
+     "6": "ethnic",
+     "7": "gown",
+     "8": "jumpsuit",
+     "9": "robe",
+     "10": "salwar_suit",
+     "11": "saree"
+   },
+   "image_size": 224,
+   "initializer_range": 0.02,
+   "label2id": {
+     "Co_ords": 0,
+     "Kaftan": 1,
+     "anarkali": 2,
+     "cloaks_abaya": 3,
+     "dress": 4,
+     "dungaree": 5,
+     "ethnic": 6,
+     "gown": 7,
+     "jumpsuit": 8,
+     "robe": 9,
+     "salwar_suit": 10,
+     "saree": 11
+   },
+   "layer_norm_eps": 1e-12,
+   "model_type": "convnextv2",
+   "num_channels": 3,
+   "num_stages": 4,
+   "out_features": [
+     "stage4"
+   ],
+   "out_indices": [
+     4
+   ],
+   "patch_size": 4,
+   "problem_type": "single_label_classification",
+   "stage_names": [
+     "stem",
+     "stage1",
+     "stage2",
+     "stage3",
+     "stage4"
+   ],
+   "torch_dtype": "float32",
+   "transformers_version": "4.44.0"
+ }
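
The `id2label` mapping above is what turns a 12-way logit vector into a class name. As a minimal illustration, assuming the logits arrive as a plain list of 12 floats (the helper `predicted_label` is hypothetical and not part of the commit):

```python
# Copied from the id2label block in config.json; keys are strings, as in the JSON file.
ID2LABEL = {
    "0": "Co_ords", "1": "Kaftan", "2": "anarkali", "3": "cloaks_abaya",
    "4": "dress", "5": "dungaree", "6": "ethnic", "7": "gown",
    "8": "jumpsuit", "9": "robe", "10": "salwar_suit", "11": "saree",
}


def predicted_label(logits):
    """Return the label whose logit is largest (single-label classification)."""
    if len(logits) != len(ID2LABEL):
        raise ValueError("expected one logit per class")
    best_index = max(range(len(logits)), key=lambda i: logits[i])
    return ID2LABEL[str(best_index)]


example_logits = [0.1] * 12
example_logits[7] = 2.5  # made-up value, for illustration only
print(predicted_label(example_logits))
```

Taking the argmax is consistent with `problem_type: single_label_classification`; a multi-label setup would threshold each logit independently instead.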
eval_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "epoch": 117.07317073170732,
+   "eval_accuracy": 0.8402777777777778,
+   "eval_loss": 0.5202847123146057,
+   "eval_runtime": 3.1405,
+   "eval_samples_per_second": 45.852,
+   "eval_steps_per_second": 1.592
+ }
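
The evaluation figures above also indicate how small the evaluation set is, which helps when interpreting the epoch-to-epoch swings in accuracy. A rough sketch, assuming the sample count can be recovered by multiplying runtime and throughput, and using a simple binomial standard error as an approximation (neither is stated in the commit):

```python
import math

eval_runtime = 3.1405             # seconds, from eval_results.json
eval_samples_per_second = 45.852  # from eval_results.json
eval_accuracy = 0.8402777777777778

# Assumption: throughput * runtime recovers the evaluation set size.
num_eval_samples = round(eval_runtime * eval_samples_per_second)
num_correct = round(eval_accuracy * num_eval_samples)

# Binomial standard error of the accuracy estimate (approximation).
std_error = math.sqrt(eval_accuracy * (1 - eval_accuracy) / num_eval_samples)
print(num_eval_samples, num_correct, round(std_error, 4))
```

This suggests about 144 evaluation images with 121 classified correctly, and a standard error of roughly 0.03, so differences of two or three accuracy points between checkpoints are within noise.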
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8800c431490f289e457718c9f52dffebdccfedd69f45c4922365a8a84a3b5788
+ size 111526592
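
The LFS pointer records only the file size, but that is enough for a rough parameter count. A back-of-the-envelope sketch, assuming every stored tensor is float32 (as `torch_dtype` in config.json suggests) and ignoring the small safetensors header:

```python
size_bytes = 111526592  # from the Git LFS pointer
bytes_per_param = 4     # float32; assumption based on torch_dtype in config.json

# Upper bound on the parameter count, since the file also contains a metadata header.
approx_params = size_bytes // bytes_per_param
print(f"{approx_params / 1e6:.1f}M parameters (approximate)")
```

The result, just under 28 million, is in line with a ConvNeXt V2 Tiny backbone carrying a 12-class head.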
preprocessor_config.json ADDED
@@ -0,0 +1,22 @@
+ {
+   "crop_pct": 0.875,
+   "do_normalize": true,
+   "do_rescale": true,
+   "do_resize": true,
+   "image_mean": [
+     0.485,
+     0.456,
+     0.406
+   ],
+   "image_processor_type": "ConvNextImageProcessor",
+   "image_std": [
+     0.229,
+     0.224,
+     0.225
+   ],
+   "resample": 3,
+   "rescale_factor": 0.00392156862745098,
+   "size": {
+     "shortest_edge": 224
+   }
+ }
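
The preprocessing values above can be read as a per-pixel recipe: rescale 8-bit values by 1/255, then normalize each channel with the ImageNet mean and standard deviation. The sketch below applies that arithmetic to a single RGB pixel in plain Python. The intermediate resize edge of `shortest_edge / crop_pct` followed by a center crop reflects my understanding of how `ConvNextImageProcessor` treats sizes below 384 and should be treated as an assumption; `normalize_pixel` is a hypothetical helper.

```python
IMAGE_MEAN = (0.485, 0.456, 0.406)
IMAGE_STD = (0.229, 0.224, 0.225)
RESCALE_FACTOR = 0.00392156862745098  # 1 / 255
SHORTEST_EDGE = 224
CROP_PCT = 0.875


def normalize_pixel(rgb):
    """Rescale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple(
        (value * RESCALE_FACTOR - mean) / std
        for value, mean, std in zip(rgb, IMAGE_MEAN, IMAGE_STD)
    )


# Assumption: the image is first resized so its shortest edge is 224 / 0.875,
# then center-cropped to 224 x 224.
resize_edge = int(SHORTEST_EDGE / CROP_PCT)

print(resize_edge)
print(normalize_pixel((0, 0, 0)))
print(normalize_pixel((255, 255, 255)))
```

Feeding the model images that skip this normalization would shift the input distribution away from what was used in fine-tuning, so the same values should be applied at inference time.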
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "epoch": 117.07317073170732,
+   "total_flos": 3.819974210196996e+18,
+   "train_loss": 0.3624983422954877,
+   "train_runtime": 4158.3882,
+   "train_samples_per_second": 37.399,
+   "train_steps_per_second": 0.289
+ }
trainer_state.json ADDED
@@ -0,0 +1,1944 @@
+ {
+   "best_metric": null,
+   "best_model_checkpoint": null,
+   "epoch": 117.07317073170732,
+   "eval_steps": 500,
+   "global_step": 1200,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.975609756097561,
+       "grad_norm": 3.532809257507324,
+       "learning_rate": 4.166666666666667e-06,
+       "loss": 2.4871,
+       "step": 10
+     },
+     {
+       "epoch": 0.975609756097561,
+       "eval_accuracy": 0.06944444444444445,
+       "eval_loss": 2.4770686626434326,
+       "eval_runtime": 3.5274,
+       "eval_samples_per_second": 40.824,
+       "eval_steps_per_second": 1.417,
+       "step": 10
+     },
+     {
+       "epoch": 1.951219512195122,
+       "grad_norm": 5.7173171043396,
+       "learning_rate": 8.333333333333334e-06,
+       "loss": 2.4464,
+       "step": 20
+     },
+     {
+       "epoch": 1.951219512195122,
+       "eval_accuracy": 0.1527777777777778,
+       "eval_loss": 2.4332642555236816,
+       "eval_runtime": 2.9731,
+       "eval_samples_per_second": 48.435,
+       "eval_steps_per_second": 1.682,
+       "step": 20
+     },
+     {
+       "epoch": 2.926829268292683,
+       "grad_norm": 6.289983749389648,
+       "learning_rate": 1.25e-05,
+       "loss": 2.3911,
+       "step": 30
+     },
+     {
+       "epoch": 2.926829268292683,
+       "eval_accuracy": 0.2777777777777778,
+       "eval_loss": 2.3669984340667725,
+       "eval_runtime": 2.9436,
+       "eval_samples_per_second": 48.919,
+       "eval_steps_per_second": 1.699,
+       "step": 30
+     },
+     {
+       "epoch": 3.902439024390244,
+       "grad_norm": 12.364039421081543,
+       "learning_rate": 1.6666666666666667e-05,
+       "loss": 2.3204,
+       "step": 40
+     },
+     {
+       "epoch": 4.0,
+       "eval_accuracy": 0.3680555555555556,
+       "eval_loss": 2.261659860610962,
+       "eval_runtime": 2.9933,
+       "eval_samples_per_second": 48.107,
+       "eval_steps_per_second": 1.67,
+       "step": 41
+     },
+     {
+       "epoch": 4.878048780487805,
+       "grad_norm": 14.902889251708984,
+       "learning_rate": 2.0833333333333336e-05,
+       "loss": 2.206,
+       "step": 50
+     },
+     {
+       "epoch": 4.975609756097561,
+       "eval_accuracy": 0.3958333333333333,
+       "eval_loss": 2.144454002380371,
+       "eval_runtime": 3.011,
+       "eval_samples_per_second": 47.824,
+       "eval_steps_per_second": 1.661,
+       "step": 51
+     },
+     {
+       "epoch": 5.853658536585366,
+       "grad_norm": 17.006898880004883,
+       "learning_rate": 2.5e-05,
+       "loss": 2.0869,
+       "step": 60
+     },
+     {
+       "epoch": 5.951219512195122,
+       "eval_accuracy": 0.4444444444444444,
+       "eval_loss": 2.0146427154541016,
+       "eval_runtime": 3.0417,
+       "eval_samples_per_second": 47.342,
+       "eval_steps_per_second": 1.644,
+       "step": 61
+     },
+     {
+       "epoch": 6.829268292682927,
+       "grad_norm": 42.38038635253906,
+       "learning_rate": 2.916666666666667e-05,
+       "loss": 1.9756,
+       "step": 70
+     },
+     {
+       "epoch": 6.926829268292683,
+       "eval_accuracy": 0.5138888888888888,
+       "eval_loss": 1.8763340711593628,
+       "eval_runtime": 3.096,
+       "eval_samples_per_second": 46.512,
+       "eval_steps_per_second": 1.615,
+       "step": 71
+     },
+     {
+       "epoch": 7.804878048780488,
+       "grad_norm": 26.403841018676758,
+       "learning_rate": 3.3333333333333335e-05,
+       "loss": 1.8124,
+       "step": 80
+     },
+     {
+       "epoch": 8.0,
+       "eval_accuracy": 0.5486111111111112,
+       "eval_loss": 1.7422140836715698,
+       "eval_runtime": 3.1203,
+       "eval_samples_per_second": 46.15,
+       "eval_steps_per_second": 1.602,
+       "step": 82
+     },
139
+ {
140
+ "epoch": 8.78048780487805,
141
+ "grad_norm": 38.676876068115234,
142
+ "learning_rate": 3.7500000000000003e-05,
143
+ "loss": 1.6624,
144
+ "step": 90
145
+ },
146
+ {
147
+ "epoch": 8.975609756097562,
148
+ "eval_accuracy": 0.5902777777777778,
149
+ "eval_loss": 1.6628881692886353,
150
+ "eval_runtime": 3.1378,
151
+ "eval_samples_per_second": 45.893,
152
+ "eval_steps_per_second": 1.593,
153
+ "step": 92
154
+ },
155
+ {
156
+ "epoch": 9.75609756097561,
157
+ "grad_norm": 25.14325523376465,
158
+ "learning_rate": 4.166666666666667e-05,
159
+ "loss": 1.587,
160
+ "step": 100
161
+ },
162
+ {
163
+ "epoch": 9.951219512195122,
164
+ "eval_accuracy": 0.6111111111111112,
165
+ "eval_loss": 1.547376036643982,
166
+ "eval_runtime": 3.0786,
167
+ "eval_samples_per_second": 46.774,
168
+ "eval_steps_per_second": 1.624,
169
+ "step": 102
170
+ },
171
+ {
172
+ "epoch": 10.731707317073171,
173
+ "grad_norm": 59.02862548828125,
174
+ "learning_rate": 4.5833333333333334e-05,
175
+ "loss": 1.4746,
176
+ "step": 110
177
+ },
178
+ {
179
+ "epoch": 10.926829268292684,
180
+ "eval_accuracy": 0.625,
181
+ "eval_loss": 1.4577171802520752,
182
+ "eval_runtime": 3.0847,
183
+ "eval_samples_per_second": 46.683,
184
+ "eval_steps_per_second": 1.621,
185
+ "step": 112
186
+ },
187
+ {
188
+ "epoch": 11.707317073170731,
189
+ "grad_norm": 39.518768310546875,
190
+ "learning_rate": 5e-05,
191
+ "loss": 1.359,
192
+ "step": 120
193
+ },
194
+ {
195
+ "epoch": 12.0,
196
+ "eval_accuracy": 0.6736111111111112,
197
+ "eval_loss": 1.30553138256073,
198
+ "eval_runtime": 3.1033,
199
+ "eval_samples_per_second": 46.403,
200
+ "eval_steps_per_second": 1.611,
201
+ "step": 123
202
+ },
203
+ {
204
+ "epoch": 12.682926829268293,
205
+ "grad_norm": 41.855445861816406,
206
+ "learning_rate": 4.9537037037037035e-05,
207
+ "loss": 1.2412,
208
+ "step": 130
209
+ },
210
+ {
211
+ "epoch": 12.975609756097562,
212
+ "eval_accuracy": 0.6736111111111112,
213
+ "eval_loss": 1.2240564823150635,
214
+ "eval_runtime": 3.1092,
215
+ "eval_samples_per_second": 46.314,
216
+ "eval_steps_per_second": 1.608,
217
+ "step": 133
218
+ },
219
+ {
220
+ "epoch": 13.658536585365853,
221
+ "grad_norm": 14.434544563293457,
222
+ "learning_rate": 4.9074074074074075e-05,
223
+ "loss": 1.1374,
224
+ "step": 140
225
+ },
226
+ {
227
+ "epoch": 13.951219512195122,
228
+ "eval_accuracy": 0.6736111111111112,
229
+ "eval_loss": 1.2003458738327026,
230
+ "eval_runtime": 3.1022,
231
+ "eval_samples_per_second": 46.419,
232
+ "eval_steps_per_second": 1.612,
233
+ "step": 143
234
+ },
235
+ {
236
+ "epoch": 14.634146341463415,
237
+ "grad_norm": 51.862064361572266,
238
+ "learning_rate": 4.8611111111111115e-05,
239
+ "loss": 1.0194,
240
+ "step": 150
241
+ },
242
+ {
243
+ "epoch": 14.926829268292684,
244
+ "eval_accuracy": 0.7569444444444444,
245
+ "eval_loss": 1.0233235359191895,
246
+ "eval_runtime": 3.1109,
247
+ "eval_samples_per_second": 46.288,
248
+ "eval_steps_per_second": 1.607,
249
+ "step": 153
250
+ },
251
+ {
252
+ "epoch": 15.609756097560975,
253
+ "grad_norm": 59.053375244140625,
254
+ "learning_rate": 4.814814814814815e-05,
255
+ "loss": 0.9705,
256
+ "step": 160
257
+ },
258
+ {
259
+ "epoch": 16.0,
260
+ "eval_accuracy": 0.7847222222222222,
261
+ "eval_loss": 0.9492360353469849,
262
+ "eval_runtime": 3.0962,
263
+ "eval_samples_per_second": 46.509,
264
+ "eval_steps_per_second": 1.615,
265
+ "step": 164
266
+ },
267
+ {
268
+ "epoch": 16.585365853658537,
269
+ "grad_norm": 23.618066787719727,
270
+ "learning_rate": 4.768518518518519e-05,
271
+ "loss": 0.8949,
272
+ "step": 170
273
+ },
274
+ {
275
+ "epoch": 16.975609756097562,
276
+ "eval_accuracy": 0.75,
277
+ "eval_loss": 0.9246302843093872,
278
+ "eval_runtime": 3.3666,
279
+ "eval_samples_per_second": 42.773,
280
+ "eval_steps_per_second": 1.485,
281
+ "step": 174
282
+ },
283
+ {
284
+ "epoch": 17.5609756097561,
285
+ "grad_norm": 37.61671447753906,
286
+ "learning_rate": 4.722222222222222e-05,
287
+ "loss": 0.7959,
288
+ "step": 180
289
+ },
290
+ {
291
+ "epoch": 17.951219512195124,
292
+ "eval_accuracy": 0.7638888888888888,
293
+ "eval_loss": 0.8147740960121155,
294
+ "eval_runtime": 3.1586,
295
+ "eval_samples_per_second": 45.589,
296
+ "eval_steps_per_second": 1.583,
297
+ "step": 184
298
+ },
299
+ {
300
+ "epoch": 18.536585365853657,
301
+ "grad_norm": 22.54252052307129,
302
+ "learning_rate": 4.675925925925926e-05,
303
+ "loss": 0.7491,
304
+ "step": 190
305
+ },
306
+ {
307
+ "epoch": 18.926829268292682,
308
+ "eval_accuracy": 0.7569444444444444,
309
+ "eval_loss": 0.785780668258667,
310
+ "eval_runtime": 3.1126,
311
+ "eval_samples_per_second": 46.264,
312
+ "eval_steps_per_second": 1.606,
313
+ "step": 194
314
+ },
315
+ {
316
+ "epoch": 19.51219512195122,
317
+ "grad_norm": 73.11436462402344,
318
+ "learning_rate": 4.62962962962963e-05,
319
+ "loss": 0.6783,
320
+ "step": 200
321
+ },
322
+ {
323
+ "epoch": 20.0,
324
+ "eval_accuracy": 0.7569444444444444,
325
+ "eval_loss": 0.8010105490684509,
326
+ "eval_runtime": 3.0934,
327
+ "eval_samples_per_second": 46.55,
328
+ "eval_steps_per_second": 1.616,
329
+ "step": 205
330
+ },
331
+ {
332
+ "epoch": 20.48780487804878,
333
+ "grad_norm": 25.613719940185547,
334
+ "learning_rate": 4.5833333333333334e-05,
335
+ "loss": 0.6257,
336
+ "step": 210
337
+ },
338
+ {
339
+ "epoch": 20.975609756097562,
340
+ "eval_accuracy": 0.7847222222222222,
341
+ "eval_loss": 0.7294739484786987,
342
+ "eval_runtime": 3.0855,
343
+ "eval_samples_per_second": 46.67,
344
+ "eval_steps_per_second": 1.62,
345
+ "step": 215
346
+ },
347
+ {
348
+ "epoch": 21.463414634146343,
349
+ "grad_norm": 21.02585220336914,
350
+ "learning_rate": 4.5370370370370374e-05,
351
+ "loss": 0.5999,
352
+ "step": 220
353
+ },
354
+ {
355
+ "epoch": 21.951219512195124,
356
+ "eval_accuracy": 0.8333333333333334,
357
+ "eval_loss": 0.6218506097793579,
358
+ "eval_runtime": 3.0821,
359
+ "eval_samples_per_second": 46.722,
360
+ "eval_steps_per_second": 1.622,
361
+ "step": 225
362
+ },
363
+ {
364
+ "epoch": 22.4390243902439,
365
+ "grad_norm": 24.335525512695312,
366
+ "learning_rate": 4.490740740740741e-05,
367
+ "loss": 0.5701,
368
+ "step": 230
369
+ },
370
+ {
371
+ "epoch": 22.926829268292682,
372
+ "eval_accuracy": 0.8402777777777778,
373
+ "eval_loss": 0.5932477116584778,
374
+ "eval_runtime": 3.1062,
375
+ "eval_samples_per_second": 46.359,
376
+ "eval_steps_per_second": 1.61,
377
+ "step": 235
378
+ },
379
+ {
380
+ "epoch": 23.414634146341463,
381
+ "grad_norm": 25.651668548583984,
382
+ "learning_rate": 4.4444444444444447e-05,
383
+ "loss": 0.4926,
384
+ "step": 240
385
+ },
386
+ {
387
+ "epoch": 24.0,
388
+ "eval_accuracy": 0.8055555555555556,
389
+ "eval_loss": 0.5970398783683777,
390
+ "eval_runtime": 3.1225,
391
+ "eval_samples_per_second": 46.116,
392
+ "eval_steps_per_second": 1.601,
393
+ "step": 246
394
+ },
395
+ {
396
+ "epoch": 24.390243902439025,
397
+ "grad_norm": 46.76804733276367,
398
+ "learning_rate": 4.3981481481481486e-05,
399
+ "loss": 0.4692,
400
+ "step": 250
401
+ },
402
+ {
403
+ "epoch": 24.975609756097562,
404
+ "eval_accuracy": 0.8194444444444444,
405
+ "eval_loss": 0.6297520995140076,
406
+ "eval_runtime": 3.3496,
407
+ "eval_samples_per_second": 42.99,
408
+ "eval_steps_per_second": 1.493,
409
+ "step": 256
410
+ },
411
+ {
412
+ "epoch": 25.365853658536587,
413
+ "grad_norm": 28.9127140045166,
414
+ "learning_rate": 4.351851851851852e-05,
415
+ "loss": 0.4393,
416
+ "step": 260
417
+ },
418
+ {
419
+ "epoch": 25.951219512195124,
420
+ "eval_accuracy": 0.8055555555555556,
421
+ "eval_loss": 0.5856587886810303,
422
+ "eval_runtime": 3.1545,
423
+ "eval_samples_per_second": 45.648,
424
+ "eval_steps_per_second": 1.585,
425
+ "step": 266
426
+ },
427
+ {
428
+ "epoch": 26.341463414634145,
429
+ "grad_norm": 18.964569091796875,
430
+ "learning_rate": 4.305555555555556e-05,
431
+ "loss": 0.419,
432
+ "step": 270
433
+ },
434
+ {
435
+ "epoch": 26.926829268292682,
436
+ "eval_accuracy": 0.8541666666666666,
437
+ "eval_loss": 0.5202641487121582,
438
+ "eval_runtime": 3.0877,
439
+ "eval_samples_per_second": 46.637,
440
+ "eval_steps_per_second": 1.619,
441
+ "step": 276
442
+ },
443
+ {
444
+ "epoch": 27.317073170731707,
445
+ "grad_norm": 27.72169303894043,
446
+ "learning_rate": 4.259259259259259e-05,
447
+ "loss": 0.3454,
448
+ "step": 280
449
+ },
450
+ {
451
+ "epoch": 28.0,
452
+ "eval_accuracy": 0.8263888888888888,
453
+ "eval_loss": 0.6083860397338867,
454
+ "eval_runtime": 3.1332,
455
+ "eval_samples_per_second": 45.959,
456
+ "eval_steps_per_second": 1.596,
457
+ "step": 287
458
+ },
459
+ {
460
+ "epoch": 28.29268292682927,
461
+ "grad_norm": 34.03929138183594,
462
+ "learning_rate": 4.212962962962963e-05,
463
+ "loss": 0.36,
464
+ "step": 290
465
+ },
466
+ {
467
+ "epoch": 28.975609756097562,
468
+ "eval_accuracy": 0.8263888888888888,
469
+ "eval_loss": 0.5927634835243225,
470
+ "eval_runtime": 3.105,
471
+ "eval_samples_per_second": 46.377,
472
+ "eval_steps_per_second": 1.61,
473
+ "step": 297
474
+ },
475
+ {
476
+ "epoch": 29.26829268292683,
477
+ "grad_norm": 12.895988464355469,
478
+ "learning_rate": 4.166666666666667e-05,
479
+ "loss": 0.3265,
480
+ "step": 300
481
+ },
482
+ {
483
+ "epoch": 29.951219512195124,
484
+ "eval_accuracy": 0.8402777777777778,
485
+ "eval_loss": 0.5302631855010986,
486
+ "eval_runtime": 3.1155,
487
+ "eval_samples_per_second": 46.22,
488
+ "eval_steps_per_second": 1.605,
489
+ "step": 307
490
+ },
491
+ {
492
+ "epoch": 30.24390243902439,
493
+ "grad_norm": 25.50472640991211,
494
+ "learning_rate": 4.1203703703703705e-05,
495
+ "loss": 0.3278,
496
+ "step": 310
497
+ },
498
+ {
499
+ "epoch": 30.926829268292682,
500
+ "eval_accuracy": 0.8194444444444444,
501
+ "eval_loss": 0.6049231886863708,
502
+ "eval_runtime": 3.093,
503
+ "eval_samples_per_second": 46.557,
504
+ "eval_steps_per_second": 1.617,
505
+ "step": 317
506
+ },
507
+ {
508
+ "epoch": 31.21951219512195,
509
+ "grad_norm": 24.565650939941406,
510
+ "learning_rate": 4.074074074074074e-05,
511
+ "loss": 0.2766,
512
+ "step": 320
513
+ },
514
+ {
515
+ "epoch": 32.0,
516
+ "eval_accuracy": 0.8263888888888888,
517
+ "eval_loss": 0.5656167268753052,
518
+ "eval_runtime": 3.1109,
519
+ "eval_samples_per_second": 46.289,
520
+ "eval_steps_per_second": 1.607,
521
+ "step": 328
522
+ },
523
+ {
524
+ "epoch": 32.19512195121951,
525
+ "grad_norm": 55.43843460083008,
526
+ "learning_rate": 4.027777777777778e-05,
527
+ "loss": 0.2805,
528
+ "step": 330
529
+ },
530
+ {
531
+ "epoch": 32.97560975609756,
532
+ "eval_accuracy": 0.8680555555555556,
533
+ "eval_loss": 0.500336229801178,
534
+ "eval_runtime": 3.0516,
535
+ "eval_samples_per_second": 47.188,
536
+ "eval_steps_per_second": 1.638,
537
+ "step": 338
538
+ },
539
+ {
540
+ "epoch": 33.170731707317074,
541
+ "grad_norm": 21.027912139892578,
542
+ "learning_rate": 3.981481481481482e-05,
543
+ "loss": 0.2505,
544
+ "step": 340
545
+ },
546
+ {
547
+ "epoch": 33.951219512195124,
548
+ "eval_accuracy": 0.8402777777777778,
549
+ "eval_loss": 0.5412003397941589,
550
+ "eval_runtime": 2.9972,
551
+ "eval_samples_per_second": 48.044,
552
+ "eval_steps_per_second": 1.668,
553
+ "step": 348
554
+ },
555
+ {
556
+ "epoch": 34.146341463414636,
557
+ "grad_norm": 15.519696235656738,
558
+ "learning_rate": 3.935185185185186e-05,
559
+ "loss": 0.2464,
560
+ "step": 350
561
+ },
562
+ {
563
+ "epoch": 34.926829268292686,
564
+ "eval_accuracy": 0.8333333333333334,
565
+ "eval_loss": 0.5409752130508423,
566
+ "eval_runtime": 3.024,
567
+ "eval_samples_per_second": 47.62,
568
+ "eval_steps_per_second": 1.653,
569
+ "step": 358
570
+ },
571
+ {
572
+ "epoch": 35.1219512195122,
573
+ "grad_norm": 24.68006134033203,
574
+ "learning_rate": 3.888888888888889e-05,
575
+ "loss": 0.2166,
576
+ "step": 360
577
+ },
578
+ {
579
+ "epoch": 36.0,
580
+ "eval_accuracy": 0.8472222222222222,
581
+ "eval_loss": 0.499971866607666,
582
+ "eval_runtime": 2.9993,
583
+ "eval_samples_per_second": 48.011,
584
+ "eval_steps_per_second": 1.667,
585
+ "step": 369
586
+ },
587
+ {
588
+ "epoch": 36.09756097560975,
589
+ "grad_norm": 16.784454345703125,
590
+ "learning_rate": 3.8425925925925924e-05,
591
+ "loss": 0.2,
592
+ "step": 370
593
+ },
594
+ {
595
+ "epoch": 36.97560975609756,
596
+ "eval_accuracy": 0.8055555555555556,
597
+ "eval_loss": 0.5053013563156128,
598
+ "eval_runtime": 2.9731,
599
+ "eval_samples_per_second": 48.435,
600
+ "eval_steps_per_second": 1.682,
601
+ "step": 379
602
+ },
603
+ {
604
+ "epoch": 37.073170731707314,
605
+ "grad_norm": 16.091339111328125,
606
+ "learning_rate": 3.7962962962962964e-05,
607
+ "loss": 0.1914,
608
+ "step": 380
609
+ },
610
+ {
611
+ "epoch": 37.951219512195124,
612
+ "eval_accuracy": 0.8402777777777778,
613
+ "eval_loss": 0.5161268711090088,
614
+ "eval_runtime": 3.0408,
615
+ "eval_samples_per_second": 47.356,
616
+ "eval_steps_per_second": 1.644,
617
+ "step": 389
618
+ },
619
+ {
620
+ "epoch": 38.048780487804876,
621
+ "grad_norm": 19.708166122436523,
622
+ "learning_rate": 3.7500000000000003e-05,
623
+ "loss": 0.186,
624
+ "step": 390
625
+ },
626
+ {
627
+ "epoch": 38.926829268292686,
628
+ "eval_accuracy": 0.8680555555555556,
629
+ "eval_loss": 0.42421483993530273,
630
+ "eval_runtime": 3.0128,
631
+ "eval_samples_per_second": 47.796,
632
+ "eval_steps_per_second": 1.66,
633
+ "step": 399
634
+ },
635
+ {
636
+ "epoch": 39.02439024390244,
637
+ "grad_norm": 18.210954666137695,
638
+ "learning_rate": 3.7037037037037037e-05,
639
+ "loss": 0.1767,
640
+ "step": 400
641
+ },
642
+ {
643
+ "epoch": 40.0,
644
+ "grad_norm": 12.815789222717285,
645
+ "learning_rate": 3.6574074074074076e-05,
646
+ "loss": 0.1592,
647
+ "step": 410
648
+ },
649
+ {
650
+ "epoch": 40.0,
651
+ "eval_accuracy": 0.8472222222222222,
652
+ "eval_loss": 0.5058771967887878,
653
+ "eval_runtime": 3.0381,
654
+ "eval_samples_per_second": 47.399,
655
+ "eval_steps_per_second": 1.646,
656
+ "step": 410
657
+ },
658
+ {
659
+ "epoch": 40.97560975609756,
660
+ "grad_norm": 17.878015518188477,
661
+ "learning_rate": 3.611111111111111e-05,
662
+ "loss": 0.1598,
663
+ "step": 420
664
+ },
665
+ {
666
+ "epoch": 40.97560975609756,
667
+ "eval_accuracy": 0.8263888888888888,
668
+ "eval_loss": 0.5143249034881592,
669
+ "eval_runtime": 3.2074,
670
+ "eval_samples_per_second": 44.896,
671
+ "eval_steps_per_second": 1.559,
672
+ "step": 420
673
+ },
674
+ {
675
+ "epoch": 41.951219512195124,
676
+ "grad_norm": 18.464937210083008,
677
+ "learning_rate": 3.564814814814815e-05,
678
+ "loss": 0.1565,
679
+ "step": 430
680
+ },
681
+ {
682
+ "epoch": 41.951219512195124,
683
+ "eval_accuracy": 0.8541666666666666,
684
+ "eval_loss": 0.47032684087753296,
685
+ "eval_runtime": 3.4622,
686
+ "eval_samples_per_second": 41.592,
687
+ "eval_steps_per_second": 1.444,
688
+ "step": 430
689
+ },
690
+ {
691
+ "epoch": 42.926829268292686,
692
+ "grad_norm": 21.608076095581055,
693
+ "learning_rate": 3.518518518518519e-05,
694
+ "loss": 0.1598,
695
+ "step": 440
696
+ },
697
+ {
698
+ "epoch": 42.926829268292686,
699
+ "eval_accuracy": 0.8541666666666666,
700
+ "eval_loss": 0.4383782148361206,
701
+ "eval_runtime": 3.2144,
702
+ "eval_samples_per_second": 44.798,
703
+ "eval_steps_per_second": 1.555,
704
+ "step": 440
705
+ },
706
+ {
707
+ "epoch": 43.90243902439025,
708
+ "grad_norm": 16.986896514892578,
709
+ "learning_rate": 3.472222222222222e-05,
710
+ "loss": 0.139,
711
+ "step": 450
712
+ },
713
+ {
714
+ "epoch": 44.0,
715
+ "eval_accuracy": 0.8402777777777778,
716
+ "eval_loss": 0.48497942090034485,
717
+ "eval_runtime": 3.1916,
718
+ "eval_samples_per_second": 45.118,
719
+ "eval_steps_per_second": 1.567,
720
+ "step": 451
721
+ },
722
+ {
723
+ "epoch": 44.8780487804878,
724
+ "grad_norm": 10.322036743164062,
725
+ "learning_rate": 3.425925925925926e-05,
726
+ "loss": 0.1137,
727
+ "step": 460
728
+ },
729
+ {
730
+ "epoch": 44.97560975609756,
731
+ "eval_accuracy": 0.8541666666666666,
732
+ "eval_loss": 0.4405103623867035,
733
+ "eval_runtime": 3.1456,
734
+ "eval_samples_per_second": 45.778,
735
+ "eval_steps_per_second": 1.59,
736
+ "step": 461
737
+ },
738
+ {
739
+ "epoch": 45.853658536585364,
740
+ "grad_norm": 24.03445053100586,
741
+ "learning_rate": 3.3796296296296295e-05,
742
+ "loss": 0.1158,
743
+ "step": 470
744
+ },
745
+ {
746
+ "epoch": 45.951219512195124,
747
+ "eval_accuracy": 0.8333333333333334,
748
+ "eval_loss": 0.5250218510627747,
749
+ "eval_runtime": 3.2289,
750
+ "eval_samples_per_second": 44.597,
751
+ "eval_steps_per_second": 1.549,
752
+ "step": 471
753
+ },
754
+ {
755
+ "epoch": 46.829268292682926,
756
+ "grad_norm": 19.822120666503906,
757
+ "learning_rate": 3.3333333333333335e-05,
758
+ "loss": 0.1192,
759
+ "step": 480
760
+ },
761
+ {
762
+ "epoch": 46.926829268292686,
763
+ "eval_accuracy": 0.8194444444444444,
764
+ "eval_loss": 0.5843467116355896,
765
+ "eval_runtime": 3.1862,
766
+ "eval_samples_per_second": 45.195,
767
+ "eval_steps_per_second": 1.569,
768
+ "step": 481
769
+ },
770
+ {
771
+ "epoch": 47.80487804878049,
772
+ "grad_norm": 14.12452220916748,
773
+ "learning_rate": 3.2870370370370375e-05,
774
+ "loss": 0.1271,
775
+ "step": 490
776
+ },
777
+ {
778
+ "epoch": 48.0,
779
+ "eval_accuracy": 0.8611111111111112,
780
+ "eval_loss": 0.44981643557548523,
781
+ "eval_runtime": 3.0999,
782
+ "eval_samples_per_second": 46.453,
783
+ "eval_steps_per_second": 1.613,
784
+ "step": 492
785
+ },
786
+ {
787
+ "epoch": 48.78048780487805,
788
+ "grad_norm": 12.838052749633789,
789
+ "learning_rate": 3.240740740740741e-05,
790
+ "loss": 0.0914,
791
+ "step": 500
792
+ },
793
+ {
794
+ "epoch": 48.97560975609756,
795
+ "eval_accuracy": 0.8263888888888888,
796
+ "eval_loss": 0.5166668891906738,
797
+ "eval_runtime": 3.2195,
798
+ "eval_samples_per_second": 44.728,
799
+ "eval_steps_per_second": 1.553,
800
+ "step": 502
801
+ },
802
+ {
803
+ "epoch": 49.75609756097561,
804
+ "grad_norm": 20.458118438720703,
805
+ "learning_rate": 3.194444444444444e-05,
806
+ "loss": 0.1079,
807
+ "step": 510
808
+ },
809
+ {
810
+ "epoch": 49.951219512195124,
811
+ "eval_accuracy": 0.8680555555555556,
812
+ "eval_loss": 0.46484246850013733,
813
+ "eval_runtime": 3.1773,
814
+ "eval_samples_per_second": 45.322,
815
+ "eval_steps_per_second": 1.574,
816
+ "step": 512
817
+ },
818
+ {
819
+ "epoch": 50.73170731707317,
820
+ "grad_norm": 20.558439254760742,
821
+ "learning_rate": 3.148148148148148e-05,
822
+ "loss": 0.091,
823
+ "step": 520
824
+ },
825
+ {
826
+ "epoch": 50.926829268292686,
827
+ "eval_accuracy": 0.8194444444444444,
828
+ "eval_loss": 0.5321457386016846,
829
+ "eval_runtime": 3.1291,
830
+ "eval_samples_per_second": 46.02,
831
+ "eval_steps_per_second": 1.598,
832
+ "step": 522
833
+ },
834
+ {
835
+ "epoch": 51.707317073170735,
836
+ "grad_norm": 11.692161560058594,
837
+ "learning_rate": 3.101851851851852e-05,
838
+ "loss": 0.1053,
839
+ "step": 530
840
+ },
841
+ {
842
+ "epoch": 52.0,
843
+ "eval_accuracy": 0.8611111111111112,
844
+ "eval_loss": 0.4402025043964386,
845
+ "eval_runtime": 3.1245,
846
+ "eval_samples_per_second": 46.087,
847
+ "eval_steps_per_second": 1.6,
848
+ "step": 533
849
+ },
850
+ {
851
+ "epoch": 52.68292682926829,
852
+ "grad_norm": 8.12877082824707,
853
+ "learning_rate": 3.055555555555556e-05,
854
+ "loss": 0.0842,
855
+ "step": 540
856
+ },
857
+ {
858
+ "epoch": 52.97560975609756,
859
+ "eval_accuracy": 0.8541666666666666,
860
+ "eval_loss": 0.477556437253952,
861
+ "eval_runtime": 3.1029,
862
+ "eval_samples_per_second": 46.408,
863
+ "eval_steps_per_second": 1.611,
864
+ "step": 543
865
+ },
866
+ {
867
+ "epoch": 53.65853658536585,
868
+ "grad_norm": 21.6231746673584,
869
+ "learning_rate": 3.0092592592592593e-05,
870
+ "loss": 0.0961,
871
+ "step": 550
872
+ },
873
+ {
874
+ "epoch": 53.951219512195124,
875
+ "eval_accuracy": 0.8680555555555556,
876
+ "eval_loss": 0.4761970341205597,
877
+ "eval_runtime": 3.0897,
878
+ "eval_samples_per_second": 46.607,
879
+ "eval_steps_per_second": 1.618,
880
+ "step": 553
881
+ },
882
+ {
883
+ "epoch": 54.63414634146341,
884
+ "grad_norm": 22.6603946685791,
885
+ "learning_rate": 2.962962962962963e-05,
886
+ "loss": 0.0896,
887
+ "step": 560
888
+ },
889
+ {
890
+ "epoch": 54.926829268292686,
891
+ "eval_accuracy": 0.8680555555555556,
892
+ "eval_loss": 0.4477081894874573,
893
+ "eval_runtime": 3.1158,
894
+ "eval_samples_per_second": 46.216,
895
+ "eval_steps_per_second": 1.605,
896
+ "step": 563
897
+ },
898
+ {
899
+ "epoch": 55.609756097560975,
900
+ "grad_norm": 20.613056182861328,
901
+ "learning_rate": 2.916666666666667e-05,
902
+ "loss": 0.0876,
903
+ "step": 570
904
+ },
905
+ {
906
+ "epoch": 56.0,
907
+ "eval_accuracy": 0.8472222222222222,
908
+ "eval_loss": 0.49506622552871704,
909
+ "eval_runtime": 3.1109,
910
+ "eval_samples_per_second": 46.289,
911
+ "eval_steps_per_second": 1.607,
912
+ "step": 574
913
+ },
914
+ {
915
+ "epoch": 56.58536585365854,
916
+ "grad_norm": 15.108587265014648,
917
+ "learning_rate": 2.8703703703703706e-05,
918
+ "loss": 0.0855,
919
+ "step": 580
920
+ },
921
+ {
922
+ "epoch": 56.97560975609756,
923
+ "eval_accuracy": 0.8125,
924
+ "eval_loss": 0.565302312374115,
925
+ "eval_runtime": 3.3057,
926
+ "eval_samples_per_second": 43.562,
927
+ "eval_steps_per_second": 1.513,
928
+ "step": 584
929
+ },
930
+ {
931
+ "epoch": 57.5609756097561,
932
+ "grad_norm": 14.336268424987793,
933
+ "learning_rate": 2.824074074074074e-05,
934
+ "loss": 0.073,
935
+ "step": 590
936
+ },
937
+ {
938
+ "epoch": 57.951219512195124,
939
+ "eval_accuracy": 0.8472222222222222,
940
+ "eval_loss": 0.5314738750457764,
941
+ "eval_runtime": 3.1166,
942
+ "eval_samples_per_second": 46.205,
943
+ "eval_steps_per_second": 1.604,
944
+ "step": 594
945
+ },
946
+ {
947
+ "epoch": 58.53658536585366,
948
+ "grad_norm": 31.659631729125977,
949
+ "learning_rate": 2.777777777777778e-05,
950
+ "loss": 0.0804,
951
+ "step": 600
952
+ },
953
+ {
954
+ "epoch": 58.926829268292686,
955
+ "eval_accuracy": 0.8680555555555556,
956
+ "eval_loss": 0.5064035058021545,
957
+ "eval_runtime": 3.1077,
958
+ "eval_samples_per_second": 46.336,
959
+ "eval_steps_per_second": 1.609,
960
+ "step": 604
961
+ },
962
+ {
963
+ "epoch": 59.51219512195122,
964
+ "grad_norm": 6.317721366882324,
965
+ "learning_rate": 2.7314814814814816e-05,
966
+ "loss": 0.0765,
967
+ "step": 610
968
+ },
969
+ {
970
+ "epoch": 60.0,
971
+ "eval_accuracy": 0.8263888888888888,
972
+ "eval_loss": 0.63160640001297,
973
+ "eval_runtime": 3.0861,
974
+ "eval_samples_per_second": 46.661,
975
+ "eval_steps_per_second": 1.62,
976
+ "step": 615
977
+ },
978
+ {
979
+ "epoch": 60.48780487804878,
980
+ "grad_norm": 41.102962493896484,
981
+ "learning_rate": 2.6851851851851855e-05,
982
+ "loss": 0.0782,
983
+ "step": 620
984
+ },
985
+ {
986
+ "epoch": 60.97560975609756,
987
+ "eval_accuracy": 0.8055555555555556,
988
+ "eval_loss": 0.5733475089073181,
989
+ "eval_runtime": 3.1325,
990
+ "eval_samples_per_second": 45.97,
991
+ "eval_steps_per_second": 1.596,
992
+ "step": 625
993
+ },
994
+ {
995
+ "epoch": 61.46341463414634,
996
+ "grad_norm": 6.797872066497803,
997
+ "learning_rate": 2.6388888888888892e-05,
998
+ "loss": 0.069,
999
+ "step": 630
1000
+ },
1001
+ {
1002
+ "epoch": 61.951219512195124,
1003
+ "eval_accuracy": 0.8055555555555556,
1004
+ "eval_loss": 0.6994370222091675,
1005
+ "eval_runtime": 3.1742,
1006
+ "eval_samples_per_second": 45.365,
1007
+ "eval_steps_per_second": 1.575,
1008
+ "step": 635
1009
+ },
1010
+ {
1011
+ "epoch": 62.4390243902439,
1012
+ "grad_norm": 9.439558029174805,
1013
+ "learning_rate": 2.5925925925925925e-05,
1014
+ "loss": 0.0809,
1015
+ "step": 640
1016
+ },
1017
+ {
1018
+ "epoch": 62.926829268292686,
1019
+ "eval_accuracy": 0.8611111111111112,
1020
+ "eval_loss": 0.48975637555122375,
1021
+ "eval_runtime": 3.1035,
1022
+ "eval_samples_per_second": 46.4,
1023
+ "eval_steps_per_second": 1.611,
1024
+ "step": 645
1025
+ },
1026
+ {
1027
+ "epoch": 63.41463414634146,
1028
+ "grad_norm": 23.557085037231445,
1029
+ "learning_rate": 2.5462962962962965e-05,
1030
+ "loss": 0.0829,
1031
+ "step": 650
1032
+ },
1033
+ {
1034
+ "epoch": 64.0,
1035
+ "eval_accuracy": 0.8194444444444444,
1036
+ "eval_loss": 0.6042267680168152,
1037
+ "eval_runtime": 3.0922,
1038
+ "eval_samples_per_second": 46.569,
1039
+ "eval_steps_per_second": 1.617,
1040
+ "step": 656
1041
+ },
1042
+ {
1043
+ "epoch": 64.39024390243902,
1044
+ "grad_norm": 5.313930511474609,
1045
+ "learning_rate": 2.5e-05,
1046
+ "loss": 0.0735,
1047
+ "step": 660
1048
+ },
1049
+ {
1050
+ "epoch": 64.97560975609755,
1051
+ "eval_accuracy": 0.8611111111111112,
1052
+ "eval_loss": 0.4758368730545044,
1053
+ "eval_runtime": 3.1221,
1054
+ "eval_samples_per_second": 46.122,
1055
+ "eval_steps_per_second": 1.601,
1056
+ "step": 666
1057
+ },
1058
+ {
1059
+ "epoch": 65.36585365853658,
1060
+ "grad_norm": 21.385704040527344,
1061
+ "learning_rate": 2.4537037037037038e-05,
1062
+ "loss": 0.0763,
1063
+ "step": 670
1064
+ },
1065
+ {
1066
+ "epoch": 65.95121951219512,
1067
+ "eval_accuracy": 0.8541666666666666,
1068
+ "eval_loss": 0.4920533001422882,
1069
+ "eval_runtime": 3.0958,
1070
+ "eval_samples_per_second": 46.514,
1071
+ "eval_steps_per_second": 1.615,
1072
+ "step": 676
1073
+ },
1074
+ {
1075
+ "epoch": 66.34146341463415,
1076
+ "grad_norm": 5.922763347625732,
1077
+ "learning_rate": 2.4074074074074074e-05,
1078
+ "loss": 0.0565,
1079
+ "step": 680
1080
+ },
1081
+ {
1082
+ "epoch": 66.92682926829268,
1083
+ "eval_accuracy": 0.8680555555555556,
1084
+ "eval_loss": 0.47003433108329773,
1085
+ "eval_runtime": 3.0115,
1086
+ "eval_samples_per_second": 47.816,
1087
+ "eval_steps_per_second": 1.66,
1088
+ "step": 686
1089
+ },
1090
+ {
1091
+ "epoch": 67.3170731707317,
1092
+ "grad_norm": 8.859350204467773,
1093
+ "learning_rate": 2.361111111111111e-05,
1094
+ "loss": 0.062,
1095
+ "step": 690
1096
+ },
1097
+ {
1098
+ "epoch": 68.0,
1099
+ "eval_accuracy": 0.8819444444444444,
1100
+ "eval_loss": 0.49443933367729187,
1101
+ "eval_runtime": 3.0558,
1102
+ "eval_samples_per_second": 47.123,
1103
+ "eval_steps_per_second": 1.636,
1104
+ "step": 697
1105
+ },
1106
+ {
1107
+ "epoch": 68.29268292682927,
1108
+ "grad_norm": 11.291411399841309,
1109
+ "learning_rate": 2.314814814814815e-05,
1110
+ "loss": 0.0644,
1111
+ "step": 700
1112
+ },
1113
+ {
1114
+ "epoch": 68.97560975609755,
1115
+ "eval_accuracy": 0.8680555555555556,
1116
+ "eval_loss": 0.47334182262420654,
1117
+ "eval_runtime": 3.1099,
1118
+ "eval_samples_per_second": 46.303,
1119
+ "eval_steps_per_second": 1.608,
1120
+ "step": 707
1121
+ },
1122
+ {
1123
+ "epoch": 69.26829268292683,
1124
+ "grad_norm": 4.428774356842041,
1125
+ "learning_rate": 2.2685185185185187e-05,
1126
+ "loss": 0.0659,
1127
+ "step": 710
1128
+ },
1129
+ {
1130
+ "epoch": 69.95121951219512,
1131
+ "eval_accuracy": 0.8819444444444444,
1132
+ "eval_loss": 0.4702872037887573,
1133
+ "eval_runtime": 3.1595,
1134
+ "eval_samples_per_second": 45.577,
1135
+ "eval_steps_per_second": 1.583,
1136
+ "step": 717
1137
+ },
1138
+ {
1139
+ "epoch": 70.2439024390244,
1140
+ "grad_norm": 20.887792587280273,
1141
+ "learning_rate": 2.2222222222222223e-05,
1142
+ "loss": 0.0625,
1143
+ "step": 720
1144
+ },
1145
+ {
1146
+ "epoch": 70.92682926829268,
1147
+ "eval_accuracy": 0.8541666666666666,
1148
+ "eval_loss": 0.5075345635414124,
1149
+ "eval_runtime": 3.1136,
1150
+ "eval_samples_per_second": 46.248,
1151
+ "eval_steps_per_second": 1.606,
1152
+ "step": 727
1153
+ },
1154
+ {
1155
+ "epoch": 71.21951219512195,
1156
+ "grad_norm": 7.4120917320251465,
1157
+ "learning_rate": 2.175925925925926e-05,
1158
+ "loss": 0.042,
1159
+ "step": 730
1160
+ },
1161
+ {
1162
+ "epoch": 72.0,
1163
+ "eval_accuracy": 0.8263888888888888,
1164
+ "eval_loss": 0.5463792085647583,
1165
+ "eval_runtime": 3.1979,
1166
+ "eval_samples_per_second": 45.029,
1167
+ "eval_steps_per_second": 1.564,
1168
+ "step": 738
1169
+ },
1170
+ {
1171
+ "epoch": 72.1951219512195,
1172
+ "grad_norm": 4.7453532218933105,
1173
+ "learning_rate": 2.1296296296296296e-05,
1174
+ "loss": 0.056,
1175
+ "step": 740
1176
+ },
1177
+ {
1178
+ "epoch": 72.97560975609755,
1179
+ "eval_accuracy": 0.8333333333333334,
1180
+ "eval_loss": 0.5185548067092896,
1181
+ "eval_runtime": 3.2157,
1182
+ "eval_samples_per_second": 44.78,
1183
+ "eval_steps_per_second": 1.555,
1184
+ "step": 748
1185
+ },
1186
+ {
1187
+ "epoch": 73.17073170731707,
1188
+ "grad_norm": 40.11507034301758,
1189
+ "learning_rate": 2.0833333333333336e-05,
1190
+ "loss": 0.0858,
1191
+ "step": 750
1192
+ },
1193
+ {
1194
+ "epoch": 73.95121951219512,
1195
+ "eval_accuracy": 0.8263888888888888,
1196
+ "eval_loss": 0.5403424501419067,
1197
+ "eval_runtime": 3.1711,
1198
+ "eval_samples_per_second": 45.41,
1199
+ "eval_steps_per_second": 1.577,
1200
+ "step": 758
1201
+ },
1202
+ {
1203
+ "epoch": 74.14634146341463,
1204
+ "grad_norm": 21.469833374023438,
1205
+ "learning_rate": 2.037037037037037e-05,
1206
+ "loss": 0.0616,
1207
+ "step": 760
1208
+ },
1209
+ {
1210
+ "epoch": 74.92682926829268,
1211
+ "eval_accuracy": 0.8472222222222222,
1212
+ "eval_loss": 0.5104292631149292,
1213
+ "eval_runtime": 3.1108,
1214
+ "eval_samples_per_second": 46.29,
1215
+ "eval_steps_per_second": 1.607,
1216
+ "step": 768
1217
+ },
1218
+ {
1219
+ "epoch": 75.1219512195122,
1220
+ "grad_norm": 17.34604263305664,
1221
+ "learning_rate": 1.990740740740741e-05,
1222
+ "loss": 0.0777,
1223
+ "step": 770
1224
+ },
1225
+ {
1226
+ "epoch": 76.0,
1227
+ "eval_accuracy": 0.8402777777777778,
1228
+ "eval_loss": 0.5515955686569214,
1229
+ "eval_runtime": 3.0985,
1230
+ "eval_samples_per_second": 46.474,
1231
+ "eval_steps_per_second": 1.614,
1232
+ "step": 779
1233
+ },
1234
+ {
1235
+ "epoch": 76.09756097560975,
1236
+ "grad_norm": 25.844533920288086,
1237
+ "learning_rate": 1.9444444444444445e-05,
1238
+ "loss": 0.0668,
1239
+ "step": 780
1240
+ },
1241
+ {
1242
+ "epoch": 76.97560975609755,
1243
+ "eval_accuracy": 0.8611111111111112,
1244
+ "eval_loss": 0.49184906482696533,
1245
+ "eval_runtime": 3.1523,
1246
+ "eval_samples_per_second": 45.68,
1247
+ "eval_steps_per_second": 1.586,
1248
+ "step": 789
1249
+ },
1250
+ {
1251
+ "epoch": 77.07317073170732,
1252
+ "grad_norm": 3.816899299621582,
1253
+ "learning_rate": 1.8981481481481482e-05,
1254
+ "loss": 0.0585,
1255
+ "step": 790
1256
+ },
1257
+ {
1258
+ "epoch": 77.95121951219512,
1259
+ "eval_accuracy": 0.8402777777777778,
1260
+ "eval_loss": 0.5692147612571716,
1261
+ "eval_runtime": 3.1238,
1262
+ "eval_samples_per_second": 46.098,
1263
+ "eval_steps_per_second": 1.601,
1264
+ "step": 799
1265
+ },
1266
+ {
1267
+ "epoch": 78.04878048780488,
1268
+ "grad_norm": 1.5959941148757935,
1269
+ "learning_rate": 1.8518518518518518e-05,
1270
+ "loss": 0.0562,
1271
+ "step": 800
1272
+ },
1273
+ {
1274
+ "epoch": 78.92682926829268,
1275
+ "eval_accuracy": 0.8402777777777778,
1276
+ "eval_loss": 0.5733731389045715,
1277
+ "eval_runtime": 3.161,
1278
+ "eval_samples_per_second": 45.556,
1279
+ "eval_steps_per_second": 1.582,
1280
+ "step": 809
1281
+ },
1282
+ {
1283
+ "epoch": 79.02439024390245,
1284
+ "grad_norm": 8.219141960144043,
1285
+ "learning_rate": 1.8055555555555555e-05,
1286
+ "loss": 0.067,
1287
+ "step": 810
1288
+ },
1289
+ {
1290
+ "epoch": 80.0,
1291
+ "grad_norm": 15.953410148620605,
1292
+ "learning_rate": 1.7592592592592595e-05,
1293
+ "loss": 0.0653,
1294
+ "step": 820
1295
+ },
1296
+ {
1297
+ "epoch": 80.0,
1298
+ "eval_accuracy": 0.8263888888888888,
1299
+ "eval_loss": 0.5403192639350891,
1300
+ "eval_runtime": 3.2805,
1301
+ "eval_samples_per_second": 43.896,
1302
+ "eval_steps_per_second": 1.524,
1303
+ "step": 820
1304
+ },
1305
+ {
1306
+ "epoch": 80.97560975609755,
1307
+ "grad_norm": 6.4320783615112305,
1308
+ "learning_rate": 1.712962962962963e-05,
1309
+ "loss": 0.0434,
1310
+ "step": 830
1311
+ },
1312
+ {
1313
+ "epoch": 80.97560975609755,
1314
+ "eval_accuracy": 0.8333333333333334,
1315
+ "eval_loss": 0.5107588171958923,
1316
+ "eval_runtime": 3.1353,
1317
+ "eval_samples_per_second": 45.929,
1318
+ "eval_steps_per_second": 1.595,
1319
+ "step": 830
1320
+ },
1321
+ {
1322
+ "epoch": 81.95121951219512,
1323
+ "grad_norm": 4.175785064697266,
1324
+ "learning_rate": 1.6666666666666667e-05,
1325
+ "loss": 0.0483,
1326
+ "step": 840
1327
+ },
1328
+ {
1329
+ "epoch": 81.95121951219512,
1330
+ "eval_accuracy": 0.8125,
1331
+ "eval_loss": 0.5699278712272644,
1332
+ "eval_runtime": 3.084,
1333
+ "eval_samples_per_second": 46.693,
1334
+ "eval_steps_per_second": 1.621,
1335
+ "step": 840
1336
+ },
1337
+ {
1338
+ "epoch": 82.92682926829268,
1339
+ "grad_norm": 7.429763317108154,
1340
+ "learning_rate": 1.6203703703703704e-05,
1341
+ "loss": 0.0329,
1342
+ "step": 850
1343
+ },
1344
+ {
1345
+ "epoch": 82.92682926829268,
1346
+ "eval_accuracy": 0.8055555555555556,
1347
+ "eval_loss": 0.6027733087539673,
1348
+ "eval_runtime": 3.1234,
1349
+ "eval_samples_per_second": 46.104,
1350
+ "eval_steps_per_second": 1.601,
1351
+ "step": 850
1352
+ },
1353
+ {
1354
+ "epoch": 83.90243902439025,
1355
+ "grad_norm": 5.646729946136475,
1356
+ "learning_rate": 1.574074074074074e-05,
1357
+ "loss": 0.0431,
1358
+ "step": 860
1359
+ },
1360
+ {
1361
+ "epoch": 84.0,
1362
+ "eval_accuracy": 0.8333333333333334,
1363
+ "eval_loss": 0.5230019092559814,
1364
+ "eval_runtime": 3.1681,
1365
+ "eval_samples_per_second": 45.453,
1366
+ "eval_steps_per_second": 1.578,
1367
+ "step": 861
1368
+ },
1369
+ {
1370
+ "epoch": 84.8780487804878,
1371
+ "grad_norm": 4.747640132904053,
1372
+ "learning_rate": 1.527777777777778e-05,
1373
+ "loss": 0.042,
1374
+ "step": 870
1375
+ },
1376
+ {
1377
+ "epoch": 84.97560975609755,
1378
+ "eval_accuracy": 0.8194444444444444,
1379
+ "eval_loss": 0.5875388979911804,
1380
+ "eval_runtime": 3.1311,
1381
+ "eval_samples_per_second": 45.99,
1382
+ "eval_steps_per_second": 1.597,
1383
+ "step": 871
1384
+ },
1385
+ {
1386
+ "epoch": 85.85365853658537,
1387
+ "grad_norm": 7.094844341278076,
1388
+ "learning_rate": 1.4814814814814815e-05,
1389
+ "loss": 0.0449,
1390
+ "step": 880
1391
+ },
1392
+ {
1393
+ "epoch": 85.95121951219512,
1394
+ "eval_accuracy": 0.8611111111111112,
1395
+ "eval_loss": 0.5179998278617859,
1396
+ "eval_runtime": 3.1291,
1397
+ "eval_samples_per_second": 46.02,
1398
+ "eval_steps_per_second": 1.598,
1399
+ "step": 881
1400
+ },
1401
+ {
1402
+ "epoch": 86.82926829268293,
1403
+ "grad_norm": 9.338136672973633,
1404
+ "learning_rate": 1.4351851851851853e-05,
1405
+ "loss": 0.0512,
1406
+ "step": 890
1407
+ },
1408
+ {
1409
+ "epoch": 86.92682926829268,
1410
+ "eval_accuracy": 0.8194444444444444,
1411
+ "eval_loss": 0.5425156354904175,
1412
+ "eval_runtime": 3.1587,
1413
+ "eval_samples_per_second": 45.588,
1414
+ "eval_steps_per_second": 1.583,
1415
+ "step": 891
1416
+ },
1417
+ {
1418
+ "epoch": 87.8048780487805,
1419
+ "grad_norm": 20.005054473876953,
1420
+ "learning_rate": 1.388888888888889e-05,
1421
+ "loss": 0.0545,
1422
+ "step": 900
1423
+ },
1424
+ {
1425
+ "epoch": 88.0,
1426
+ "eval_accuracy": 0.8263888888888888,
1427
+ "eval_loss": 0.5689591765403748,
1428
+ "eval_runtime": 3.1184,
1429
+ "eval_samples_per_second": 46.177,
1430
+ "eval_steps_per_second": 1.603,
1431
+ "step": 902
1432
+ },
1433
+ {
1434
+ "epoch": 88.78048780487805,
1435
+ "grad_norm": 6.419548034667969,
1436
+ "learning_rate": 1.3425925925925928e-05,
1437
+ "loss": 0.0496,
1438
+ "step": 910
1439
+ },
1440
+ {
1441
+ "epoch": 88.97560975609755,
1442
+ "eval_accuracy": 0.8611111111111112,
1443
+ "eval_loss": 0.5619076490402222,
1444
+ "eval_runtime": 3.1196,
1445
+ "eval_samples_per_second": 46.16,
1446
+ "eval_steps_per_second": 1.603,
1447
+ "step": 912
1448
+ },
1449
+ {
1450
+ "epoch": 89.7560975609756,
1451
+ "grad_norm": 4.659784317016602,
1452
+ "learning_rate": 1.2962962962962962e-05,
1453
+ "loss": 0.0449,
1454
+ "step": 920
1455
+ },
1456
+ {
1457
+ "epoch": 89.95121951219512,
1458
+ "eval_accuracy": 0.8333333333333334,
1459
+ "eval_loss": 0.5625645518302917,
1460
+ "eval_runtime": 3.1196,
1461
+ "eval_samples_per_second": 46.16,
1462
+ "eval_steps_per_second": 1.603,
1463
+ "step": 922
1464
+ },
1465
+ {
1466
+ "epoch": 90.73170731707317,
1467
+ "grad_norm": 10.376348495483398,
1468
+ "learning_rate": 1.25e-05,
1469
+ "loss": 0.0405,
1470
+ "step": 930
1471
+ },
1472
+ {
1473
+ "epoch": 90.92682926829268,
1474
+ "eval_accuracy": 0.8402777777777778,
1475
+ "eval_loss": 0.526747465133667,
1476
+ "eval_runtime": 3.1368,
1477
+ "eval_samples_per_second": 45.906,
1478
+ "eval_steps_per_second": 1.594,
1479
+ "step": 932
1480
+ },
1481
+ {
1482
+ "epoch": 91.70731707317073,
1483
+ "grad_norm": 1.9461939334869385,
1484
+ "learning_rate": 1.2037037037037037e-05,
1485
+ "loss": 0.0344,
1486
+ "step": 940
1487
+ },
1488
+ {
1489
+ "epoch": 92.0,
1490
+ "eval_accuracy": 0.8402777777777778,
1491
+ "eval_loss": 0.5616637468338013,
1492
+ "eval_runtime": 3.1356,
1493
+ "eval_samples_per_second": 45.925,
1494
+ "eval_steps_per_second": 1.595,
1495
+ "step": 943
1496
+ },
1497
+ {
1498
+ "epoch": 92.6829268292683,
1499
+ "grad_norm": 6.943029880523682,
1500
+ "learning_rate": 1.1574074074074075e-05,
1501
+ "loss": 0.0421,
1502
+ "step": 950
1503
+ },
1504
+ {
1505
+ "epoch": 92.97560975609755,
1506
+ "eval_accuracy": 0.8611111111111112,
1507
+ "eval_loss": 0.5399531126022339,
1508
+ "eval_runtime": 3.1511,
1509
+ "eval_samples_per_second": 45.698,
1510
+ "eval_steps_per_second": 1.587,
1511
+ "step": 953
1512
+ },
1513
+ {
1514
+ "epoch": 93.65853658536585,
1515
+ "grad_norm": 3.614881753921509,
1516
+ "learning_rate": 1.1111111111111112e-05,
1517
+ "loss": 0.0341,
1518
+ "step": 960
1519
+ },
1520
+ {
1521
+ "epoch": 93.95121951219512,
1522
+ "eval_accuracy": 0.8333333333333334,
1523
+ "eval_loss": 0.5728729963302612,
1524
+ "eval_runtime": 3.1618,
1525
+ "eval_samples_per_second": 45.544,
1526
+ "eval_steps_per_second": 1.581,
1527
+ "step": 963
1528
+ },
1529
+ {
1530
+ "epoch": 94.63414634146342,
1531
+ "grad_norm": 24.69358253479004,
1532
+ "learning_rate": 1.0648148148148148e-05,
1533
+ "loss": 0.0492,
1534
+ "step": 970
1535
+ },
1536
+ {
1537
+ "epoch": 94.92682926829268,
1538
+ "eval_accuracy": 0.8055555555555556,
1539
+ "eval_loss": 0.5855351686477661,
1540
+ "eval_runtime": 3.1608,
1541
+ "eval_samples_per_second": 45.559,
1542
+ "eval_steps_per_second": 1.582,
1543
+ "step": 973
1544
+ },
1545
+ {
1546
+ "epoch": 95.60975609756098,
1547
+ "grad_norm": 7.931139945983887,
1548
+ "learning_rate": 1.0185185185185185e-05,
1549
+ "loss": 0.0374,
1550
+ "step": 980
1551
+ },
1552
+ {
1553
+ "epoch": 96.0,
1554
+ "eval_accuracy": 0.8125,
1555
+ "eval_loss": 0.6113177537918091,
1556
+ "eval_runtime": 2.99,
1557
+ "eval_samples_per_second": 48.161,
1558
+ "eval_steps_per_second": 1.672,
1559
+ "step": 984
1560
+ },
1561
+ {
1562
+ "epoch": 96.58536585365853,
1563
+ "grad_norm": 7.854911804199219,
1564
+ "learning_rate": 9.722222222222223e-06,
1565
+ "loss": 0.0375,
1566
+ "step": 990
1567
+ },
1568
+ {
1569
+ "epoch": 96.97560975609755,
1570
+ "eval_accuracy": 0.8402777777777778,
1571
+ "eval_loss": 0.5511393547058105,
1572
+ "eval_runtime": 3.0799,
1573
+ "eval_samples_per_second": 46.755,
1574
+ "eval_steps_per_second": 1.623,
1575
+ "step": 994
1576
+ },
1577
+ {
1578
+ "epoch": 97.5609756097561,
1579
+ "grad_norm": 19.6168270111084,
1580
+ "learning_rate": 9.259259259259259e-06,
1581
+ "loss": 0.0373,
1582
+ "step": 1000
1583
+ },
1584
+ {
1585
+ "epoch": 97.95121951219512,
1586
+ "eval_accuracy": 0.8541666666666666,
1587
+ "eval_loss": 0.49421417713165283,
1588
+ "eval_runtime": 3.1042,
1589
+ "eval_samples_per_second": 46.388,
1590
+ "eval_steps_per_second": 1.611,
1591
+ "step": 1004
1592
+ },
1593
+ {
1594
+ "epoch": 98.53658536585365,
1595
+ "grad_norm": 11.319071769714355,
1596
+ "learning_rate": 8.796296296296297e-06,
1597
+ "loss": 0.0447,
1598
+ "step": 1010
1599
+ },
1600
+ {
1601
+ "epoch": 98.92682926829268,
1602
+ "eval_accuracy": 0.8541666666666666,
1603
+ "eval_loss": 0.5030938982963562,
1604
+ "eval_runtime": 3.0891,
1605
+ "eval_samples_per_second": 46.615,
1606
+ "eval_steps_per_second": 1.619,
1607
+ "step": 1014
1608
+ },
1609
+ {
1610
+ "epoch": 99.51219512195122,
1611
+ "grad_norm": 12.67363452911377,
1612
+ "learning_rate": 8.333333333333334e-06,
1613
+ "loss": 0.0519,
1614
+ "step": 1020
1615
+ },
1616
+ {
1617
+ "epoch": 100.0,
1618
+ "eval_accuracy": 0.8541666666666666,
1619
+ "eval_loss": 0.5348986983299255,
1620
+ "eval_runtime": 3.0853,
1621
+ "eval_samples_per_second": 46.673,
1622
+ "eval_steps_per_second": 1.621,
1623
+ "step": 1025
1624
+ },
1625
+ {
1626
+ "epoch": 100.48780487804878,
1627
+ "grad_norm": 17.858867645263672,
1628
+ "learning_rate": 7.87037037037037e-06,
1629
+ "loss": 0.0387,
1630
+ "step": 1030
1631
+ },
1632
+ {
1633
+ "epoch": 100.97560975609755,
1634
+ "eval_accuracy": 0.8541666666666666,
1635
+ "eval_loss": 0.5510598421096802,
1636
+ "eval_runtime": 3.1136,
1637
+ "eval_samples_per_second": 46.249,
1638
+ "eval_steps_per_second": 1.606,
1639
+ "step": 1035
1640
+ },
1641
+ {
1642
+ "epoch": 101.46341463414635,
1643
+ "grad_norm": 2.6209030151367188,
1644
+ "learning_rate": 7.4074074074074075e-06,
1645
+ "loss": 0.0256,
1646
+ "step": 1040
1647
+ },
1648
+ {
1649
+ "epoch": 101.95121951219512,
1650
+ "eval_accuracy": 0.8402777777777778,
1651
+ "eval_loss": 0.5319210290908813,
1652
+ "eval_runtime": 3.0318,
1653
+ "eval_samples_per_second": 47.496,
1654
+ "eval_steps_per_second": 1.649,
1655
+ "step": 1045
1656
+ },
1657
+ {
1658
+ "epoch": 102.4390243902439,
1659
+ "grad_norm": 8.14228630065918,
1660
+ "learning_rate": 6.944444444444445e-06,
1661
+ "loss": 0.043,
1662
+ "step": 1050
1663
+ },
1664
+ {
1665
+ "epoch": 102.92682926829268,
1666
+ "eval_accuracy": 0.8263888888888888,
1667
+ "eval_loss": 0.5605261325836182,
1668
+ "eval_runtime": 3.0596,
1669
+ "eval_samples_per_second": 47.064,
1670
+ "eval_steps_per_second": 1.634,
1671
+ "step": 1055
1672
+ },
1673
+ {
1674
+ "epoch": 103.41463414634147,
1675
+ "grad_norm": 8.247823715209961,
1676
+ "learning_rate": 6.481481481481481e-06,
1677
+ "loss": 0.029,
1678
+ "step": 1060
1679
+ },
1680
+ {
1681
+ "epoch": 104.0,
1682
+ "eval_accuracy": 0.8402777777777778,
1683
+ "eval_loss": 0.5775593519210815,
1684
+ "eval_runtime": 3.3388,
1685
+ "eval_samples_per_second": 43.13,
1686
+ "eval_steps_per_second": 1.498,
1687
+ "step": 1066
1688
+ },
1689
+ {
1690
+ "epoch": 104.39024390243902,
1691
+ "grad_norm": 7.826836585998535,
1692
+ "learning_rate": 6.0185185185185185e-06,
1693
+ "loss": 0.0379,
1694
+ "step": 1070
1695
+ },
1696
+ {
1697
+ "epoch": 104.97560975609755,
1698
+ "eval_accuracy": 0.8472222222222222,
1699
+ "eval_loss": 0.5697184801101685,
1700
+ "eval_runtime": 3.1172,
1701
+ "eval_samples_per_second": 46.196,
1702
+ "eval_steps_per_second": 1.604,
1703
+ "step": 1076
1704
+ },
1705
+ {
1706
+ "epoch": 105.36585365853658,
1707
+ "grad_norm": 7.729676246643066,
1708
+ "learning_rate": 5.555555555555556e-06,
1709
+ "loss": 0.0445,
1710
+ "step": 1080
1711
+ },
1712
+ {
1713
+ "epoch": 105.95121951219512,
1714
+ "eval_accuracy": 0.8680555555555556,
1715
+ "eval_loss": 0.5132907629013062,
1716
+ "eval_runtime": 3.1003,
1717
+ "eval_samples_per_second": 46.447,
1718
+ "eval_steps_per_second": 1.613,
1719
+ "step": 1086
1720
+ },
1721
+ {
1722
+ "epoch": 106.34146341463415,
1723
+ "grad_norm": 18.125850677490234,
1724
+ "learning_rate": 5.092592592592592e-06,
1725
+ "loss": 0.0267,
1726
+ "step": 1090
1727
+ },
1728
+ {
1729
+ "epoch": 106.92682926829268,
1730
+ "eval_accuracy": 0.8680555555555556,
1731
+ "eval_loss": 0.5075670480728149,
1732
+ "eval_runtime": 2.9664,
1733
+ "eval_samples_per_second": 48.543,
1734
+ "eval_steps_per_second": 1.686,
1735
+ "step": 1096
1736
+ },
1737
+ {
1738
+ "epoch": 107.3170731707317,
1739
+ "grad_norm": 11.351465225219727,
1740
+ "learning_rate": 4.6296296296296296e-06,
1741
+ "loss": 0.044,
1742
+ "step": 1100
1743
+ },
1744
+ {
1745
+ "epoch": 108.0,
1746
+ "eval_accuracy": 0.8402777777777778,
1747
+ "eval_loss": 0.5260215401649475,
1748
+ "eval_runtime": 3.1083,
1749
+ "eval_samples_per_second": 46.328,
1750
+ "eval_steps_per_second": 1.609,
1751
+ "step": 1107
1752
+ },
1753
+ {
+ "epoch": 108.29268292682927,
+ "grad_norm": 2.0481507778167725,
+ "learning_rate": 4.166666666666667e-06,
+ "loss": 0.0263,
+ "step": 1110
+ },
+ {
+ "epoch": 108.97560975609755,
+ "eval_accuracy": 0.8541666666666666,
+ "eval_loss": 0.5101317167282104,
+ "eval_runtime": 3.0889,
+ "eval_samples_per_second": 46.619,
+ "eval_steps_per_second": 1.619,
+ "step": 1117
+ },
+ {
+ "epoch": 109.26829268292683,
+ "grad_norm": 4.980681419372559,
+ "learning_rate": 3.7037037037037037e-06,
+ "loss": 0.0247,
+ "step": 1120
+ },
+ {
+ "epoch": 109.95121951219512,
+ "eval_accuracy": 0.8541666666666666,
+ "eval_loss": 0.49724239110946655,
+ "eval_runtime": 3.1321,
+ "eval_samples_per_second": 45.975,
+ "eval_steps_per_second": 1.596,
+ "step": 1127
+ },
+ {
+ "epoch": 110.2439024390244,
+ "grad_norm": 11.607294082641602,
+ "learning_rate": 3.2407407407407406e-06,
+ "loss": 0.0441,
+ "step": 1130
+ },
+ {
+ "epoch": 110.92682926829268,
+ "eval_accuracy": 0.8472222222222222,
+ "eval_loss": 0.5093557834625244,
+ "eval_runtime": 3.2177,
+ "eval_samples_per_second": 44.753,
+ "eval_steps_per_second": 1.554,
+ "step": 1137
+ },
+ {
+ "epoch": 111.21951219512195,
+ "grad_norm": 0.6021797060966492,
+ "learning_rate": 2.777777777777778e-06,
+ "loss": 0.0263,
+ "step": 1140
+ },
+ {
+ "epoch": 112.0,
+ "eval_accuracy": 0.8333333333333334,
+ "eval_loss": 0.525884747505188,
+ "eval_runtime": 3.1645,
+ "eval_samples_per_second": 45.505,
+ "eval_steps_per_second": 1.58,
+ "step": 1148
+ },
+ {
+ "epoch": 112.1951219512195,
+ "grad_norm": 8.183242797851562,
+ "learning_rate": 2.3148148148148148e-06,
+ "loss": 0.0247,
+ "step": 1150
+ },
+ {
+ "epoch": 112.97560975609755,
+ "eval_accuracy": 0.8402777777777778,
+ "eval_loss": 0.5323313474655151,
+ "eval_runtime": 3.0594,
+ "eval_samples_per_second": 47.067,
+ "eval_steps_per_second": 1.634,
+ "step": 1158
+ },
+ {
+ "epoch": 113.17073170731707,
+ "grad_norm": 18.887975692749023,
+ "learning_rate": 1.8518518518518519e-06,
+ "loss": 0.0356,
+ "step": 1160
+ },
+ {
+ "epoch": 113.95121951219512,
+ "eval_accuracy": 0.8402777777777778,
+ "eval_loss": 0.5275124907493591,
+ "eval_runtime": 3.1264,
+ "eval_samples_per_second": 46.059,
+ "eval_steps_per_second": 1.599,
+ "step": 1168
+ },
+ {
+ "epoch": 114.14634146341463,
+ "grad_norm": 9.692997932434082,
+ "learning_rate": 1.388888888888889e-06,
+ "loss": 0.0297,
+ "step": 1170
+ },
+ {
+ "epoch": 114.92682926829268,
+ "eval_accuracy": 0.8333333333333334,
+ "eval_loss": 0.5239912867546082,
+ "eval_runtime": 3.0878,
+ "eval_samples_per_second": 46.636,
+ "eval_steps_per_second": 1.619,
+ "step": 1178
+ },
+ {
+ "epoch": 115.1219512195122,
+ "grad_norm": 7.508498191833496,
+ "learning_rate": 9.259259259259259e-07,
+ "loss": 0.044,
+ "step": 1180
+ },
+ {
+ "epoch": 116.0,
+ "eval_accuracy": 0.8472222222222222,
+ "eval_loss": 0.520145833492279,
+ "eval_runtime": 3.0984,
+ "eval_samples_per_second": 46.475,
+ "eval_steps_per_second": 1.614,
+ "step": 1189
+ },
+ {
+ "epoch": 116.09756097560975,
+ "grad_norm": 2.9772377014160156,
+ "learning_rate": 4.6296296296296297e-07,
+ "loss": 0.031,
+ "step": 1190
+ },
+ {
+ "epoch": 116.97560975609755,
+ "eval_accuracy": 0.8402777777777778,
+ "eval_loss": 0.5203036069869995,
+ "eval_runtime": 3.066,
+ "eval_samples_per_second": 46.966,
+ "eval_steps_per_second": 1.631,
+ "step": 1199
+ },
+ {
+ "epoch": 117.07317073170732,
+ "grad_norm": 3.435035467147827,
+ "learning_rate": 0.0,
+ "loss": 0.0369,
+ "step": 1200
+ },
+ {
+ "epoch": 117.07317073170732,
+ "eval_accuracy": 0.8402777777777778,
+ "eval_loss": 0.5202847123146057,
+ "eval_runtime": 3.0873,
+ "eval_samples_per_second": 46.642,
+ "eval_steps_per_second": 1.62,
+ "step": 1200
+ },
+ {
+ "epoch": 117.07317073170732,
+ "step": 1200,
+ "total_flos": 3.819974210196996e+18,
+ "train_loss": 0.3624983422954877,
+ "train_runtime": 4158.3882,
+ "train_samples_per_second": 37.399,
+ "train_steps_per_second": 0.289
+ }
+ ],
+ "logging_steps": 10,
+ "max_steps": 1200,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 120,
+ "save_steps": 500,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": false,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 3.819974210196996e+18,
+ "train_batch_size": 32,
+ "trial_name": null,
+ "trial_params": null
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2df9544a19a4652671b3b26f754625fdaf4703202fcdfb2d53d46014e546b32e
+ size 5240