Elron committed on
Commit
e80ab19
1 Parent(s): 49cafa5

Pushing deberta-v3-large-sentiment to hub

README.md ADDED
@@ -0,0 +1,203 @@
+ ---
+ license: mit
+ tags:
+ - generated_from_trainer
+ metrics:
+ - accuracy
+ model-index:
+ - name: deberta-v3-large-sentiment-lr5e-6-gas2-ls0.0
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # deberta-v3-large-sentiment-lr5e-6-gas2-ls0.0
+
+ This model is a fine-tuned version of [microsoft/deberta-v3-large](https://huggingface.co/microsoft/deberta-v3-large) on an unknown dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 1.3253
+ - Accuracy: 0.7365
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-06
+ - train_batch_size: 16
+ - eval_batch_size: 16
+ - seed: 42
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 32
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_steps: 50
+ - num_epochs: 10.0
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Accuracy |
+ |:-------------:|:-----:|:-----:|:---------------:|:--------:|
+ | 1.0614 | 0.07 | 100 | 1.0196 | 0.4345 |
+ | 0.8601 | 0.14 | 200 | 0.7561 | 0.6460 |
+ | 0.734 | 0.21 | 300 | 0.6796 | 0.6955 |
+ | 0.6753 | 0.28 | 400 | 0.6521 | 0.7000 |
+ | 0.6408 | 0.35 | 500 | 0.6119 | 0.7440 |
+ | 0.5991 | 0.42 | 600 | 0.6034 | 0.7370 |
+ | 0.6069 | 0.49 | 700 | 0.5976 | 0.7375 |
+ | 0.6122 | 0.56 | 800 | 0.5871 | 0.7425 |
+ | 0.5908 | 0.63 | 900 | 0.5935 | 0.7445 |
+ | 0.5884 | 0.7 | 1000 | 0.5792 | 0.7520 |
+ | 0.5839 | 0.77 | 1100 | 0.5780 | 0.7555 |
+ | 0.5772 | 0.84 | 1200 | 0.5727 | 0.7570 |
+ | 0.5895 | 0.91 | 1300 | 0.5601 | 0.7550 |
+ | 0.5757 | 0.98 | 1400 | 0.5613 | 0.7525 |
+ | 0.5121 | 1.05 | 1500 | 0.5867 | 0.7600 |
+ | 0.5254 | 1.12 | 1600 | 0.5595 | 0.7630 |
+ | 0.5074 | 1.19 | 1700 | 0.5594 | 0.7585 |
+ | 0.4947 | 1.26 | 1800 | 0.5697 | 0.7575 |
+ | 0.5019 | 1.33 | 1900 | 0.5665 | 0.7580 |
+ | 0.5005 | 1.4 | 2000 | 0.5484 | 0.7655 |
+ | 0.5125 | 1.47 | 2100 | 0.5626 | 0.7605 |
+ | 0.5241 | 1.54 | 2200 | 0.5561 | 0.7560 |
+ | 0.5198 | 1.61 | 2300 | 0.5602 | 0.7600 |
+ | 0.5124 | 1.68 | 2400 | 0.5654 | 0.7490 |
+ | 0.5096 | 1.75 | 2500 | 0.5803 | 0.7515 |
+ | 0.4885 | 1.82 | 2600 | 0.5889 | 0.75 |
+ | 0.5111 | 1.89 | 2700 | 0.5508 | 0.7665 |
+ | 0.4868 | 1.96 | 2800 | 0.5621 | 0.7635 |
+ | 0.4599 | 2.04 | 2900 | 0.5995 | 0.7615 |
+ | 0.4147 | 2.11 | 3000 | 0.6202 | 0.7530 |
+ | 0.4233 | 2.18 | 3100 | 0.5875 | 0.7625 |
+ | 0.4324 | 2.25 | 3200 | 0.5794 | 0.7610 |
+ | 0.4141 | 2.32 | 3300 | 0.5902 | 0.7460 |
+ | 0.4306 | 2.39 | 3400 | 0.6053 | 0.7545 |
+ | 0.4266 | 2.46 | 3500 | 0.5979 | 0.7570 |
+ | 0.4227 | 2.53 | 3600 | 0.5920 | 0.7650 |
+ | 0.4226 | 2.6 | 3700 | 0.6166 | 0.7455 |
+ | 0.3978 | 2.67 | 3800 | 0.6126 | 0.7560 |
+ | 0.3954 | 2.74 | 3900 | 0.6152 | 0.7550 |
+ | 0.4209 | 2.81 | 4000 | 0.5980 | 0.75 |
+ | 0.3982 | 2.88 | 4100 | 0.6096 | 0.7490 |
+ | 0.4016 | 2.95 | 4200 | 0.6541 | 0.7425 |
+ | 0.3966 | 3.02 | 4300 | 0.6377 | 0.7545 |
+ | 0.3074 | 3.09 | 4400 | 0.6860 | 0.75 |
+ | 0.3551 | 3.16 | 4500 | 0.6160 | 0.7550 |
+ | 0.3323 | 3.23 | 4600 | 0.6714 | 0.7520 |
+ | 0.3171 | 3.3 | 4700 | 0.6538 | 0.7535 |
+ | 0.3403 | 3.37 | 4800 | 0.6774 | 0.7465 |
+ | 0.3396 | 3.44 | 4900 | 0.6726 | 0.7465 |
+ | 0.3259 | 3.51 | 5000 | 0.6465 | 0.7480 |
+ | 0.3392 | 3.58 | 5100 | 0.6860 | 0.7460 |
+ | 0.3251 | 3.65 | 5200 | 0.6697 | 0.7495 |
+ | 0.3253 | 3.72 | 5300 | 0.6770 | 0.7430 |
+ | 0.3455 | 3.79 | 5400 | 0.7177 | 0.7360 |
+ | 0.3323 | 3.86 | 5500 | 0.6943 | 0.7400 |
+ | 0.3335 | 3.93 | 5600 | 0.6507 | 0.7555 |
+ | 0.3368 | 4.0 | 5700 | 0.6580 | 0.7485 |
+ | 0.2479 | 4.07 | 5800 | 0.7667 | 0.7430 |
+ | 0.2613 | 4.14 | 5900 | 0.7513 | 0.7505 |
+ | 0.2557 | 4.21 | 6000 | 0.7927 | 0.7485 |
+ | 0.243 | 4.28 | 6100 | 0.7792 | 0.7450 |
+ | 0.2473 | 4.35 | 6200 | 0.8107 | 0.7355 |
+ | 0.2447 | 4.42 | 6300 | 0.7851 | 0.7370 |
+ | 0.2515 | 4.49 | 6400 | 0.7529 | 0.7465 |
+ | 0.274 | 4.56 | 6500 | 0.7390 | 0.7465 |
+ | 0.2674 | 4.63 | 6600 | 0.7658 | 0.7460 |
+ | 0.2416 | 4.7 | 6700 | 0.7915 | 0.7485 |
+ | 0.2432 | 4.77 | 6800 | 0.7989 | 0.7435 |
+ | 0.2595 | 4.84 | 6900 | 0.7850 | 0.7380 |
+ | 0.2736 | 4.91 | 7000 | 0.7577 | 0.7395 |
+ | 0.2783 | 4.98 | 7100 | 0.7650 | 0.7405 |
+ | 0.2304 | 5.05 | 7200 | 0.8542 | 0.7385 |
+ | 0.1937 | 5.12 | 7300 | 0.8390 | 0.7345 |
+ | 0.1878 | 5.19 | 7400 | 0.9150 | 0.7330 |
+ | 0.1921 | 5.26 | 7500 | 0.8792 | 0.7405 |
+ | 0.1916 | 5.33 | 7600 | 0.8892 | 0.7410 |
+ | 0.2011 | 5.4 | 7700 | 0.9012 | 0.7325 |
+ | 0.211 | 5.47 | 7800 | 0.8608 | 0.7420 |
+ | 0.2194 | 5.54 | 7900 | 0.8852 | 0.7320 |
+ | 0.205 | 5.61 | 8000 | 0.8803 | 0.7385 |
+ | 0.1981 | 5.68 | 8100 | 0.8681 | 0.7330 |
+ | 0.1908 | 5.75 | 8200 | 0.9020 | 0.7435 |
+ | 0.1942 | 5.82 | 8300 | 0.8780 | 0.7410 |
+ | 0.1958 | 5.89 | 8400 | 0.8937 | 0.7345 |
+ | 0.1883 | 5.96 | 8500 | 0.9121 | 0.7360 |
+ | 0.1819 | 6.04 | 8600 | 0.9409 | 0.7430 |
+ | 0.145 | 6.11 | 8700 | 1.1390 | 0.7265 |
+ | 0.1696 | 6.18 | 8800 | 0.9189 | 0.7430 |
+ | 0.1488 | 6.25 | 8900 | 0.9718 | 0.7400 |
+ | 0.1637 | 6.32 | 9000 | 0.9702 | 0.7450 |
+ | 0.1547 | 6.39 | 9100 | 1.0033 | 0.7410 |
+ | 0.1605 | 6.46 | 9200 | 0.9973 | 0.7355 |
+ | 0.1552 | 6.53 | 9300 | 1.0491 | 0.7290 |
+ | 0.1731 | 6.6 | 9400 | 1.0271 | 0.7335 |
+ | 0.1738 | 6.67 | 9500 | 0.9575 | 0.7430 |
+ | 0.1669 | 6.74 | 9600 | 0.9614 | 0.7350 |
+ | 0.1347 | 6.81 | 9700 | 1.0263 | 0.7365 |
+ | 0.1593 | 6.88 | 9800 | 1.0173 | 0.7360 |
+ | 0.1549 | 6.95 | 9900 | 1.0398 | 0.7350 |
+ | 0.1675 | 7.02 | 10000 | 0.9975 | 0.7380 |
+ | 0.1182 | 7.09 | 10100 | 1.1059 | 0.7350 |
+ | 0.1351 | 7.16 | 10200 | 1.0933 | 0.7400 |
+ | 0.1496 | 7.23 | 10300 | 1.0731 | 0.7355 |
+ | 0.1197 | 7.3 | 10400 | 1.1089 | 0.7360 |
+ | 0.1111 | 7.37 | 10500 | 1.1381 | 0.7405 |
+ | 0.1494 | 7.44 | 10600 | 1.0252 | 0.7425 |
+ | 0.1235 | 7.51 | 10700 | 1.0906 | 0.7360 |
+ | 0.133 | 7.58 | 10800 | 1.1796 | 0.7375 |
+ | 0.1248 | 7.65 | 10900 | 1.1332 | 0.7420 |
+ | 0.1268 | 7.72 | 11000 | 1.1304 | 0.7415 |
+ | 0.1368 | 7.79 | 11100 | 1.1345 | 0.7380 |
+ | 0.1228 | 7.86 | 11200 | 1.2018 | 0.7320 |
+ | 0.1281 | 7.93 | 11300 | 1.1884 | 0.7350 |
+ | 0.1449 | 8.0 | 11400 | 1.1571 | 0.7345 |
+ | 0.1025 | 8.07 | 11500 | 1.1538 | 0.7345 |
+ | 0.1199 | 8.14 | 11600 | 1.2113 | 0.7390 |
+ | 0.1016 | 8.21 | 11700 | 1.2882 | 0.7370 |
+ | 0.114 | 8.28 | 11800 | 1.2872 | 0.7390 |
+ | 0.1019 | 8.35 | 11900 | 1.2876 | 0.7380 |
+ | 0.1142 | 8.42 | 12000 | 1.2791 | 0.7385 |
+ | 0.1135 | 8.49 | 12100 | 1.2883 | 0.7380 |
+ | 0.1139 | 8.56 | 12200 | 1.2829 | 0.7360 |
+ | 0.1107 | 8.63 | 12300 | 1.2698 | 0.7365 |
+ | 0.1183 | 8.7 | 12400 | 1.2660 | 0.7345 |
+ | 0.1064 | 8.77 | 12500 | 1.2889 | 0.7365 |
+ | 0.0895 | 8.84 | 12600 | 1.3480 | 0.7330 |
+ | 0.1244 | 8.91 | 12700 | 1.2872 | 0.7325 |
+ | 0.1209 | 8.98 | 12800 | 1.2681 | 0.7375 |
+ | 0.1144 | 9.05 | 12900 | 1.2711 | 0.7370 |
+ | 0.1034 | 9.12 | 13000 | 1.2801 | 0.7360 |
+ | 0.113 | 9.19 | 13100 | 1.2801 | 0.7350 |
+ | 0.0994 | 9.26 | 13200 | 1.2920 | 0.7360 |
+ | 0.0966 | 9.33 | 13300 | 1.2761 | 0.7335 |
+ | 0.0939 | 9.4 | 13400 | 1.2909 | 0.7365 |
+ | 0.0975 | 9.47 | 13500 | 1.2953 | 0.7360 |
+ | 0.0842 | 9.54 | 13600 | 1.3179 | 0.7335 |
+ | 0.0871 | 9.61 | 13700 | 1.3149 | 0.7385 |
+ | 0.1162 | 9.68 | 13800 | 1.3124 | 0.7350 |
+ | 0.085 | 9.75 | 13900 | 1.3207 | 0.7355 |
+ | 0.0966 | 9.82 | 14000 | 1.3248 | 0.7335 |
+ | 0.1064 | 9.89 | 14100 | 1.3261 | 0.7335 |
+ | 0.1046 | 9.96 | 14200 | 1.3255 | 0.7360 |
+
+
+ ### Framework versions
+
+ - Transformers 4.20.0.dev0
+ - Pytorch 1.9.0
+ - Datasets 2.2.2
+ - Tokenizers 0.11.6
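The hyperparameters listed in the model card fit together arithmetically. The sketch below recomputes the effective batch size and the number of optimizer steps. It assumes a single device and that one optimizer update is made per `gradient_accumulation_steps` batches with any remainder dropped; the `train_samples` value comes from `all_results.json` in this commit, and the function and variable names are illustrative only.

```python
import math

def optimizer_steps(train_samples, per_device_batch, grad_accum, epochs, num_devices=1):
    """Return (effective_batch_size, update_steps_per_epoch, total_update_steps)."""
    effective_batch = per_device_batch * grad_accum * num_devices
    # Batches the dataloader yields per epoch (the last batch may be smaller).
    batches_per_epoch = math.ceil(train_samples / (per_device_batch * num_devices))
    # Assumption: one optimizer update per `grad_accum` batches, remainder dropped.
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return effective_batch, updates_per_epoch, updates_per_epoch * epochs

effective, per_epoch, total = optimizer_steps(45615, 16, 2, 10)
print(effective, per_epoch, total)
```

Under these assumptions the result is 32, 1425 and 14250, which agrees with `total_train_batch_size: 32` in the card and the `global_step` of 14250 recorded in `trainer_state.json`.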
added_tokens.json ADDED
@@ -0,0 +1,3 @@
+ {
+   "[MASK]": 128000
+ }
all_results.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "epoch": 10.0,
+   "eval_accuracy": 0.7365000247955322,
+   "eval_loss": 1.3253138065338135,
+   "eval_runtime": 22.8646,
+   "eval_samples": 2000,
+   "eval_samples_per_second": 87.472,
+   "eval_steps_per_second": 5.467,
+   "train_loss": 0.2872312853629129,
+   "train_runtime": 13159.309,
+   "train_samples": 45615,
+   "train_samples_per_second": 34.664,
+   "train_steps_per_second": 1.083
+ }
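The throughput figures in `all_results.json` can be cross-checked against the other values in this commit. The sketch below assumes that training throughput is computed as total samples over all epochs (and total optimizer steps) divided by wall-clock runtime; the step count of 14250 is the `global_step` from `trainer_state.json`, and the helper name is hypothetical.

```python
def throughput(num_samples, num_steps, runtime_seconds):
    """Samples and optimizer steps processed per second, rounded to 3 decimals."""
    return (round(num_samples / runtime_seconds, 3),
            round(num_steps / runtime_seconds, 3))

train_samples, epochs = 45615, 10
samples_per_s, steps_per_s = throughput(train_samples * epochs, 14250, 13159.309)
print(samples_per_s, steps_per_s)
```

Under that assumption the values come out as 34.664 samples per second and 1.083 steps per second, the same as `train_samples_per_second` and `train_steps_per_second` above.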
config.json ADDED
@@ -0,0 +1,45 @@
+ {
+   "_name_or_path": "microsoft/deberta-v3-large",
+   "architectures": [
+     "DebertaV2ForSequenceClassification"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 1024,
+   "id2label": {
+     "0": 0,
+     "1": 1,
+     "2": 2
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 4096,
+   "label2id": {
+     "0": 0,
+     "1": 1,
+     "2": 2
+   },
+   "layer_norm_eps": 1e-07,
+   "max_position_embeddings": 512,
+   "max_relative_positions": -1,
+   "model_type": "deberta-v2",
+   "norm_rel_ebd": "layer_norm",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 24,
+   "pad_token_id": 0,
+   "pooler_dropout": 0,
+   "pooler_hidden_act": "gelu",
+   "pooler_hidden_size": 1024,
+   "pos_att_type": [
+     "p2c",
+     "c2p"
+   ],
+   "position_biased_input": false,
+   "position_buckets": 256,
+   "relative_attention": true,
+   "share_att_key": true,
+   "torch_dtype": "float32",
+   "transformers_version": "4.20.0.dev0",
+   "type_vocab_size": 0,
+   "vocab_size": 128100
+ }
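Because `id2label` in this config maps class indices to bare integers, a consumer has to supply human-readable names separately. The sketch below shows one way to turn the three output logits into a label. The names `negative`, `neutral` and `positive` are an assumption based on the usual TweetEval sentiment label order and are not stated anywhere in this commit; the logits are made up for illustration.

```python
import math

# id2label exactly as stored in config.json: class indices map to bare integers.
ID2LABEL = {"0": 0, "1": 1, "2": 2}
# Assumption (not in this commit): the usual TweetEval sentiment label order.
ASSUMED_NAMES = {0: "negative", 1: "neutral", 2: "positive"}

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode(logits):
    """Map raw classifier logits to (label id, assumed name, probability)."""
    probs = softmax(logits)
    index = max(range(len(probs)), key=probs.__getitem__)
    label_id = ID2LABEL[str(index)]
    return label_id, ASSUMED_NAMES[label_id], probs[index]

print(decode([-1.2, 0.3, 2.1]))  # made-up logits for illustration
```

If the training CSVs use a different label order, only `ASSUMED_NAMES` needs to change.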
eval_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "eval_accuracy": 0.7664999961853027,
+   "eval_loss": 0.5507832169532776,
+   "eval_runtime": 19.7263,
+   "eval_samples": 2000,
+   "eval_samples_per_second": 101.388,
+   "eval_steps_per_second": 6.337
+ }
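The accuracy and loss in `eval_results.json` (0.7665 and 0.5508) coincide with the step-2700 evaluation entry in `trainer_state.json`, not with the end-of-training figures in `all_results.json` (0.7365 and 1.3253). That is consistent with the `best_checkpoint` directory used in `run_test.sh`, although the selection logic lives in `train_clf.py`, which is not part of this commit. The sketch below assumes the checkpoint is simply the logged evaluation with the highest `eval_accuracy`; the entries are an abridged copy of the log and the helper name is hypothetical.

```python
# A few entries copied (abridged) from the log_history in trainer_state.json.
log_history = [
    {"step": 100, "loss": 1.0614},  # training-only entry, has no eval metrics
    {"step": 2000, "eval_accuracy": 0.765500009059906, "eval_loss": 0.5484071969985962},
    {"step": 2600, "eval_accuracy": 0.75, "eval_loss": 0.5889333486557007},
    {"step": 2700, "eval_accuracy": 0.7664999961853027, "eval_loss": 0.5507832169532776},
    {"step": 2800, "eval_accuracy": 0.7634999752044678, "eval_loss": 0.5621495842933655},
]

def best_eval_entry(history, metric="eval_accuracy"):
    """Return the logged entry with the highest value of `metric`."""
    evaluated = [entry for entry in history if metric in entry]
    return max(evaluated, key=lambda entry: entry[metric])

best = best_eval_entry(log_history)
print(best["step"], round(best["eval_accuracy"], 4))
```

Selecting by accuracy rather than loss matters here: step 2000 has the lower validation loss of the two, while step 2700 has the higher accuracy.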
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f18728f5092fa44baf80f9c8962a3fccc3cf3a7cbf397fc4ca6de735d50a7f73
+ size 1740397483
run_test.sh ADDED
@@ -0,0 +1 @@
+ jbsub -queue x86_1h -cores 4+1 -mem 30g -require a100 -o outputs/train/tweet_eval2/sentiment/deberta-v3-large-sentiment-lr5e-6-gas2-ls0.0/test.log /dccstor/tslm/envs/anaconda3/envs/tslm-gen/bin/python train_clf.py --model_name_or_path outputs/train/tweet_eval2/sentiment/deberta-v3-large-sentiment-lr5e-6-gas2-ls0.0/best_checkpoint --train_file data/tweet_eval/sentiment/train.csv --validation_file data/tweet_eval/sentiment/validation.csv --test_file data/tweet_eval/sentiment/test.csv --do_eval --do_predict --report_to none --per_device_eval_batch_size 16 --max_seq_length 256 --output_dir outputs/train/tweet_eval2/sentiment/deberta-v3-large-sentiment-lr5e-6-gas2-ls0.0/best_checkpoint
run_train.sh ADDED
@@ -0,0 +1 @@
+ jbsub -queue x86_6h -cores 4+1 -mem 30g -require a100 -o outputs/train/tweet_eval2/sentiment/deberta-v3-large-sentiment-lr5e-6-gas2-ls0.0/train.log /dccstor/tslm/envs/anaconda3/envs/tslm-gen/bin/python train_clf.py --model_name_or_path microsoft/deberta-v3-large --train_file data/tweet_eval/sentiment/train.csv --validation_file data/tweet_eval/sentiment/validation.csv --do_train --do_eval --per_device_train_batch_size 16 --per_device_eval_batch_size 16 --max_seq_length 256 --learning_rate 5e-6 --output_dir outputs/train/tweet_eval2/sentiment/deberta-v3-large-sentiment-lr5e-6-gas2-ls0.0 --evaluation_strategy steps --save_strategy no --warmup_steps 50 --num_train_epochs 10 --overwrite_output_dir --logging_steps 100 --gradient_accumulation_steps 2 --label_smoothing_factor 0.0 --report_to clearml --metric_for_best_model accuracy --logging_dir outputs/train/tweet_eval2/sentiment/deberta-v3-large-sentiment-lr5e-6-gas2-ls0.0/tb \; rm -rf outputs/train/tweet_eval2/sentiment/deberta-v3-large-sentiment-lr5e-6-gas2-ls0.0/tb \; rm -rf outputs/train/tweet_eval2/sentiment/deberta-v3-large-sentiment-lr5e-6-gas2-ls0.0/checkpoint-* \; . outputs/train/tweet_eval2/sentiment/deberta-v3-large-sentiment-lr5e-6-gas2-ls0.0/run_test.sh
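The flags `--learning_rate 5e-6` and `--warmup_steps 50`, together with the card's `lr_scheduler_type: linear`, determine the learning rate logged at each step. The sketch below assumes the linear scheduler is a linear warmup followed by a linear decay to zero over the 14250 total optimizer steps recorded as `global_step` in `trainer_state.json`; the function name is illustrative.

```python
def linear_schedule_lr(step, base_lr=5e-6, warmup_steps=50, total_steps=14250):
    """Learning rate after `step` optimizer updates: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

for step in (100, 200, 3600, 4000):
    print(step, linear_schedule_lr(step))
```

Under this assumption the values match the `learning_rate` entries in the training log up to floating-point rounding, for example about 4.9824e-06 at step 100 and 3.75e-06 at step 3600.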
special_tokens_map.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "bos_token": "[CLS]",
+   "cls_token": "[CLS]",
+   "eos_token": "[SEP]",
+   "mask_token": "[MASK]",
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "unk_token": "[UNK]"
+ }
spm.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c679fbf93643d19aab7ee10c0b99e460bdbc02fedf34b92b05af343b4af586fd
+ size 2464616
test_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "eval_accuracy": 0.7393357157707214,
+   "eval_loss": 0.5786939859390259,
+   "eval_runtime": 134.3696,
+   "eval_samples_per_second": 91.419,
+   "eval_steps_per_second": 5.716,
+   "test_samples": 12284
+ }
tokenizer.json ADDED
The diff for this file is too large to render.
tokenizer_config.json ADDED
@@ -0,0 +1,16 @@
+ {
+   "bos_token": "[CLS]",
+   "cls_token": "[CLS]",
+   "do_lower_case": false,
+   "eos_token": "[SEP]",
+   "mask_token": "[MASK]",
+   "name_or_path": "microsoft/deberta-v3-large",
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "sp_model_kwargs": {},
+   "special_tokens_map_file": null,
+   "split_by_punct": false,
+   "tokenizer_class": "DebertaV2Tokenizer",
+   "unk_token": "[UNK]",
+   "vocab_type": "spm"
+ }
trainer_state.json ADDED
@@ -0,0 +1,2155 @@
+ {
+   "best_metric": null,
+   "best_model_checkpoint": null,
+   "epoch": 9.99964924587864,
+   "global_step": 14250,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.07,
+       "learning_rate": 4.982394366197183e-06,
+       "loss": 1.0614,
+       "step": 100
+     },
+     {
+       "epoch": 0.07,
+       "eval_accuracy": 0.4345000088214874,
+       "eval_loss": 1.019625186920166,
+       "eval_runtime": 18.8668,
+       "eval_samples_per_second": 106.006,
+       "eval_steps_per_second": 6.625,
+       "step": 100
+     },
+     {
+       "epoch": 0.14,
+       "learning_rate": 4.94718309859155e-06,
+       "loss": 0.8601,
+       "step": 200
+     },
+     {
+       "epoch": 0.14,
+       "eval_accuracy": 0.6460000276565552,
+       "eval_loss": 0.756058931350708,
+       "eval_runtime": 19.8113,
+       "eval_samples_per_second": 100.953,
+       "eval_steps_per_second": 6.31,
+       "step": 200
+     },
+     {
+       "epoch": 0.21,
+       "learning_rate": 4.911971830985916e-06,
+       "loss": 0.734,
+       "step": 300
+     },
+     {
+       "epoch": 0.21,
+       "eval_accuracy": 0.6955000162124634,
+       "eval_loss": 0.6796478629112244,
+       "eval_runtime": 19.6375,
+       "eval_samples_per_second": 101.846,
+       "eval_steps_per_second": 6.365,
+       "step": 300
+     },
+     {
+       "epoch": 0.28,
+       "learning_rate": 4.876760563380282e-06,
+       "loss": 0.6753,
+       "step": 400
+     },
+     {
+       "epoch": 0.28,
+       "eval_accuracy": 0.699999988079071,
+       "eval_loss": 0.6520820260047913,
+       "eval_runtime": 18.2758,
+       "eval_samples_per_second": 109.434,
+       "eval_steps_per_second": 6.84,
+       "step": 400
+     },
+     {
+       "epoch": 0.35,
+       "learning_rate": 4.841549295774649e-06,
+       "loss": 0.6408,
+       "step": 500
+     },
+     {
+       "epoch": 0.35,
+       "eval_accuracy": 0.7440000176429749,
+       "eval_loss": 0.6119081974029541,
+       "eval_runtime": 17.8217,
+       "eval_samples_per_second": 112.223,
+       "eval_steps_per_second": 7.014,
+       "step": 500
+     },
+     {
+       "epoch": 0.42,
+       "learning_rate": 4.806338028169015e-06,
+       "loss": 0.5991,
+       "step": 600
+     },
+     {
+       "epoch": 0.42,
+       "eval_accuracy": 0.7369999885559082,
+       "eval_loss": 0.6033942699432373,
+       "eval_runtime": 17.3933,
+       "eval_samples_per_second": 114.986,
+       "eval_steps_per_second": 7.187,
+       "step": 600
+     },
+     {
+       "epoch": 0.49,
+       "learning_rate": 4.771126760563381e-06,
+       "loss": 0.6069,
+       "step": 700
+     },
+     {
+       "epoch": 0.49,
+       "eval_accuracy": 0.737500011920929,
+       "eval_loss": 0.597550630569458,
+       "eval_runtime": 20.251,
+       "eval_samples_per_second": 98.76,
+       "eval_steps_per_second": 6.173,
+       "step": 700
+     },
+     {
+       "epoch": 0.56,
+       "learning_rate": 4.735915492957747e-06,
+       "loss": 0.6122,
+       "step": 800
+     },
+     {
+       "epoch": 0.56,
+       "eval_accuracy": 0.7425000071525574,
+       "eval_loss": 0.5870603322982788,
+       "eval_runtime": 21.2152,
+       "eval_samples_per_second": 94.272,
+       "eval_steps_per_second": 5.892,
+       "step": 800
+     },
+     {
+       "epoch": 0.63,
+       "learning_rate": 4.7007042253521126e-06,
+       "loss": 0.5908,
+       "step": 900
+     },
+     {
+       "epoch": 0.63,
+       "eval_accuracy": 0.7444999814033508,
+       "eval_loss": 0.5935022234916687,
+       "eval_runtime": 22.0881,
+       "eval_samples_per_second": 90.546,
+       "eval_steps_per_second": 5.659,
+       "step": 900
+     },
+     {
+       "epoch": 0.7,
+       "learning_rate": 4.665492957746479e-06,
+       "loss": 0.5884,
+       "step": 1000
+     },
+     {
+       "epoch": 0.7,
+       "eval_accuracy": 0.7519999742507935,
+       "eval_loss": 0.5792337656021118,
+       "eval_runtime": 17.6606,
+       "eval_samples_per_second": 113.246,
+       "eval_steps_per_second": 7.078,
+       "step": 1000
+     },
+     {
+       "epoch": 0.77,
+       "learning_rate": 4.630281690140845e-06,
+       "loss": 0.5839,
+       "step": 1100
+     },
+     {
+       "epoch": 0.77,
+       "eval_accuracy": 0.7555000185966492,
+       "eval_loss": 0.578044056892395,
+       "eval_runtime": 15.9931,
+       "eval_samples_per_second": 125.054,
+       "eval_steps_per_second": 7.816,
+       "step": 1100
+     },
+     {
+       "epoch": 0.84,
+       "learning_rate": 4.595070422535211e-06,
+       "loss": 0.5772,
+       "step": 1200
+     },
+     {
+       "epoch": 0.84,
+       "eval_accuracy": 0.7570000290870667,
+       "eval_loss": 0.5727072954177856,
+       "eval_runtime": 21.5937,
+       "eval_samples_per_second": 92.62,
+       "eval_steps_per_second": 5.789,
+       "step": 1200
+     },
+     {
+       "epoch": 0.91,
+       "learning_rate": 4.559859154929578e-06,
+       "loss": 0.5895,
+       "step": 1300
+     },
+     {
+       "epoch": 0.91,
+       "eval_accuracy": 0.7549999952316284,
+       "eval_loss": 0.5601378679275513,
+       "eval_runtime": 20.1543,
+       "eval_samples_per_second": 99.234,
+       "eval_steps_per_second": 6.202,
+       "step": 1300
+     },
+     {
+       "epoch": 0.98,
+       "learning_rate": 4.524647887323944e-06,
+       "loss": 0.5757,
+       "step": 1400
+     },
+     {
+       "epoch": 0.98,
+       "eval_accuracy": 0.7524999976158142,
+       "eval_loss": 0.561326801776886,
+       "eval_runtime": 15.0299,
+       "eval_samples_per_second": 133.068,
+       "eval_steps_per_second": 8.317,
+       "step": 1400
+     },
+     {
+       "epoch": 1.05,
+       "learning_rate": 4.489436619718311e-06,
+       "loss": 0.5121,
+       "step": 1500
+     },
+     {
+       "epoch": 1.05,
+       "eval_accuracy": 0.7599999904632568,
+       "eval_loss": 0.5866703987121582,
+       "eval_runtime": 18.2735,
+       "eval_samples_per_second": 109.448,
+       "eval_steps_per_second": 6.841,
+       "step": 1500
+     },
+     {
+       "epoch": 1.12,
+       "learning_rate": 4.454225352112677e-06,
+       "loss": 0.5254,
+       "step": 1600
+     },
+     {
+       "epoch": 1.12,
+       "eval_accuracy": 0.7630000114440918,
+       "eval_loss": 0.5595362186431885,
+       "eval_runtime": 21.9878,
+       "eval_samples_per_second": 90.959,
+       "eval_steps_per_second": 5.685,
+       "step": 1600
+     },
+     {
+       "epoch": 1.19,
+       "learning_rate": 4.419014084507043e-06,
+       "loss": 0.5074,
+       "step": 1700
+     },
+     {
+       "epoch": 1.19,
+       "eval_accuracy": 0.7584999799728394,
+       "eval_loss": 0.559354841709137,
+       "eval_runtime": 19.9155,
+       "eval_samples_per_second": 100.424,
+       "eval_steps_per_second": 6.277,
+       "step": 1700
+     },
+     {
+       "epoch": 1.26,
+       "learning_rate": 4.383802816901409e-06,
+       "loss": 0.4947,
+       "step": 1800
+     },
+     {
+       "epoch": 1.26,
+       "eval_accuracy": 0.7574999928474426,
+       "eval_loss": 0.5696709156036377,
+       "eval_runtime": 22.8735,
+       "eval_samples_per_second": 87.438,
+       "eval_steps_per_second": 5.465,
+       "step": 1800
+     },
+     {
+       "epoch": 1.33,
+       "learning_rate": 4.3485915492957745e-06,
+       "loss": 0.5019,
+       "step": 1900
+     },
+     {
+       "epoch": 1.33,
+       "eval_accuracy": 0.7580000162124634,
+       "eval_loss": 0.5664528608322144,
+       "eval_runtime": 22.2002,
+       "eval_samples_per_second": 90.089,
+       "eval_steps_per_second": 5.631,
+       "step": 1900
+     },
+     {
+       "epoch": 1.4,
+       "learning_rate": 4.313380281690141e-06,
+       "loss": 0.5005,
+       "step": 2000
+     },
+     {
+       "epoch": 1.4,
+       "eval_accuracy": 0.765500009059906,
+       "eval_loss": 0.5484071969985962,
+       "eval_runtime": 21.9941,
+       "eval_samples_per_second": 90.934,
+       "eval_steps_per_second": 5.683,
+       "step": 2000
+     },
+     {
+       "epoch": 1.47,
+       "learning_rate": 4.278169014084507e-06,
+       "loss": 0.5125,
+       "step": 2100
+     },
+     {
+       "epoch": 1.47,
+       "eval_accuracy": 0.7605000138282776,
+       "eval_loss": 0.5626400709152222,
+       "eval_runtime": 17.7868,
+       "eval_samples_per_second": 112.443,
+       "eval_steps_per_second": 7.028,
+       "step": 2100
+     },
+     {
+       "epoch": 1.54,
+       "learning_rate": 4.242957746478873e-06,
+       "loss": 0.5241,
+       "step": 2200
+     },
+     {
+       "epoch": 1.54,
+       "eval_accuracy": 0.7559999823570251,
+       "eval_loss": 0.556066632270813,
+       "eval_runtime": 18.9121,
+       "eval_samples_per_second": 105.753,
+       "eval_steps_per_second": 6.61,
+       "step": 2200
+     },
+     {
+       "epoch": 1.61,
+       "learning_rate": 4.20774647887324e-06,
+       "loss": 0.5198,
+       "step": 2300
+     },
+     {
+       "epoch": 1.61,
+       "eval_accuracy": 0.7599999904632568,
+       "eval_loss": 0.560243546962738,
+       "eval_runtime": 20.0908,
+       "eval_samples_per_second": 99.548,
+       "eval_steps_per_second": 6.222,
+       "step": 2300
+     },
+     {
+       "epoch": 1.68,
+       "learning_rate": 4.172535211267606e-06,
+       "loss": 0.5124,
+       "step": 2400
+     },
+     {
+       "epoch": 1.68,
+       "eval_accuracy": 0.7490000128746033,
+       "eval_loss": 0.5654177665710449,
+       "eval_runtime": 19.7883,
+       "eval_samples_per_second": 101.07,
+       "eval_steps_per_second": 6.317,
+       "step": 2400
+     },
+     {
+       "epoch": 1.75,
+       "learning_rate": 4.137323943661972e-06,
+       "loss": 0.5096,
+       "step": 2500
+     },
+     {
+       "epoch": 1.75,
+       "eval_accuracy": 0.7515000104904175,
+       "eval_loss": 0.5803455710411072,
+       "eval_runtime": 22.0507,
+       "eval_samples_per_second": 90.7,
+       "eval_steps_per_second": 5.669,
+       "step": 2500
+     },
+     {
+       "epoch": 1.82,
+       "learning_rate": 4.102112676056339e-06,
+       "loss": 0.4885,
+       "step": 2600
+     },
+     {
+       "epoch": 1.82,
+       "eval_accuracy": 0.75,
+       "eval_loss": 0.5889333486557007,
+       "eval_runtime": 21.1933,
+       "eval_samples_per_second": 94.369,
+       "eval_steps_per_second": 5.898,
+       "step": 2600
+     },
+     {
+       "epoch": 1.89,
+       "learning_rate": 4.0669014084507045e-06,
+       "loss": 0.5111,
+       "step": 2700
+     },
+     {
+       "epoch": 1.89,
+       "eval_accuracy": 0.7664999961853027,
+       "eval_loss": 0.5507832169532776,
+       "eval_runtime": 20.2761,
+       "eval_samples_per_second": 98.638,
+       "eval_steps_per_second": 6.165,
+       "step": 2700
+     },
+     {
+       "epoch": 1.96,
+       "learning_rate": 4.031690140845071e-06,
+       "loss": 0.4868,
+       "step": 2800
+     },
+     {
+       "epoch": 1.96,
+       "eval_accuracy": 0.7634999752044678,
+       "eval_loss": 0.5621495842933655,
+       "eval_runtime": 18.2341,
+       "eval_samples_per_second": 109.685,
+       "eval_steps_per_second": 6.855,
+       "step": 2800
+     },
+     {
+       "epoch": 2.04,
+       "learning_rate": 3.996478873239437e-06,
+       "loss": 0.4599,
+       "step": 2900
+     },
+     {
+       "epoch": 2.04,
+       "eval_accuracy": 0.7615000009536743,
+       "eval_loss": 0.5994852185249329,
+       "eval_runtime": 17.787,
+       "eval_samples_per_second": 112.442,
+       "eval_steps_per_second": 7.028,
+       "step": 2900
+     },
+     {
+       "epoch": 2.11,
+       "learning_rate": 3.961267605633803e-06,
+       "loss": 0.4147,
+       "step": 3000
+     },
+     {
+       "epoch": 2.11,
+       "eval_accuracy": 0.753000020980835,
+       "eval_loss": 0.6202083230018616,
+       "eval_runtime": 20.5139,
+       "eval_samples_per_second": 97.495,
+       "eval_steps_per_second": 6.093,
+       "step": 3000
+     },
+     {
+       "epoch": 2.18,
+       "learning_rate": 3.926056338028169e-06,
+       "loss": 0.4233,
+       "step": 3100
+     },
+     {
+       "epoch": 2.18,
+       "eval_accuracy": 0.762499988079071,
+       "eval_loss": 0.5875486135482788,
+       "eval_runtime": 19.4349,
+       "eval_samples_per_second": 102.908,
+       "eval_steps_per_second": 6.432,
+       "step": 3100
+     },
+     {
+       "epoch": 2.25,
+       "learning_rate": 3.890845070422535e-06,
+       "loss": 0.4324,
+       "step": 3200
+     },
+     {
+       "epoch": 2.25,
+       "eval_accuracy": 0.7609999775886536,
+       "eval_loss": 0.5794370174407959,
+       "eval_runtime": 18.3807,
+       "eval_samples_per_second": 108.81,
+       "eval_steps_per_second": 6.801,
+       "step": 3200
+     },
+     {
+       "epoch": 2.32,
+       "learning_rate": 3.855633802816902e-06,
+       "loss": 0.4141,
+       "step": 3300
+     },
+     {
+       "epoch": 2.32,
+       "eval_accuracy": 0.7459999918937683,
+       "eval_loss": 0.5901930928230286,
+       "eval_runtime": 20.1122,
+       "eval_samples_per_second": 99.442,
+       "eval_steps_per_second": 6.215,
+       "step": 3300
+     },
+     {
+       "epoch": 2.39,
+       "learning_rate": 3.820422535211268e-06,
+       "loss": 0.4306,
+       "step": 3400
+     },
+     {
+       "epoch": 2.39,
+       "eval_accuracy": 0.7544999718666077,
+       "eval_loss": 0.6053192019462585,
+       "eval_runtime": 20.0091,
+       "eval_samples_per_second": 99.955,
+       "eval_steps_per_second": 6.247,
+       "step": 3400
+     },
+     {
+       "epoch": 2.46,
+       "learning_rate": 3.785211267605634e-06,
+       "loss": 0.4266,
+       "step": 3500
+     },
+     {
+       "epoch": 2.46,
+       "eval_accuracy": 0.7570000290870667,
+       "eval_loss": 0.5978769659996033,
+       "eval_runtime": 19.8772,
+       "eval_samples_per_second": 100.618,
+       "eval_steps_per_second": 6.289,
+       "step": 3500
+     },
+     {
+       "epoch": 2.53,
+       "learning_rate": 3.7500000000000005e-06,
+       "loss": 0.4227,
+       "step": 3600
+     },
+     {
+       "epoch": 2.53,
+       "eval_accuracy": 0.7649999856948853,
+       "eval_loss": 0.5919951796531677,
+       "eval_runtime": 21.1736,
+       "eval_samples_per_second": 94.457,
+       "eval_steps_per_second": 5.904,
+       "step": 3600
+     },
+     {
+       "epoch": 2.6,
+       "learning_rate": 3.7147887323943665e-06,
+       "loss": 0.4226,
+       "step": 3700
+     },
+     {
+       "epoch": 2.6,
+       "eval_accuracy": 0.7455000281333923,
+       "eval_loss": 0.6165611743927002,
+       "eval_runtime": 20.5872,
+       "eval_samples_per_second": 97.148,
+       "eval_steps_per_second": 6.072,
+       "step": 3700
+     },
+     {
+       "epoch": 2.67,
+       "learning_rate": 3.679577464788733e-06,
+       "loss": 0.3978,
+       "step": 3800
+     },
+     {
+       "epoch": 2.67,
+       "eval_accuracy": 0.7559999823570251,
+       "eval_loss": 0.6125866770744324,
+       "eval_runtime": 22.1514,
+       "eval_samples_per_second": 90.288,
+       "eval_steps_per_second": 5.643,
+       "step": 3800
+     },
+     {
+       "epoch": 2.74,
+       "learning_rate": 3.644366197183099e-06,
+       "loss": 0.3954,
+       "step": 3900
+     },
+     {
+       "epoch": 2.74,
+       "eval_accuracy": 0.7549999952316284,
+       "eval_loss": 0.615158200263977,
+       "eval_runtime": 17.1074,
+       "eval_samples_per_second": 116.908,
+       "eval_steps_per_second": 7.307,
+       "step": 3900
+     },
+     {
+       "epoch": 2.81,
+       "learning_rate": 3.609154929577465e-06,
+       "loss": 0.4209,
+       "step": 4000
+     },
+     {
+       "epoch": 2.81,
+ "eval_accuracy": 0.75,
604
+ "eval_loss": 0.597953736782074,
605
+ "eval_runtime": 22.6397,
606
+ "eval_samples_per_second": 88.34,
607
+ "eval_steps_per_second": 5.521,
608
+ "step": 4000
609
+ },
610
+ {
611
+ "epoch": 2.88,
612
+ "learning_rate": 3.5739436619718315e-06,
613
+ "loss": 0.3982,
614
+ "step": 4100
615
+ },
616
+ {
617
+ "epoch": 2.88,
618
+ "eval_accuracy": 0.7490000128746033,
619
+ "eval_loss": 0.6096097230911255,
620
+ "eval_runtime": 22.0097,
621
+ "eval_samples_per_second": 90.869,
622
+ "eval_steps_per_second": 5.679,
623
+ "step": 4100
624
+ },
625
+ {
626
+ "epoch": 2.95,
627
+ "learning_rate": 3.538732394366197e-06,
628
+ "loss": 0.4016,
629
+ "step": 4200
630
+ },
631
+ {
632
+ "epoch": 2.95,
633
+ "eval_accuracy": 0.7425000071525574,
634
+ "eval_loss": 0.6540722846984863,
635
+ "eval_runtime": 16.1358,
636
+ "eval_samples_per_second": 123.948,
637
+ "eval_steps_per_second": 7.747,
638
+ "step": 4200
639
+ },
640
+ {
641
+ "epoch": 3.02,
642
+ "learning_rate": 3.5035211267605634e-06,
643
+ "loss": 0.3966,
644
+ "step": 4300
645
+ },
646
+ {
647
+ "epoch": 3.02,
648
+ "eval_accuracy": 0.7544999718666077,
649
+ "eval_loss": 0.6377372145652771,
650
+ "eval_runtime": 20.4003,
651
+ "eval_samples_per_second": 98.038,
652
+ "eval_steps_per_second": 6.127,
653
+ "step": 4300
654
+ },
655
+ {
656
+ "epoch": 3.09,
657
+ "learning_rate": 3.4683098591549297e-06,
658
+ "loss": 0.3074,
659
+ "step": 4400
660
+ },
661
+ {
662
+ "epoch": 3.09,
663
+ "eval_accuracy": 0.75,
664
+ "eval_loss": 0.6859884262084961,
665
+ "eval_runtime": 20.6366,
666
+ "eval_samples_per_second": 96.915,
667
+ "eval_steps_per_second": 6.057,
668
+ "step": 4400
669
+ },
670
+ {
671
+ "epoch": 3.16,
672
+ "learning_rate": 3.433098591549296e-06,
673
+ "loss": 0.3551,
674
+ "step": 4500
675
+ },
676
+ {
677
+ "epoch": 3.16,
678
+ "eval_accuracy": 0.7549999952316284,
679
+ "eval_loss": 0.6160025596618652,
680
+ "eval_runtime": 21.368,
681
+ "eval_samples_per_second": 93.598,
682
+ "eval_steps_per_second": 5.85,
683
+ "step": 4500
684
+ },
685
+ {
686
+ "epoch": 3.23,
687
+ "learning_rate": 3.397887323943662e-06,
688
+ "loss": 0.3323,
689
+ "step": 4600
690
+ },
691
+ {
692
+ "epoch": 3.23,
693
+ "eval_accuracy": 0.7519999742507935,
694
+ "eval_loss": 0.6714155077934265,
695
+ "eval_runtime": 21.1568,
696
+ "eval_samples_per_second": 94.532,
697
+ "eval_steps_per_second": 5.908,
698
+ "step": 4600
699
+ },
700
+ {
701
+ "epoch": 3.3,
702
+ "learning_rate": 3.3626760563380284e-06,
703
+ "loss": 0.3171,
704
+ "step": 4700
705
+ },
706
+ {
707
+ "epoch": 3.3,
708
+ "eval_accuracy": 0.7534999847412109,
709
+ "eval_loss": 0.6537904739379883,
710
+ "eval_runtime": 17.8954,
711
+ "eval_samples_per_second": 111.76,
712
+ "eval_steps_per_second": 6.985,
713
+ "step": 4700
714
+ },
715
+ {
716
+ "epoch": 3.37,
717
+ "learning_rate": 3.3274647887323947e-06,
718
+ "loss": 0.3403,
719
+ "step": 4800
720
+ },
721
+ {
722
+ "epoch": 3.37,
723
+ "eval_accuracy": 0.7465000152587891,
724
+ "eval_loss": 0.677370548248291,
725
+ "eval_runtime": 22.2854,
726
+ "eval_samples_per_second": 89.745,
727
+ "eval_steps_per_second": 5.609,
728
+ "step": 4800
729
+ },
730
+ {
731
+ "epoch": 3.44,
732
+ "learning_rate": 3.292253521126761e-06,
733
+ "loss": 0.3396,
734
+ "step": 4900
735
+ },
736
+ {
737
+ "epoch": 3.44,
738
+ "eval_accuracy": 0.7465000152587891,
739
+ "eval_loss": 0.6725812554359436,
740
+ "eval_runtime": 20.4072,
741
+ "eval_samples_per_second": 98.005,
742
+ "eval_steps_per_second": 6.125,
743
+ "step": 4900
744
+ },
745
+ {
746
+ "epoch": 3.51,
747
+ "learning_rate": 3.257042253521127e-06,
748
+ "loss": 0.3259,
749
+ "step": 5000
750
+ },
751
+ {
752
+ "epoch": 3.51,
753
+ "eval_accuracy": 0.7480000257492065,
754
+ "eval_loss": 0.6465049982070923,
755
+ "eval_runtime": 16.3957,
756
+ "eval_samples_per_second": 121.983,
757
+ "eval_steps_per_second": 7.624,
758
+ "step": 5000
759
+ },
760
+ {
761
+ "epoch": 3.58,
762
+ "learning_rate": 3.2218309859154934e-06,
763
+ "loss": 0.3392,
764
+ "step": 5100
765
+ },
766
+ {
767
+ "epoch": 3.58,
768
+ "eval_accuracy": 0.7459999918937683,
769
+ "eval_loss": 0.6860352754592896,
770
+ "eval_runtime": 19.1797,
771
+ "eval_samples_per_second": 104.277,
772
+ "eval_steps_per_second": 6.517,
773
+ "step": 5100
774
+ },
775
+ {
776
+ "epoch": 3.65,
777
+ "learning_rate": 3.1866197183098598e-06,
778
+ "loss": 0.3251,
779
+ "step": 5200
780
+ },
781
+ {
782
+ "epoch": 3.65,
783
+ "eval_accuracy": 0.7494999766349792,
784
+ "eval_loss": 0.6696720123291016,
785
+ "eval_runtime": 17.883,
786
+ "eval_samples_per_second": 111.838,
787
+ "eval_steps_per_second": 6.99,
788
+ "step": 5200
789
+ },
790
+ {
791
+ "epoch": 3.72,
792
+ "learning_rate": 3.1514084507042257e-06,
793
+ "loss": 0.3253,
794
+ "step": 5300
795
+ },
796
+ {
797
+ "epoch": 3.72,
798
+ "eval_accuracy": 0.7429999709129333,
799
+ "eval_loss": 0.6769505739212036,
800
+ "eval_runtime": 19.8341,
801
+ "eval_samples_per_second": 100.836,
802
+ "eval_steps_per_second": 6.302,
803
+ "step": 5300
804
+ },
805
+ {
806
+ "epoch": 3.79,
807
+ "learning_rate": 3.1161971830985916e-06,
808
+ "loss": 0.3455,
809
+ "step": 5400
810
+ },
811
+ {
812
+ "epoch": 3.79,
813
+ "eval_accuracy": 0.7360000014305115,
814
+ "eval_loss": 0.7176979780197144,
815
+ "eval_runtime": 17.9226,
816
+ "eval_samples_per_second": 111.591,
817
+ "eval_steps_per_second": 6.974,
818
+ "step": 5400
819
+ },
820
+ {
821
+ "epoch": 3.86,
822
+ "learning_rate": 3.0809859154929576e-06,
823
+ "loss": 0.3323,
824
+ "step": 5500
825
+ },
826
+ {
827
+ "epoch": 3.86,
828
+ "eval_accuracy": 0.7400000095367432,
829
+ "eval_loss": 0.6943067908287048,
830
+ "eval_runtime": 21.4155,
831
+ "eval_samples_per_second": 93.39,
832
+ "eval_steps_per_second": 5.837,
833
+ "step": 5500
834
+ },
835
+ {
836
+ "epoch": 3.93,
837
+ "learning_rate": 3.045774647887324e-06,
838
+ "loss": 0.3335,
839
+ "step": 5600
840
+ },
841
+ {
842
+ "epoch": 3.93,
843
+ "eval_accuracy": 0.7555000185966492,
844
+ "eval_loss": 0.6506811380386353,
845
+ "eval_runtime": 21.6099,
846
+ "eval_samples_per_second": 92.55,
847
+ "eval_steps_per_second": 5.784,
848
+ "step": 5600
849
+ },
850
+ {
851
+ "epoch": 4.0,
852
+ "learning_rate": 3.0105633802816903e-06,
853
+ "loss": 0.3368,
854
+ "step": 5700
855
+ },
856
+ {
857
+ "epoch": 4.0,
858
+ "eval_accuracy": 0.7484999895095825,
859
+ "eval_loss": 0.6580154895782471,
860
+ "eval_runtime": 21.7515,
861
+ "eval_samples_per_second": 91.948,
862
+ "eval_steps_per_second": 5.747,
863
+ "step": 5700
864
+ },
865
+ {
866
+ "epoch": 4.07,
867
+ "learning_rate": 2.9753521126760567e-06,
868
+ "loss": 0.2479,
869
+ "step": 5800
870
+ },
871
+ {
872
+ "epoch": 4.07,
873
+ "eval_accuracy": 0.7429999709129333,
874
+ "eval_loss": 0.7666531801223755,
875
+ "eval_runtime": 15.994,
876
+ "eval_samples_per_second": 125.047,
877
+ "eval_steps_per_second": 7.815,
878
+ "step": 5800
879
+ },
880
+ {
881
+ "epoch": 4.14,
882
+ "learning_rate": 2.9401408450704226e-06,
883
+ "loss": 0.2613,
884
+ "step": 5900
885
+ },
886
+ {
887
+ "epoch": 4.14,
888
+ "eval_accuracy": 0.7505000233650208,
889
+ "eval_loss": 0.751265823841095,
890
+ "eval_runtime": 16.5258,
891
+ "eval_samples_per_second": 121.023,
892
+ "eval_steps_per_second": 7.564,
893
+ "step": 5900
894
+ },
895
+ {
896
+ "epoch": 4.21,
897
+ "learning_rate": 2.904929577464789e-06,
898
+ "loss": 0.2557,
899
+ "step": 6000
900
+ },
901
+ {
902
+ "epoch": 4.21,
903
+ "eval_accuracy": 0.7484999895095825,
904
+ "eval_loss": 0.7926999926567078,
905
+ "eval_runtime": 20.2799,
906
+ "eval_samples_per_second": 98.62,
907
+ "eval_steps_per_second": 6.164,
908
+ "step": 6000
909
+ },
910
+ {
911
+ "epoch": 4.28,
912
+ "learning_rate": 2.8697183098591553e-06,
913
+ "loss": 0.243,
914
+ "step": 6100
915
+ },
916
+ {
917
+ "epoch": 4.28,
918
+ "eval_accuracy": 0.7450000047683716,
919
+ "eval_loss": 0.77916020154953,
920
+ "eval_runtime": 19.6554,
921
+ "eval_samples_per_second": 101.753,
922
+ "eval_steps_per_second": 6.36,
923
+ "step": 6100
924
+ },
925
+ {
926
+ "epoch": 4.35,
927
+ "learning_rate": 2.8345070422535217e-06,
928
+ "loss": 0.2473,
929
+ "step": 6200
930
+ },
931
+ {
932
+ "epoch": 4.35,
933
+ "eval_accuracy": 0.7354999780654907,
934
+ "eval_loss": 0.8106710314750671,
935
+ "eval_runtime": 14.7909,
936
+ "eval_samples_per_second": 135.218,
937
+ "eval_steps_per_second": 8.451,
938
+ "step": 6200
939
+ },
940
+ {
941
+ "epoch": 4.42,
942
+ "learning_rate": 2.7992957746478876e-06,
943
+ "loss": 0.2447,
944
+ "step": 6300
945
+ },
946
+ {
947
+ "epoch": 4.42,
948
+ "eval_accuracy": 0.7369999885559082,
949
+ "eval_loss": 0.7850819230079651,
950
+ "eval_runtime": 15.145,
951
+ "eval_samples_per_second": 132.057,
952
+ "eval_steps_per_second": 8.254,
953
+ "step": 6300
954
+ },
955
+ {
956
+ "epoch": 4.49,
957
+ "learning_rate": 2.764084507042254e-06,
958
+ "loss": 0.2515,
959
+ "step": 6400
960
+ },
961
+ {
962
+ "epoch": 4.49,
963
+ "eval_accuracy": 0.7465000152587891,
964
+ "eval_loss": 0.7529160380363464,
965
+ "eval_runtime": 14.1294,
966
+ "eval_samples_per_second": 141.549,
967
+ "eval_steps_per_second": 8.847,
968
+ "step": 6400
969
+ },
970
+ {
971
+ "epoch": 4.56,
972
+ "learning_rate": 2.7288732394366203e-06,
973
+ "loss": 0.274,
974
+ "step": 6500
975
+ },
976
+ {
977
+ "epoch": 4.56,
978
+ "eval_accuracy": 0.7465000152587891,
979
+ "eval_loss": 0.7389978170394897,
980
+ "eval_runtime": 18.3936,
981
+ "eval_samples_per_second": 108.733,
982
+ "eval_steps_per_second": 6.796,
983
+ "step": 6500
984
+ },
985
+ {
986
+ "epoch": 4.63,
987
+ "learning_rate": 2.693661971830986e-06,
988
+ "loss": 0.2674,
989
+ "step": 6600
990
+ },
991
+ {
992
+ "epoch": 4.63,
993
+ "eval_accuracy": 0.7459999918937683,
994
+ "eval_loss": 0.7657651305198669,
995
+ "eval_runtime": 19.2395,
996
+ "eval_samples_per_second": 103.953,
997
+ "eval_steps_per_second": 6.497,
998
+ "step": 6600
999
+ },
1000
+ {
1001
+ "epoch": 4.7,
1002
+ "learning_rate": 2.6584507042253522e-06,
1003
+ "loss": 0.2416,
1004
+ "step": 6700
1005
+ },
1006
+ {
1007
+ "epoch": 4.7,
1008
+ "eval_accuracy": 0.7484999895095825,
1009
+ "eval_loss": 0.7914510369300842,
1010
+ "eval_runtime": 17.7833,
1011
+ "eval_samples_per_second": 112.465,
1012
+ "eval_steps_per_second": 7.029,
1013
+ "step": 6700
1014
+ },
1015
+ {
1016
+ "epoch": 4.77,
1017
+ "learning_rate": 2.623239436619718e-06,
1018
+ "loss": 0.2432,
1019
+ "step": 6800
1020
+ },
1021
+ {
1022
+ "epoch": 4.77,
1023
+ "eval_accuracy": 0.7434999942779541,
1024
+ "eval_loss": 0.7988595962524414,
1025
+ "eval_runtime": 16.8516,
1026
+ "eval_samples_per_second": 118.683,
1027
+ "eval_steps_per_second": 7.418,
1028
+ "step": 6800
1029
+ },
1030
+ {
1031
+ "epoch": 4.84,
1032
+ "learning_rate": 2.5880281690140845e-06,
1033
+ "loss": 0.2595,
1034
+ "step": 6900
1035
+ },
1036
+ {
1037
+ "epoch": 4.84,
1038
+ "eval_accuracy": 0.7379999756813049,
1039
+ "eval_loss": 0.7850367426872253,
1040
+ "eval_runtime": 22.0351,
1041
+ "eval_samples_per_second": 90.764,
1042
+ "eval_steps_per_second": 5.673,
1043
+ "step": 6900
1044
+ },
1045
+ {
1046
+ "epoch": 4.91,
1047
+ "learning_rate": 2.552816901408451e-06,
1048
+ "loss": 0.2736,
1049
+ "step": 7000
1050
+ },
1051
+ {
1052
+ "epoch": 4.91,
1053
+ "eval_accuracy": 0.7394999861717224,
1054
+ "eval_loss": 0.7577053308486938,
1055
+ "eval_runtime": 22.4529,
1056
+ "eval_samples_per_second": 89.075,
1057
+ "eval_steps_per_second": 5.567,
1058
+ "step": 7000
1059
+ },
1060
+ {
1061
+ "epoch": 4.98,
1062
+ "learning_rate": 2.5176056338028172e-06,
1063
+ "loss": 0.2783,
1064
+ "step": 7100
1065
+ },
1066
+ {
1067
+ "epoch": 4.98,
1068
+ "eval_accuracy": 0.7404999732971191,
1069
+ "eval_loss": 0.7649760842323303,
1070
+ "eval_runtime": 18.1063,
1071
+ "eval_samples_per_second": 110.459,
1072
+ "eval_steps_per_second": 6.904,
1073
+ "step": 7100
1074
+ },
1075
+ {
1076
+ "epoch": 5.05,
1077
+ "learning_rate": 2.482394366197183e-06,
1078
+ "loss": 0.2304,
1079
+ "step": 7200
1080
+ },
1081
+ {
1082
+ "epoch": 5.05,
1083
+ "eval_accuracy": 0.7384999990463257,
1084
+ "eval_loss": 0.8541684746742249,
1085
+ "eval_runtime": 20.2711,
1086
+ "eval_samples_per_second": 98.663,
1087
+ "eval_steps_per_second": 6.166,
1088
+ "step": 7200
1089
+ },
1090
+ {
1091
+ "epoch": 5.12,
1092
+ "learning_rate": 2.4471830985915495e-06,
1093
+ "loss": 0.1937,
1094
+ "step": 7300
1095
+ },
1096
+ {
1097
+ "epoch": 5.12,
1098
+ "eval_accuracy": 0.734499990940094,
1099
+ "eval_loss": 0.8389941453933716,
1100
+ "eval_runtime": 21.6926,
1101
+ "eval_samples_per_second": 92.197,
1102
+ "eval_steps_per_second": 5.762,
1103
+ "step": 7300
1104
+ },
1105
+ {
1106
+ "epoch": 5.19,
1107
+ "learning_rate": 2.411971830985916e-06,
1108
+ "loss": 0.1878,
1109
+ "step": 7400
1110
+ },
1111
+ {
1112
+ "epoch": 5.19,
1113
+ "eval_accuracy": 0.7329999804496765,
1114
+ "eval_loss": 0.9149684906005859,
1115
+ "eval_runtime": 19.4205,
1116
+ "eval_samples_per_second": 102.984,
1117
+ "eval_steps_per_second": 6.436,
1118
+ "step": 7400
1119
+ },
1120
+ {
1121
+ "epoch": 5.26,
1122
+ "learning_rate": 2.376760563380282e-06,
1123
+ "loss": 0.1921,
1124
+ "step": 7500
1125
+ },
1126
+ {
1127
+ "epoch": 5.26,
1128
+ "eval_accuracy": 0.7404999732971191,
1129
+ "eval_loss": 0.8792451024055481,
1130
+ "eval_runtime": 20.264,
1131
+ "eval_samples_per_second": 98.697,
1132
+ "eval_steps_per_second": 6.169,
1133
+ "step": 7500
1134
+ },
1135
+ {
1136
+ "epoch": 5.33,
1137
+ "learning_rate": 2.341549295774648e-06,
1138
+ "loss": 0.1916,
1139
+ "step": 7600
1140
+ },
1141
+ {
1142
+ "epoch": 5.33,
1143
+ "eval_accuracy": 0.7409999966621399,
1144
+ "eval_loss": 0.8891890645027161,
1145
+ "eval_runtime": 20.8102,
1146
+ "eval_samples_per_second": 96.107,
1147
+ "eval_steps_per_second": 6.007,
1148
+ "step": 7600
1149
+ },
1150
+ {
1151
+ "epoch": 5.4,
1152
+ "learning_rate": 2.306338028169014e-06,
1153
+ "loss": 0.2011,
1154
+ "step": 7700
1155
+ },
1156
+ {
1157
+ "epoch": 5.4,
1158
+ "eval_accuracy": 0.7325000166893005,
1159
+ "eval_loss": 0.9012252688407898,
1160
+ "eval_runtime": 21.4132,
1161
+ "eval_samples_per_second": 93.4,
1162
+ "eval_steps_per_second": 5.838,
1163
+ "step": 7700
1164
+ },
1165
+ {
1166
+ "epoch": 5.47,
1167
+ "learning_rate": 2.2711267605633805e-06,
1168
+ "loss": 0.211,
1169
+ "step": 7800
1170
+ },
1171
+ {
1172
+ "epoch": 5.47,
1173
+ "eval_accuracy": 0.7419999837875366,
1174
+ "eval_loss": 0.8607960343360901,
1175
+ "eval_runtime": 17.8954,
1176
+ "eval_samples_per_second": 111.761,
1177
+ "eval_steps_per_second": 6.985,
1178
+ "step": 7800
1179
+ },
1180
+ {
1181
+ "epoch": 5.54,
1182
+ "learning_rate": 2.235915492957747e-06,
1183
+ "loss": 0.2194,
1184
+ "step": 7900
1185
+ },
1186
+ {
1187
+ "epoch": 5.54,
1188
+ "eval_accuracy": 0.7319999933242798,
1189
+ "eval_loss": 0.8851566314697266,
1190
+ "eval_runtime": 12.4785,
1191
+ "eval_samples_per_second": 160.275,
1192
+ "eval_steps_per_second": 10.017,
1193
+ "step": 7900
1194
+ },
1195
+ {
1196
+ "epoch": 5.61,
1197
+ "learning_rate": 2.200704225352113e-06,
1198
+ "loss": 0.205,
1199
+ "step": 8000
1200
+ },
1201
+ {
1202
+ "epoch": 5.61,
1203
+ "eval_accuracy": 0.7384999990463257,
1204
+ "eval_loss": 0.8803377151489258,
1205
+ "eval_runtime": 17.7919,
1206
+ "eval_samples_per_second": 112.41,
1207
+ "eval_steps_per_second": 7.026,
1208
+ "step": 8000
1209
+ },
1210
+ {
1211
+ "epoch": 5.68,
1212
+ "learning_rate": 2.1654929577464787e-06,
1213
+ "loss": 0.1981,
1214
+ "step": 8100
1215
+ },
1216
+ {
1217
+ "epoch": 5.68,
1218
+ "eval_accuracy": 0.7329999804496765,
1219
+ "eval_loss": 0.86810302734375,
1220
+ "eval_runtime": 15.0816,
1221
+ "eval_samples_per_second": 132.612,
1222
+ "eval_steps_per_second": 8.288,
1223
+ "step": 8100
1224
+ },
1225
+ {
1226
+ "epoch": 5.75,
1227
+ "learning_rate": 2.130281690140845e-06,
1228
+ "loss": 0.1908,
1229
+ "step": 8200
1230
+ },
1231
+ {
1232
+ "epoch": 5.75,
1233
+ "eval_accuracy": 0.7434999942779541,
1234
+ "eval_loss": 0.9019960761070251,
1235
+ "eval_runtime": 16.0702,
1236
+ "eval_samples_per_second": 124.454,
1237
+ "eval_steps_per_second": 7.778,
1238
+ "step": 8200
1239
+ },
1240
+ {
1241
+ "epoch": 5.82,
1242
+ "learning_rate": 2.0950704225352115e-06,
1243
+ "loss": 0.1942,
1244
+ "step": 8300
1245
+ },
1246
+ {
1247
+ "epoch": 5.82,
1248
+ "eval_accuracy": 0.7409999966621399,
1249
+ "eval_loss": 0.8780096173286438,
1250
+ "eval_runtime": 18.9162,
1251
+ "eval_samples_per_second": 105.73,
1252
+ "eval_steps_per_second": 6.608,
1253
+ "step": 8300
1254
+ },
1255
+ {
1256
+ "epoch": 5.89,
1257
+ "learning_rate": 2.059859154929578e-06,
1258
+ "loss": 0.1958,
1259
+ "step": 8400
1260
+ },
1261
+ {
1262
+ "epoch": 5.89,
1263
+ "eval_accuracy": 0.734499990940094,
1264
+ "eval_loss": 0.8936640620231628,
1265
+ "eval_runtime": 19.3712,
1266
+ "eval_samples_per_second": 103.246,
1267
+ "eval_steps_per_second": 6.453,
1268
+ "step": 8400
1269
+ },
1270
+ {
1271
+ "epoch": 5.96,
1272
+ "learning_rate": 2.0246478873239438e-06,
1273
+ "loss": 0.1883,
1274
+ "step": 8500
1275
+ },
1276
+ {
1277
+ "epoch": 5.96,
1278
+ "eval_accuracy": 0.7360000014305115,
1279
+ "eval_loss": 0.9120668172836304,
1280
+ "eval_runtime": 17.1206,
1281
+ "eval_samples_per_second": 116.818,
1282
+ "eval_steps_per_second": 7.301,
1283
+ "step": 8500
1284
+ },
1285
+ {
1286
+ "epoch": 6.04,
1287
+ "learning_rate": 1.98943661971831e-06,
1288
+ "loss": 0.1819,
1289
+ "step": 8600
1290
+ },
1291
+ {
1292
+ "epoch": 6.04,
1293
+ "eval_accuracy": 0.7429999709129333,
1294
+ "eval_loss": 0.94089674949646,
1295
+ "eval_runtime": 22.6258,
1296
+ "eval_samples_per_second": 88.395,
1297
+ "eval_steps_per_second": 5.525,
1298
+ "step": 8600
1299
+ },
1300
+ {
1301
+ "epoch": 6.11,
1302
+ "learning_rate": 1.954225352112676e-06,
1303
+ "loss": 0.145,
1304
+ "step": 8700
1305
+ },
1306
+ {
1307
+ "epoch": 6.11,
1308
+ "eval_accuracy": 0.7264999747276306,
1309
+ "eval_loss": 1.1389663219451904,
1310
+ "eval_runtime": 15.9579,
1311
+ "eval_samples_per_second": 125.33,
1312
+ "eval_steps_per_second": 7.833,
1313
+ "step": 8700
1314
+ },
1315
+ {
1316
+ "epoch": 6.18,
1317
+ "learning_rate": 1.9190140845070424e-06,
1318
+ "loss": 0.1696,
1319
+ "step": 8800
1320
+ },
1321
+ {
1322
+ "epoch": 6.18,
1323
+ "eval_accuracy": 0.7429999709129333,
1324
+ "eval_loss": 0.9188500046730042,
1325
+ "eval_runtime": 21.2758,
1326
+ "eval_samples_per_second": 94.003,
1327
+ "eval_steps_per_second": 5.875,
1328
+ "step": 8800
1329
+ },
1330
+ {
1331
+ "epoch": 6.25,
1332
+ "learning_rate": 1.8838028169014086e-06,
1333
+ "loss": 0.1488,
1334
+ "step": 8900
1335
+ },
1336
+ {
1337
+ "epoch": 6.25,
1338
+ "eval_accuracy": 0.7400000095367432,
1339
+ "eval_loss": 0.9717501401901245,
1340
+ "eval_runtime": 17.3967,
1341
+ "eval_samples_per_second": 114.964,
1342
+ "eval_steps_per_second": 7.185,
1343
+ "step": 8900
1344
+ },
1345
+ {
1346
+ "epoch": 6.32,
1347
+ "learning_rate": 1.848591549295775e-06,
1348
+ "loss": 0.1637,
1349
+ "step": 9000
1350
+ },
1351
+ {
1352
+ "epoch": 6.32,
1353
+ "eval_accuracy": 0.7450000047683716,
1354
+ "eval_loss": 0.9701842069625854,
1355
+ "eval_runtime": 19.9143,
1356
+ "eval_samples_per_second": 100.43,
1357
+ "eval_steps_per_second": 6.277,
1358
+ "step": 9000
1359
+ },
1360
+ {
1361
+ "epoch": 6.39,
1362
+ "learning_rate": 1.813380281690141e-06,
1363
+ "loss": 0.1547,
1364
+ "step": 9100
1365
+ },
1366
+ {
1367
+ "epoch": 6.39,
1368
+ "eval_accuracy": 0.7409999966621399,
1369
+ "eval_loss": 1.0032541751861572,
1370
+ "eval_runtime": 21.7985,
1371
+ "eval_samples_per_second": 91.749,
1372
+ "eval_steps_per_second": 5.734,
1373
+ "step": 9100
1374
+ },
1375
+ {
1376
+ "epoch": 6.46,
1377
+ "learning_rate": 1.7781690140845072e-06,
1378
+ "loss": 0.1605,
1379
+ "step": 9200
1380
+ },
1381
+ {
1382
+ "epoch": 6.46,
1383
+ "eval_accuracy": 0.7354999780654907,
1384
+ "eval_loss": 0.99726402759552,
1385
+ "eval_runtime": 21.688,
1386
+ "eval_samples_per_second": 92.217,
1387
+ "eval_steps_per_second": 5.764,
1388
+ "step": 9200
1389
+ },
1390
+ {
1391
+ "epoch": 6.53,
1392
+ "learning_rate": 1.7429577464788734e-06,
1393
+ "loss": 0.1552,
1394
+ "step": 9300
1395
+ },
1396
+ {
1397
+ "epoch": 6.53,
1398
+ "eval_accuracy": 0.7289999723434448,
1399
+ "eval_loss": 1.0491423606872559,
1400
+ "eval_runtime": 20.7168,
1401
+ "eval_samples_per_second": 96.54,
1402
+ "eval_steps_per_second": 6.034,
1403
+ "step": 9300
1404
+ },
1405
+ {
1406
+ "epoch": 6.6,
1407
+ "learning_rate": 1.7077464788732395e-06,
1408
+ "loss": 0.1731,
1409
+ "step": 9400
1410
+ },
1411
+ {
1412
+ "epoch": 6.6,
1413
+ "eval_accuracy": 0.7335000038146973,
1414
+ "eval_loss": 1.027091145515442,
1415
+ "eval_runtime": 21.2758,
1416
+ "eval_samples_per_second": 94.003,
1417
+ "eval_steps_per_second": 5.875,
1418
+ "step": 9400
1419
+ },
1420
+ {
1421
+ "epoch": 6.67,
1422
+ "learning_rate": 1.6725352112676057e-06,
1423
+ "loss": 0.1738,
1424
+ "step": 9500
1425
+ },
1426
+ {
1427
+ "epoch": 6.67,
1428
+ "eval_accuracy": 0.7429999709129333,
1429
+ "eval_loss": 0.9575192928314209,
1430
+ "eval_runtime": 18.6703,
1431
+ "eval_samples_per_second": 107.122,
1432
+ "eval_steps_per_second": 6.695,
1433
+ "step": 9500
1434
+ },
1435
+ {
1436
+ "epoch": 6.74,
1437
+ "learning_rate": 1.637323943661972e-06,
1438
+ "loss": 0.1669,
1439
+ "step": 9600
1440
+ },
1441
+ {
1442
+ "epoch": 6.74,
1443
+ "eval_accuracy": 0.7350000143051147,
1444
+ "eval_loss": 0.9613668322563171,
1445
+ "eval_runtime": 21.3968,
1446
+ "eval_samples_per_second": 93.472,
1447
+ "eval_steps_per_second": 5.842,
1448
+ "step": 9600
1449
+ },
1450
+ {
1451
+ "epoch": 6.81,
1452
+ "learning_rate": 1.6021126760563382e-06,
1453
+ "loss": 0.1347,
1454
+ "step": 9700
1455
+ },
1456
+ {
1457
+ "epoch": 6.81,
1458
+ "eval_accuracy": 0.7365000247955322,
1459
+ "eval_loss": 1.0263434648513794,
1460
+ "eval_runtime": 19.7329,
1461
+ "eval_samples_per_second": 101.353,
1462
+ "eval_steps_per_second": 6.335,
1463
+ "step": 9700
1464
+ },
1465
+ {
1466
+ "epoch": 6.88,
1467
+ "learning_rate": 1.5669014084507045e-06,
1468
+ "loss": 0.1593,
1469
+ "step": 9800
1470
+ },
1471
+ {
1472
+ "epoch": 6.88,
1473
+ "eval_accuracy": 0.7360000014305115,
1474
+ "eval_loss": 1.017268180847168,
1475
+ "eval_runtime": 16.8592,
1476
+ "eval_samples_per_second": 118.63,
1477
+ "eval_steps_per_second": 7.414,
1478
+ "step": 9800
1479
+ },
1480
+ {
1481
+ "epoch": 6.95,
1482
+ "learning_rate": 1.5316901408450705e-06,
1483
+ "loss": 0.1549,
1484
+ "step": 9900
1485
+ },
1486
+ {
1487
+ "epoch": 6.95,
1488
+ "eval_accuracy": 0.7350000143051147,
1489
+ "eval_loss": 1.0397531986236572,
1490
+ "eval_runtime": 18.6472,
1491
+ "eval_samples_per_second": 107.255,
1492
+ "eval_steps_per_second": 6.703,
1493
+ "step": 9900
1494
+ },
1495
+ {
1496
+ "epoch": 7.02,
1497
+ "learning_rate": 1.4964788732394366e-06,
1498
+ "loss": 0.1675,
1499
+ "step": 10000
1500
+ },
1501
+ {
1502
+ "epoch": 7.02,
1503
+ "eval_accuracy": 0.7379999756813049,
1504
+ "eval_loss": 0.9975456595420837,
1505
+ "eval_runtime": 19.9355,
1506
+ "eval_samples_per_second": 100.323,
1507
+ "eval_steps_per_second": 6.27,
1508
+ "step": 10000
1509
+ },
1510
+ {
1511
+ "epoch": 7.09,
1512
+ "learning_rate": 1.461267605633803e-06,
1513
+ "loss": 0.1182,
1514
+ "step": 10100
1515
+ },
1516
+ {
1517
+ "epoch": 7.09,
1518
+ "eval_accuracy": 0.7350000143051147,
1519
+ "eval_loss": 1.105920672416687,
1520
+ "eval_runtime": 22.1229,
1521
+ "eval_samples_per_second": 90.404,
1522
+ "eval_steps_per_second": 5.65,
1523
+ "step": 10100
1524
+ },
1525
+ {
1526
+ "epoch": 7.16,
1527
+ "learning_rate": 1.4260563380281691e-06,
1528
+ "loss": 0.1351,
1529
+ "step": 10200
1530
+ },
1531
+ {
1532
+ "epoch": 7.16,
1533
+ "eval_accuracy": 0.7400000095367432,
1534
+ "eval_loss": 1.093347191810608,
1535
+ "eval_runtime": 20.9448,
1536
+ "eval_samples_per_second": 95.489,
1537
+ "eval_steps_per_second": 5.968,
1538
+ "step": 10200
1539
+ },
1540
+ {
1541
+ "epoch": 7.23,
1542
+ "learning_rate": 1.3908450704225355e-06,
1543
+ "loss": 0.1496,
1544
+ "step": 10300
1545
+ },
1546
+ {
1547
+ "epoch": 7.23,
1548
+ "eval_accuracy": 0.7354999780654907,
1549
+ "eval_loss": 1.0731019973754883,
1550
+ "eval_runtime": 17.3295,
1551
+ "eval_samples_per_second": 115.41,
1552
+ "eval_steps_per_second": 7.213,
1553
+ "step": 10300
1554
+ },
1555
+ {
1556
+ "epoch": 7.3,
1557
+ "learning_rate": 1.3556338028169017e-06,
1558
+ "loss": 0.1197,
1559
+ "step": 10400
1560
+ },
1561
+ {
1562
+ "epoch": 7.3,
1563
+ "eval_accuracy": 0.7360000014305115,
1564
+ "eval_loss": 1.1089140176773071,
1565
+ "eval_runtime": 19.6723,
1566
+ "eval_samples_per_second": 101.666,
1567
+ "eval_steps_per_second": 6.354,
1568
+ "step": 10400
1569
+ },
1570
+ {
1571
+ "epoch": 7.37,
1572
+ "learning_rate": 1.3204225352112676e-06,
1573
+ "loss": 0.1111,
1574
+ "step": 10500
1575
+ },
1576
+ {
1577
+ "epoch": 7.37,
1578
+ "eval_accuracy": 0.7404999732971191,
1579
+ "eval_loss": 1.1381380558013916,
1580
+ "eval_runtime": 18.6584,
1581
+ "eval_samples_per_second": 107.191,
1582
+ "eval_steps_per_second": 6.699,
1583
+ "step": 10500
1584
+ },
1585
+ {
1586
+ "epoch": 7.44,
1587
+ "learning_rate": 1.285211267605634e-06,
1588
+ "loss": 0.1494,
1589
+ "step": 10600
1590
+ },
1591
+ {
1592
+ "epoch": 7.44,
1593
+ "eval_accuracy": 0.7425000071525574,
1594
+ "eval_loss": 1.0251615047454834,
1595
+ "eval_runtime": 20.1427,
1596
+ "eval_samples_per_second": 99.292,
1597
+ "eval_steps_per_second": 6.206,
1598
+ "step": 10600
1599
+ },
1600
+ {
1601
+ "epoch": 7.51,
1602
+ "learning_rate": 1.25e-06,
1603
+ "loss": 0.1235,
1604
+ "step": 10700
1605
+ },
1606
+ {
1607
+ "epoch": 7.51,
1608
+ "eval_accuracy": 0.7360000014305115,
1609
+ "eval_loss": 1.0906413793563843,
1610
+ "eval_runtime": 22.5849,
1611
+ "eval_samples_per_second": 88.555,
1612
+ "eval_steps_per_second": 5.535,
1613
+ "step": 10700
1614
+ },
1615
+ {
1616
+ "epoch": 7.58,
1617
+ "learning_rate": 1.2147887323943663e-06,
1618
+ "loss": 0.133,
1619
+ "step": 10800
1620
+ },
1621
+ {
1622
+ "epoch": 7.58,
1623
+ "eval_accuracy": 0.737500011920929,
1624
+ "eval_loss": 1.1796296834945679,
1625
+ "eval_runtime": 12.3686,
1626
+ "eval_samples_per_second": 161.699,
1627
+ "eval_steps_per_second": 10.106,
1628
+ "step": 10800
1629
+ },
1630
+ {
1631
+ "epoch": 7.65,
1632
+ "learning_rate": 1.1795774647887324e-06,
1633
+ "loss": 0.1248,
1634
+ "step": 10900
1635
+ },
1636
+ {
1637
+ "epoch": 7.65,
1638
+ "eval_accuracy": 0.7419999837875366,
1639
+ "eval_loss": 1.1331868171691895,
1640
+ "eval_runtime": 21.2537,
1641
+ "eval_samples_per_second": 94.101,
1642
+ "eval_steps_per_second": 5.881,
1643
+ "step": 10900
1644
+ },
1645
+ {
1646
+ "epoch": 7.72,
1647
+ "learning_rate": 1.1443661971830988e-06,
1648
+ "loss": 0.1268,
1649
+ "step": 11000
1650
+ },
1651
+ {
1652
+ "epoch": 7.72,
1653
+ "eval_accuracy": 0.7415000200271606,
1654
+ "eval_loss": 1.1304017305374146,
1655
+ "eval_runtime": 20.2304,
1656
+ "eval_samples_per_second": 98.861,
1657
+ "eval_steps_per_second": 6.179,
1658
+ "step": 11000
1659
+ },
1660
+ {
1661
+ "epoch": 7.79,
1662
+ "learning_rate": 1.109154929577465e-06,
1663
+ "loss": 0.1368,
1664
+ "step": 11100
1665
+ },
1666
+ {
1667
+ "epoch": 7.79,
1668
+ "eval_accuracy": 0.7379999756813049,
1669
+ "eval_loss": 1.1345131397247314,
1670
+ "eval_runtime": 21.0499,
1671
+ "eval_samples_per_second": 95.013,
1672
+ "eval_steps_per_second": 5.938,
1673
+ "step": 11100
1674
+ },
1675
+ {
1676
+ "epoch": 7.86,
1677
+ "learning_rate": 1.073943661971831e-06,
1678
+ "loss": 0.1228,
1679
+ "step": 11200
1680
+ },
1681
+ {
1682
+ "epoch": 7.86,
1683
+ "eval_accuracy": 0.7319999933242798,
1684
+ "eval_loss": 1.2018308639526367,
1685
+ "eval_runtime": 21.3555,
1686
+ "eval_samples_per_second": 93.653,
1687
+ "eval_steps_per_second": 5.853,
1688
+ "step": 11200
1689
+ },
1690
+ {
1691
+ "epoch": 7.93,
1692
+ "learning_rate": 1.0387323943661972e-06,
1693
+ "loss": 0.1281,
1694
+ "step": 11300
1695
+ },
1696
+ {
1697
+ "epoch": 7.93,
1698
+ "eval_accuracy": 0.7350000143051147,
1699
+ "eval_loss": 1.1884474754333496,
1700
+ "eval_runtime": 18.4277,
1701
+ "eval_samples_per_second": 108.532,
1702
+ "eval_steps_per_second": 6.783,
1703
+ "step": 11300
1704
+ },
1705
+ {
1706
+ "epoch": 8.0,
1707
+ "learning_rate": 1.0035211267605636e-06,
1708
+ "loss": 0.1449,
1709
+ "step": 11400
1710
+ },
1711
+ {
1712
+ "epoch": 8.0,
1713
+ "eval_accuracy": 0.734499990940094,
1714
+ "eval_loss": 1.157057762145996,
1715
+ "eval_runtime": 16.3477,
1716
+ "eval_samples_per_second": 122.341,
1717
+ "eval_steps_per_second": 7.646,
1718
+ "step": 11400
1719
+ },
1720
+ {
1721
+ "epoch": 8.07,
1722
+ "learning_rate": 9.683098591549295e-07,
1723
+ "loss": 0.1025,
1724
+ "step": 11500
1725
+ },
1726
+ {
1727
+ "epoch": 8.07,
1728
+ "eval_accuracy": 0.734499990940094,
1729
+ "eval_loss": 1.153812289237976,
1730
+ "eval_runtime": 15.7861,
1731
+ "eval_samples_per_second": 126.694,
1732
+ "eval_steps_per_second": 7.918,
1733
+ "step": 11500
1734
+ },
1735
+ {
1736
+ "epoch": 8.14,
1737
+ "learning_rate": 9.330985915492959e-07,
1738
+ "loss": 0.1199,
1739
+ "step": 11600
1740
+ },
1741
+ {
1742
+ "epoch": 8.14,
1743
+ "eval_accuracy": 0.7390000224113464,
1744
+ "eval_loss": 1.2113364934921265,
1745
+ "eval_runtime": 16.1478,
1746
+ "eval_samples_per_second": 123.856,
1747
+ "eval_steps_per_second": 7.741,
1748
+ "step": 11600
1749
+ },
1750
+ {
1751
+ "epoch": 8.21,
1752
+ "learning_rate": 8.978873239436621e-07,
1753
+ "loss": 0.1016,
1754
+ "step": 11700
1755
+ },
1756
+ {
1757
+ "epoch": 8.21,
1758
+ "eval_accuracy": 0.7369999885559082,
1759
+ "eval_loss": 1.2881745100021362,
1760
+ "eval_runtime": 23.0414,
1761
+ "eval_samples_per_second": 86.8,
1762
+ "eval_steps_per_second": 5.425,
1763
+ "step": 11700
1764
+ },
1765
+ {
1766
+ "epoch": 8.28,
1767
+ "learning_rate": 8.626760563380282e-07,
1768
+ "loss": 0.114,
1769
+ "step": 11800
1770
+ },
1771
+ {
1772
+ "epoch": 8.28,
1773
+ "eval_accuracy": 0.7390000224113464,
1774
+ "eval_loss": 1.287195086479187,
1775
+ "eval_runtime": 19.2083,
1776
+ "eval_samples_per_second": 104.122,
1777
+ "eval_steps_per_second": 6.508,
1778
+ "step": 11800
1779
+ },
1780
+ {
1781
+ "epoch": 8.35,
1782
+ "learning_rate": 8.274647887323944e-07,
1783
+ "loss": 0.1019,
1784
+ "step": 11900
1785
+ },
1786
+ {
1787
+ "epoch": 8.35,
1788
+ "eval_accuracy": 0.7379999756813049,
1789
+ "eval_loss": 1.287625789642334,
1790
+ "eval_runtime": 19.2034,
1791
+ "eval_samples_per_second": 104.148,
1792
+ "eval_steps_per_second": 6.509,
1793
+ "step": 11900
1794
+ },
1795
+ {
1796
+ "epoch": 8.42,
1797
+ "learning_rate": 7.922535211267607e-07,
1798
+ "loss": 0.1142,
1799
+ "step": 12000
1800
+ },
1801
+ {
1802
+ "epoch": 8.42,
1803
+ "eval_accuracy": 0.7384999990463257,
1804
+ "eval_loss": 1.2790753841400146,
1805
+ "eval_runtime": 12.6779,
1806
+ "eval_samples_per_second": 157.755,
1807
+ "eval_steps_per_second": 9.86,
1808
+ "step": 12000
1809
+ },
1810
+ {
1811
+ "epoch": 8.49,
1812
+ "learning_rate": 7.570422535211268e-07,
1813
+ "loss": 0.1135,
1814
+ "step": 12100
1815
+ },
1816
+ {
1817
+ "epoch": 8.49,
1818
+ "eval_accuracy": 0.7379999756813049,
1819
+ "eval_loss": 1.2882863283157349,
1820
+ "eval_runtime": 22.9048,
1821
+ "eval_samples_per_second": 87.318,
1822
+ "eval_steps_per_second": 5.457,
1823
+ "step": 12100
1824
+ },
1825
+ {
1826
+ "epoch": 8.56,
1827
+ "learning_rate": 7.21830985915493e-07,
1828
+ "loss": 0.1139,
1829
+ "step": 12200
1830
+ },
1831
+ {
1832
+ "epoch": 8.56,
1833
+ "eval_accuracy": 0.7360000014305115,
1834
+ "eval_loss": 1.2828530073165894,
1835
+ "eval_runtime": 12.6066,
1836
+ "eval_samples_per_second": 158.647,
1837
+ "eval_steps_per_second": 9.915,
1838
+ "step": 12200
1839
+ },
1840
+ {
1841
+ "epoch": 8.63,
1842
+ "learning_rate": 6.866197183098592e-07,
1843
+ "loss": 0.1107,
1844
+ "step": 12300
1845
+ },
1846
+ {
1847
+ "epoch": 8.63,
1848
+ "eval_accuracy": 0.7365000247955322,
1849
+ "eval_loss": 1.269805669784546,
1850
+ "eval_runtime": 18.0397,
1851
+ "eval_samples_per_second": 110.866,
1852
+ "eval_steps_per_second": 6.929,
1853
+ "step": 12300
1854
+ },
1855
+ {
1856
+ "epoch": 8.7,
1857
+ "learning_rate": 6.514084507042254e-07,
1858
+ "loss": 0.1183,
1859
+ "step": 12400
1860
+ },
1861
+ {
1862
+ "epoch": 8.7,
1863
+ "eval_accuracy": 0.734499990940094,
1864
+ "eval_loss": 1.266024112701416,
1865
+ "eval_runtime": 18.8559,
1866
+ "eval_samples_per_second": 106.068,
1867
+ "eval_steps_per_second": 6.629,
1868
+ "step": 12400
1869
+ },
1870
+ {
1871
+ "epoch": 8.77,
1872
+ "learning_rate": 6.161971830985916e-07,
1873
+ "loss": 0.1064,
1874
+ "step": 12500
1875
+ },
1876
+ {
1877
+ "epoch": 8.77,
1878
+ "eval_accuracy": 0.7365000247955322,
1879
+ "eval_loss": 1.288902759552002,
1880
+ "eval_runtime": 20.1463,
1881
+ "eval_samples_per_second": 99.274,
1882
+ "eval_steps_per_second": 6.205,
1883
+ "step": 12500
1884
+ },
1885
+ {
1886
+ "epoch": 8.84,
1887
+ "learning_rate": 5.809859154929578e-07,
1888
+ "loss": 0.0895,
1889
+ "step": 12600
1890
+ },
1891
+ {
1892
+ "epoch": 8.84,
1893
+ "eval_accuracy": 0.7329999804496765,
1894
+ "eval_loss": 1.3480335474014282,
1895
+ "eval_runtime": 19.9481,
1896
+ "eval_samples_per_second": 100.26,
1897
+ "eval_steps_per_second": 6.266,
1898
+ "step": 12600
1899
+ },
1900
+ {
1901
+ "epoch": 8.91,
1902
+ "learning_rate": 5.457746478873239e-07,
1903
+ "loss": 0.1244,
1904
+ "step": 12700
1905
+ },
1906
+ {
1907
+ "epoch": 8.91,
1908
+ "eval_accuracy": 0.7325000166893005,
1909
+ "eval_loss": 1.2872198820114136,
1910
+ "eval_runtime": 17.773,
1911
+ "eval_samples_per_second": 112.53,
1912
+ "eval_steps_per_second": 7.033,
1913
+ "step": 12700
1914
+ },
1915
+ {
1916
+ "epoch": 8.98,
1917
+ "learning_rate": 5.105633802816902e-07,
1918
+ "loss": 0.1209,
1919
+ "step": 12800
1920
+ },
1921
+ {
1922
+ "epoch": 8.98,
1923
+ "eval_accuracy": 0.737500011920929,
1924
+ "eval_loss": 1.2680846452713013,
1925
+ "eval_runtime": 22.2614,
1926
+ "eval_samples_per_second": 89.842,
1927
+ "eval_steps_per_second": 5.615,
1928
+ "step": 12800
1929
+ },
1930
+ {
1931
+ "epoch": 9.05,
1932
+ "learning_rate": 4.7535211267605635e-07,
1933
+ "loss": 0.1144,
1934
+ "step": 12900
1935
+ },
1936
+ {
1937
+ "epoch": 9.05,
1938
+ "eval_accuracy": 0.7369999885559082,
1939
+ "eval_loss": 1.2711447477340698,
1940
+ "eval_runtime": 18.1651,
1941
+ "eval_samples_per_second": 110.101,
1942
+ "eval_steps_per_second": 6.881,
1943
+ "step": 12900
1944
+ },
1945
+ {
1946
+ "epoch": 9.12,
1947
+ "learning_rate": 4.4014084507042255e-07,
1948
+ "loss": 0.1034,
1949
+ "step": 13000
1950
+ },
1951
+ {
1952
+ "epoch": 9.12,
1953
+ "eval_accuracy": 0.7360000014305115,
1954
+ "eval_loss": 1.2800805568695068,
1955
+ "eval_runtime": 20.275,
1956
+ "eval_samples_per_second": 98.644,
1957
+ "eval_steps_per_second": 6.165,
1958
+ "step": 13000
1959
+ },
1960
+ {
1961
+ "epoch": 9.19,
1962
+ "learning_rate": 4.049295774647888e-07,
1963
+ "loss": 0.113,
1964
+ "step": 13100
1965
+ },
1966
+ {
1967
+ "epoch": 9.19,
1968
+ "eval_accuracy": 0.7350000143051147,
1969
+ "eval_loss": 1.2801427841186523,
1970
+ "eval_runtime": 19.6919,
1971
+ "eval_samples_per_second": 101.565,
1972
+ "eval_steps_per_second": 6.348,
1973
+ "step": 13100
1974
+ },
1975
+ {
1976
+ "epoch": 9.26,
1977
+ "learning_rate": 3.6971830985915495e-07,
1978
+ "loss": 0.0994,
1979
+ "step": 13200
1980
+ },
1981
+ {
1982
+ "epoch": 9.26,
1983
+ "eval_accuracy": 0.7360000014305115,
1984
+ "eval_loss": 1.2920359373092651,
1985
+ "eval_runtime": 18.5273,
1986
+ "eval_samples_per_second": 107.949,
1987
+ "eval_steps_per_second": 6.747,
1988
+ "step": 13200
1989
+ },
1990
+ {
1991
+ "epoch": 9.33,
1992
+ "learning_rate": 3.345070422535211e-07,
1993
+ "loss": 0.0966,
1994
+ "step": 13300
1995
+ },
1996
+ {
1997
+ "epoch": 9.33,
1998
+ "eval_accuracy": 0.7335000038146973,
1999
+ "eval_loss": 1.2760682106018066,
2000
+ "eval_runtime": 21.2516,
2001
+ "eval_samples_per_second": 94.11,
2002
+ "eval_steps_per_second": 5.882,
2003
+ "step": 13300
2004
+ },
2005
+ {
2006
+ "epoch": 9.4,
2007
+ "learning_rate": 2.992957746478873e-07,
2008
+ "loss": 0.0939,
2009
+ "step": 13400
2010
+ },
2011
+ {
2012
+ "epoch": 9.4,
2013
+ "eval_accuracy": 0.7365000247955322,
2014
+ "eval_loss": 1.2908642292022705,
2015
+ "eval_runtime": 17.0257,
2016
+ "eval_samples_per_second": 117.47,
2017
+ "eval_steps_per_second": 7.342,
2018
+ "step": 13400
2019
+ },
2020
+ {
2021
+ "epoch": 9.47,
2022
+ "learning_rate": 2.6408450704225356e-07,
2023
+ "loss": 0.0975,
2024
+ "step": 13500
2025
+ },
2026
+ {
2027
+ "epoch": 9.47,
2028
+ "eval_accuracy": 0.7360000014305115,
2029
+ "eval_loss": 1.2952693700790405,
2030
+ "eval_runtime": 20.8537,
2031
+ "eval_samples_per_second": 95.906,
2032
+ "eval_steps_per_second": 5.994,
2033
+ "step": 13500
2034
+ },
2035
+ {
2036
+ "epoch": 9.54,
2037
+ "learning_rate": 2.2887323943661974e-07,
2038
+ "loss": 0.0842,
2039
+ "step": 13600
2040
+ },
2041
+ {
2042
+ "epoch": 9.54,
2043
+ "eval_accuracy": 0.7335000038146973,
2044
+ "eval_loss": 1.3179150819778442,
2045
+ "eval_runtime": 20.9509,
2046
+ "eval_samples_per_second": 95.461,
2047
+ "eval_steps_per_second": 5.966,
2048
+ "step": 13600
2049
+ },
2050
+ {
2051
+ "epoch": 9.61,
2052
+ "learning_rate": 1.936619718309859e-07,
2053
+ "loss": 0.0871,
2054
+ "step": 13700
2055
+ },
2056
+ {
2057
+ "epoch": 9.61,
2058
+ "eval_accuracy": 0.7384999990463257,
2059
+ "eval_loss": 1.314935564994812,
2060
+ "eval_runtime": 20.6989,
2061
+ "eval_samples_per_second": 96.623,
2062
+ "eval_steps_per_second": 6.039,
2063
+ "step": 13700
2064
+ },
2065
+ {
2066
+ "epoch": 9.68,
2067
+ "learning_rate": 1.5845070422535212e-07,
2068
+ "loss": 0.1162,
2069
+ "step": 13800
2070
+ },
2071
+ {
2072
+ "epoch": 9.68,
2073
+ "eval_accuracy": 0.7350000143051147,
2074
+ "eval_loss": 1.3124284744262695,
2075
+ "eval_runtime": 21.7736,
2076
+ "eval_samples_per_second": 91.854,
2077
+ "eval_steps_per_second": 5.741,
2078
+ "step": 13800
2079
+ },
2080
+ {
2081
+ "epoch": 9.75,
2082
+ "learning_rate": 1.2323943661971832e-07,
2083
+ "loss": 0.085,
2084
+ "step": 13900
2085
+ },
2086
+ {
2087
+ "epoch": 9.75,
2088
+ "eval_accuracy": 0.7354999780654907,
2089
+ "eval_loss": 1.3206626176834106,
2090
+ "eval_runtime": 18.1587,
2091
+ "eval_samples_per_second": 110.14,
2092
+ "eval_steps_per_second": 6.884,
2093
+ "step": 13900
2094
+ },
2095
+ {
2096
+ "epoch": 9.82,
2097
+ "learning_rate": 8.802816901408452e-08,
2098
+ "loss": 0.0966,
2099
+ "step": 14000
2100
+ },
2101
+ {
2102
+ "epoch": 9.82,
2103
+ "eval_accuracy": 0.7335000038146973,
2104
+ "eval_loss": 1.3247543573379517,
2105
+ "eval_runtime": 16.4512,
2106
+ "eval_samples_per_second": 121.572,
2107
+ "eval_steps_per_second": 7.598,
2108
+ "step": 14000
2109
+ },
2110
+ {
2111
+ "epoch": 9.89,
2112
+ "learning_rate": 5.281690140845071e-08,
2113
+ "loss": 0.1064,
2114
+ "step": 14100
2115
+ },
2116
+ {
2117
+ "epoch": 9.89,
2118
+ "eval_accuracy": 0.7335000038146973,
2119
+ "eval_loss": 1.3260987997055054,
2120
+ "eval_runtime": 22.2976,
2121
+ "eval_samples_per_second": 89.696,
2122
+ "eval_steps_per_second": 5.606,
2123
+ "step": 14100
2124
+ },
2125
+ {
2126
+ "epoch": 9.96,
2127
+ "learning_rate": 1.7605633802816902e-08,
2128
+ "loss": 0.1046,
2129
+ "step": 14200
2130
+ },
2131
+ {
2132
+ "epoch": 9.96,
2133
+ "eval_accuracy": 0.7360000014305115,
2134
+ "eval_loss": 1.3255339860916138,
2135
+ "eval_runtime": 19.2834,
2136
+ "eval_samples_per_second": 103.716,
2137
+ "eval_steps_per_second": 6.482,
2138
+ "step": 14200
2139
+ },
2140
+ {
2141
+ "epoch": 10.0,
2142
+ "step": 14250,
2143
+ "total_flos": 2.1254540922139392e+17,
2144
+ "train_loss": 0.2872312853629129,
2145
+ "train_runtime": 13159.309,
2146
+ "train_samples_per_second": 34.664,
2147
+ "train_steps_per_second": 1.083
2148
+ }
2149
+ ],
2150
+ "max_steps": 14250,
2151
+ "num_train_epochs": 10,
2152
+ "total_flos": 2.1254540922139392e+17,
2153
+ "trial_name": null,
2154
+ "trial_params": null
2155
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f96f13160d8ffc1bca4aa21a830360cbd97bda3ce08b241fde8b2efb824992a9
3
+ size 3375