DGurgurov committed on
Commit 589c840
1 Parent(s): 74b7dfb

Upload 17 files

README.md CHANGED
@@ -1,3 +1,160 @@
  ---
- license: mit
+ license: apache-2.0
+ base_model: bert-base-multilingual-cased
+ tags:
+ - generated_from_trainer
+ metrics:
+ - accuracy
+ model-index:
+ - name: bg
+   results: []
  ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # bg
+
+ This model is an adapter fine-tuned on top of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) on the Bulgarian ConceptNet dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.4640
+ - Accuracy: 0.8875
+
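As a usage illustration (not part of the generated card): a minimal sketch of loading this language adapter on top of mBERT with the AdapterHub `adapters` library. The local `mlm/` path matches the folder added in this commit; the Bulgarian example sentence and exact package versions are assumptions.

```python
# Sketch: attach the MLM language adapter from the mlm/ folder of a local
# checkout of this repo and run one masked-token prediction.
# Assumes the `adapters` package (AdapterHub) is installed.
import torch
import adapters
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

adapters.init(model)                      # make the base model adapter-aware
adapter_name = model.load_adapter("mlm")  # reads adapter_config.json + pytorch_adapter.bin
model.set_active_adapters(adapter_name)

inputs = tok("Той живее в [MASK].", return_tensors="pt")  # hypothetical example sentence
with torch.no_grad():
    logits = model(**inputs).logits
mask_pos = (inputs.input_ids == tok.mask_token_id).nonzero(as_tuple=True)[1]
print(tok.decode(logits[0, mask_pos].argmax(dim=-1)))
```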
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-05
+ - train_batch_size: 16
+ - eval_batch_size: 16
+ - seed: 42
+ - distributed_type: multi-GPU
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - training_steps: 50000
+
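For orientation, a rough sketch of how the hyperparameters above map onto `transformers.TrainingArguments`. The output directory is inferred from `trainer_state.json` further down, and the evaluation/save cadence comes from the logged state; this is not claimed to be the exact training script.

```python
from transformers import TrainingArguments

# Approximate reconstruction of the run configuration listed above.
training_args = TrainingArguments(
    output_dir="./models/adapters_mlm_cn/bg",  # per best_model_checkpoint in trainer_state.json
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    max_steps=50_000,
    lr_scheduler_type="linear",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    evaluation_strategy="steps",
    eval_steps=500,
    logging_steps=500,
    save_steps=500,
)
```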
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Accuracy |
+ |:-------------:|:-----:|:-----:|:---------------:|:--------:|
+ | 1.5057 | 0.15 | 500 | 0.9846 | 0.8149 |
+ | 1.0172 | 0.31 | 1000 | 0.8395 | 0.8259 |
+ | 0.8814 | 0.46 | 1500 | 0.7823 | 0.8368 |
+ | 0.8405 | 0.61 | 2000 | 0.7437 | 0.8449 |
+ | 0.7773 | 0.77 | 2500 | 0.7247 | 0.8387 |
+ | 0.7762 | 0.92 | 3000 | 0.6521 | 0.8513 |
+ | 0.7186 | 1.07 | 3500 | 0.6834 | 0.8492 |
+ | 0.7033 | 1.22 | 4000 | 0.6715 | 0.8523 |
+ | 0.672 | 1.38 | 4500 | 0.6539 | 0.8560 |
+ | 0.6613 | 1.53 | 5000 | 0.6387 | 0.8567 |
+ | 0.6712 | 1.68 | 5500 | 0.6180 | 0.8624 |
+ | 0.6776 | 1.84 | 6000 | 0.6635 | 0.8537 |
+ | 0.6484 | 1.99 | 6500 | 0.5946 | 0.8661 |
+ | 0.6817 | 2.14 | 7000 | 0.6126 | 0.8655 |
+ | 0.6392 | 2.3 | 7500 | 0.6136 | 0.8613 |
+ | 0.6394 | 2.45 | 8000 | 0.6321 | 0.8621 |
+ | 0.6273 | 2.6 | 8500 | 0.5997 | 0.8629 |
+ | 0.5993 | 2.76 | 9000 | 0.6028 | 0.8646 |
+ | 0.6527 | 2.91 | 9500 | 0.6584 | 0.8510 |
+ | 0.5897 | 3.06 | 10000 | 0.5728 | 0.8676 |
+ | 0.574 | 3.21 | 10500 | 0.5870 | 0.8671 |
+ | 0.6026 | 3.37 | 11000 | 0.6067 | 0.8677 |
+ | 0.5896 | 3.52 | 11500 | 0.6000 | 0.8638 |
+ | 0.566 | 3.67 | 12000 | 0.5566 | 0.8712 |
+ | 0.5928 | 3.83 | 12500 | 0.5621 | 0.8675 |
+ | 0.597 | 3.98 | 13000 | 0.5162 | 0.8771 |
+ | 0.5836 | 4.13 | 13500 | 0.5498 | 0.8696 |
+ | 0.5864 | 4.29 | 14000 | 0.5728 | 0.8640 |
+ | 0.5562 | 4.44 | 14500 | 0.6000 | 0.8623 |
+ | 0.5999 | 4.59 | 15000 | 0.5589 | 0.8679 |
+ | 0.5767 | 4.75 | 15500 | 0.5713 | 0.8681 |
+ | 0.5574 | 4.9 | 16000 | 0.5338 | 0.8739 |
+ | 0.568 | 5.05 | 16500 | 0.5527 | 0.8725 |
+ | 0.5568 | 5.21 | 17000 | 0.5058 | 0.8777 |
+ | 0.5369 | 5.36 | 17500 | 0.5599 | 0.8720 |
+ | 0.518 | 5.51 | 18000 | 0.5610 | 0.8720 |
+ | 0.5637 | 5.66 | 18500 | 0.5467 | 0.8728 |
+ | 0.557 | 5.82 | 19000 | 0.5349 | 0.8714 |
+ | 0.5499 | 5.97 | 19500 | 0.5468 | 0.8724 |
+ | 0.5304 | 6.12 | 20000 | 0.5243 | 0.8741 |
+ | 0.5431 | 6.28 | 20500 | 0.4998 | 0.8784 |
+ | 0.5508 | 6.43 | 21000 | 0.5367 | 0.8764 |
+ | 0.5701 | 6.58 | 21500 | 0.5365 | 0.8734 |
+ | 0.521 | 6.74 | 22000 | 0.4879 | 0.8819 |
+ | 0.5514 | 6.89 | 22500 | 0.5106 | 0.8787 |
+ | 0.547 | 7.04 | 23000 | 0.5258 | 0.8747 |
+ | 0.5512 | 7.2 | 23500 | 0.4975 | 0.8778 |
+ | 0.5407 | 7.35 | 24000 | 0.4944 | 0.8786 |
+ | 0.5181 | 7.5 | 24500 | 0.4912 | 0.8795 |
+ | 0.5493 | 7.65 | 25000 | 0.5188 | 0.8730 |
+ | 0.5388 | 7.81 | 25500 | 0.5000 | 0.8831 |
+ | 0.5284 | 7.96 | 26000 | 0.5161 | 0.8737 |
+ | 0.5116 | 8.11 | 26500 | 0.5263 | 0.8760 |
+ | 0.5161 | 8.27 | 27000 | 0.5002 | 0.8787 |
+ | 0.5185 | 8.42 | 27500 | 0.5127 | 0.8745 |
+ | 0.5291 | 8.57 | 28000 | 0.5116 | 0.8782 |
+ | 0.5061 | 8.73 | 28500 | 0.4972 | 0.8774 |
+ | 0.479 | 8.88 | 29000 | 0.4978 | 0.8798 |
+ | 0.5154 | 9.03 | 29500 | 0.5088 | 0.8771 |
+ | 0.4989 | 9.19 | 30000 | 0.5119 | 0.8744 |
+ | 0.5098 | 9.34 | 30500 | 0.4916 | 0.8826 |
+ | 0.4777 | 9.49 | 31000 | 0.4957 | 0.8824 |
+ | 0.5462 | 9.64 | 31500 | 0.4846 | 0.8779 |
+ | 0.509 | 9.8 | 32000 | 0.4873 | 0.8810 |
+ | 0.5181 | 9.95 | 32500 | 0.5227 | 0.8710 |
+ | 0.5269 | 10.1 | 33000 | 0.4929 | 0.8803 |
+ | 0.5094 | 10.26 | 33500 | 0.4841 | 0.8877 |
+ | 0.5033 | 10.41 | 34000 | 0.5129 | 0.8805 |
+ | 0.4913 | 10.56 | 34500 | 0.4978 | 0.8789 |
+ | 0.4938 | 10.72 | 35000 | 0.4640 | 0.8838 |
+ | 0.4954 | 10.87 | 35500 | 0.4991 | 0.8794 |
+ | 0.458 | 11.02 | 36000 | 0.4453 | 0.8886 |
+ | 0.526 | 11.18 | 36500 | 0.4863 | 0.8832 |
+ | 0.4809 | 11.33 | 37000 | 0.4923 | 0.8784 |
+ | 0.466 | 11.48 | 37500 | 0.4824 | 0.8807 |
+ | 0.4903 | 11.64 | 38000 | 0.4552 | 0.8848 |
+ | 0.4875 | 11.79 | 38500 | 0.4850 | 0.8780 |
+ | 0.4858 | 11.94 | 39000 | 0.4728 | 0.8833 |
+ | 0.4868 | 12.09 | 39500 | 0.4868 | 0.8800 |
+ | 0.485 | 12.25 | 40000 | 0.4935 | 0.8802 |
+ | 0.4823 | 12.4 | 40500 | 0.4789 | 0.8828 |
+ | 0.4629 | 12.55 | 41000 | 0.4834 | 0.8835 |
+ | 0.4915 | 12.71 | 41500 | 0.4864 | 0.8812 |
+ | 0.473 | 12.86 | 42000 | 0.5136 | 0.8793 |
+ | 0.4849 | 13.01 | 42500 | 0.4823 | 0.8815 |
+ | 0.4582 | 13.17 | 43000 | 0.4637 | 0.8844 |
+ | 0.4938 | 13.32 | 43500 | 0.4829 | 0.8842 |
+ | 0.4682 | 13.47 | 44000 | 0.4799 | 0.8817 |
+ | 0.4885 | 13.63 | 44500 | 0.4754 | 0.8858 |
+ | 0.4641 | 13.78 | 45000 | 0.4738 | 0.8849 |
+ | 0.4664 | 13.93 | 45500 | 0.4512 | 0.8869 |
+ | 0.4722 | 14.08 | 46000 | 0.4821 | 0.8836 |
+ | 0.485 | 14.24 | 46500 | 0.4735 | 0.8842 |
+ | 0.4784 | 14.39 | 47000 | 0.4557 | 0.8823 |
+ | 0.4821 | 14.54 | 47500 | 0.4707 | 0.8856 |
+ | 0.478 | 14.7 | 48000 | 0.4682 | 0.8846 |
+ | 0.451 | 14.85 | 48500 | 0.4744 | 0.8781 |
+ | 0.4582 | 15.0 | 49000 | 0.4617 | 0.8835 |
+ | 0.4949 | 15.16 | 49500 | 0.4769 | 0.8835 |
+ | 0.4546 | 15.31 | 50000 | 0.4677 | 0.8835 |
+
+
+ ### Framework versions
+
+ - Transformers 4.35.2
+ - Pytorch 2.0.0
+ - Datasets 2.15.0
+ - Tokenizers 0.15.0
logs/bg_cn_lang_adapter.png ADDED
logs/events.out.tfevents.1709072592.serv-3318.1735865.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0bc37bc50f2d5e69e98e5753b90f687e311a30a43be901b779b4ba5d484b4e33
+ size 53301
logs/events.out.tfevents.1709075335.serv-3318.1735865.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7bed101833d8c96a8c8f862a883435f5519523000c2d6bcac36446bcbe9872d0
+ size 369
mlm/adapter_config.json ADDED
@@ -0,0 +1,41 @@
+ {
+   "config": {
+     "adapter_residual_before_ln": false,
+     "cross_adapter": false,
+     "factorized_phm_W": true,
+     "factorized_phm_rule": false,
+     "hypercomplex_nonlinearity": "glorot-uniform",
+     "init_weights": "bert",
+     "inv_adapter": null,
+     "inv_adapter_reduction_factor": null,
+     "is_parallel": false,
+     "learn_phm": true,
+     "leave_out": [],
+     "ln_after": false,
+     "ln_before": false,
+     "mh_adapter": false,
+     "non_linearity": "relu",
+     "original_ln_after": true,
+     "original_ln_before": true,
+     "output_adapter": true,
+     "phm_bias": true,
+     "phm_c_init": "normal",
+     "phm_dim": 4,
+     "phm_init_range": 0.0001,
+     "phm_layer": false,
+     "phm_rank": 1,
+     "reduction_factor": 16,
+     "residual_before_ln": true,
+     "scaling": 1.0,
+     "shared_W_phm": false,
+     "shared_phm_rule": true,
+     "use_gating": false
+   },
+   "config_id": "9076f36a74755ac4",
+   "hidden_size": 768,
+   "model_class": "BertForMaskedLM",
+   "model_name": "bert-base-multilingual-cased",
+   "model_type": "bert",
+   "name": "mlm",
+   "version": "0.1.1"
+ }
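The flags above describe a sequential bottleneck ("Pfeiffer-style") adapter: an output adapter only, reduction factor 16, ReLU non-linearity. A minimal sketch of creating an adapter with the same shape in the `adapters` library (adapter name reused from this config; data loading and the MLM training loop are omitted, and this is not claimed to be the author's script):

```python
import adapters
from adapters import SeqBnConfig
from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")
adapters.init(model)

# Same bottleneck shape as mlm/adapter_config.json: output adapter only,
# reduction_factor=16, ReLU inside the bottleneck.
config = SeqBnConfig(reduction_factor=16, non_linearity="relu")
model.add_adapter("mlm", config=config)
model.train_adapter("mlm")  # freezes the base model, leaves only adapter weights trainable
```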
mlm/head_config.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "config": null,
+   "hidden_size": 768,
+   "label2id": {
+     "LABEL_0": 0,
+     "LABEL_1": 1
+   },
+   "model_class": "BertForMaskedLM",
+   "model_name": "bert-base-multilingual-cased",
+   "model_type": "bert",
+   "name": null,
+   "num_labels": 2,
+   "version": "0.1.1"
+ }
mlm/pytorch_adapter.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b223a57f41a9e6a2c8debf7d0788b91e126ddb1622be27f1c76edade60b5a31b
+ size 3594917
mlm/pytorch_model_head.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9ad03a83a7dae20b6f0cbf36a8063b17d8a1bc9c311e1aaa69385d1b16a8066b
+ size 370097519
optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e5961573ed8d14c1f436754dd9784d611c4c59aca2b69b6533123ff92fee84ee
+ size 11936581
rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:373c00f5caea17a3661aaf0bf53fb559e5d31184ff8fa45fd88e5a14d98c6dbb
+ size 14575
scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a74271501810f2b6d0d27ae7c5e032ef1ab90b92dac7c1a79a051174b3fa8163
+ size 627
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "cls_token": "[CLS]",
+   "mask_token": "[MASK]",
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "unk_token": "[UNK]"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,55 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_lower_case": false,
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
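These are the standard settings of the cased mBERT tokenizer. A small sketch of loading it from a local checkout of this repository and checking that the special-token ids match the added_tokens_decoder table above (the "." path is an assumption):

```python
from transformers import AutoTokenizer

# Loads tokenizer_config.json / tokenizer.json / vocab.txt from the repo root.
tok = AutoTokenizer.from_pretrained(".")

print(tok.pad_token_id, tok.unk_token_id, tok.cls_token_id,
      tok.sep_token_id, tok.mask_token_id)  # expected: 0 100 101 102 103
print(tok.model_max_length)                 # expected: 512
```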
trainer_state.json ADDED
@@ -0,0 +1,1099 @@
1
+ {
2
+ "best_metric": 0.4452793300151825,
3
+ "best_model_checkpoint": "./models/adapters_mlm_cn/bg/checkpoint-36000",
4
+ "epoch": 11.022657685241887,
5
+ "eval_steps": 500,
6
+ "global_step": 36000,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.15,
13
+ "learning_rate": 4.9500000000000004e-05,
14
+ "loss": 1.5057,
15
+ "step": 500
16
+ },
17
+ {
18
+ "epoch": 0.15,
19
+ "eval_accuracy": 0.8148620791204476,
20
+ "eval_loss": 0.9846400618553162,
21
+ "eval_runtime": 7.7173,
22
+ "eval_samples_per_second": 752.339,
23
+ "eval_steps_per_second": 47.037,
24
+ "step": 500
25
+ },
26
+ {
27
+ "epoch": 0.31,
28
+ "learning_rate": 4.9e-05,
29
+ "loss": 1.0172,
30
+ "step": 1000
31
+ },
32
+ {
33
+ "epoch": 0.31,
34
+ "eval_accuracy": 0.82587890625,
35
+ "eval_loss": 0.8394753932952881,
36
+ "eval_runtime": 7.7319,
37
+ "eval_samples_per_second": 750.918,
38
+ "eval_steps_per_second": 46.949,
39
+ "step": 1000
40
+ },
41
+ {
42
+ "epoch": 0.46,
43
+ "learning_rate": 4.85e-05,
44
+ "loss": 0.8814,
45
+ "step": 1500
46
+ },
47
+ {
48
+ "epoch": 0.46,
49
+ "eval_accuracy": 0.8368038740920097,
50
+ "eval_loss": 0.7822620272636414,
51
+ "eval_runtime": 7.7294,
52
+ "eval_samples_per_second": 751.162,
53
+ "eval_steps_per_second": 46.964,
54
+ "step": 1500
55
+ },
56
+ {
57
+ "epoch": 0.61,
58
+ "learning_rate": 4.8e-05,
59
+ "loss": 0.8405,
60
+ "step": 2000
61
+ },
62
+ {
63
+ "epoch": 0.61,
64
+ "eval_accuracy": 0.8449071800412533,
65
+ "eval_loss": 0.7436666488647461,
66
+ "eval_runtime": 7.9259,
67
+ "eval_samples_per_second": 732.538,
68
+ "eval_steps_per_second": 45.799,
69
+ "step": 2000
70
+ },
71
+ {
72
+ "epoch": 0.77,
73
+ "learning_rate": 4.75e-05,
74
+ "loss": 0.7773,
75
+ "step": 2500
76
+ },
77
+ {
78
+ "epoch": 0.77,
79
+ "eval_accuracy": 0.8386841062227507,
80
+ "eval_loss": 0.7246997952461243,
81
+ "eval_runtime": 7.7331,
82
+ "eval_samples_per_second": 750.803,
83
+ "eval_steps_per_second": 46.941,
84
+ "step": 2500
85
+ },
86
+ {
87
+ "epoch": 0.92,
88
+ "learning_rate": 4.7e-05,
89
+ "loss": 0.7762,
90
+ "step": 3000
91
+ },
92
+ {
93
+ "epoch": 0.92,
94
+ "eval_accuracy": 0.8513044340839202,
95
+ "eval_loss": 0.6520901322364807,
96
+ "eval_runtime": 7.9369,
97
+ "eval_samples_per_second": 731.516,
98
+ "eval_steps_per_second": 45.736,
99
+ "step": 3000
100
+ },
101
+ {
102
+ "epoch": 1.07,
103
+ "learning_rate": 4.6500000000000005e-05,
104
+ "loss": 0.7186,
105
+ "step": 3500
106
+ },
107
+ {
108
+ "epoch": 1.07,
109
+ "eval_accuracy": 0.8492265517916585,
110
+ "eval_loss": 0.6834315061569214,
111
+ "eval_runtime": 7.6738,
112
+ "eval_samples_per_second": 756.596,
113
+ "eval_steps_per_second": 47.304,
114
+ "step": 3500
115
+ },
116
+ {
117
+ "epoch": 1.22,
118
+ "learning_rate": 4.600000000000001e-05,
119
+ "loss": 0.7033,
120
+ "step": 4000
121
+ },
122
+ {
123
+ "epoch": 1.22,
124
+ "eval_accuracy": 0.852271607371637,
125
+ "eval_loss": 0.67154860496521,
126
+ "eval_runtime": 7.7294,
127
+ "eval_samples_per_second": 751.163,
128
+ "eval_steps_per_second": 46.964,
129
+ "step": 4000
130
+ },
131
+ {
132
+ "epoch": 1.38,
133
+ "learning_rate": 4.55e-05,
134
+ "loss": 0.672,
135
+ "step": 4500
136
+ },
137
+ {
138
+ "epoch": 1.38,
139
+ "eval_accuracy": 0.855973974763407,
140
+ "eval_loss": 0.6539207100868225,
141
+ "eval_runtime": 7.7117,
142
+ "eval_samples_per_second": 752.881,
143
+ "eval_steps_per_second": 47.071,
144
+ "step": 4500
145
+ },
146
+ {
147
+ "epoch": 1.53,
148
+ "learning_rate": 4.5e-05,
149
+ "loss": 0.6613,
150
+ "step": 5000
151
+ },
152
+ {
153
+ "epoch": 1.53,
154
+ "eval_accuracy": 0.8567085131424088,
155
+ "eval_loss": 0.638721227645874,
156
+ "eval_runtime": 7.6505,
157
+ "eval_samples_per_second": 758.9,
158
+ "eval_steps_per_second": 47.448,
159
+ "step": 5000
160
+ },
161
+ {
162
+ "epoch": 1.68,
163
+ "learning_rate": 4.4500000000000004e-05,
164
+ "loss": 0.6712,
165
+ "step": 5500
166
+ },
167
+ {
168
+ "epoch": 1.68,
169
+ "eval_accuracy": 0.862372613040467,
170
+ "eval_loss": 0.6180465221405029,
171
+ "eval_runtime": 7.7012,
172
+ "eval_samples_per_second": 753.913,
173
+ "eval_steps_per_second": 47.136,
174
+ "step": 5500
175
+ },
176
+ {
177
+ "epoch": 1.84,
178
+ "learning_rate": 4.4000000000000006e-05,
179
+ "loss": 0.6776,
180
+ "step": 6000
181
+ },
182
+ {
183
+ "epoch": 1.84,
184
+ "eval_accuracy": 0.8537038849202466,
185
+ "eval_loss": 0.6634594202041626,
186
+ "eval_runtime": 7.7042,
187
+ "eval_samples_per_second": 753.61,
188
+ "eval_steps_per_second": 47.117,
189
+ "step": 6000
190
+ },
191
+ {
192
+ "epoch": 1.99,
193
+ "learning_rate": 4.35e-05,
194
+ "loss": 0.6484,
195
+ "step": 6500
196
+ },
197
+ {
198
+ "epoch": 1.99,
199
+ "eval_accuracy": 0.8661394258933802,
200
+ "eval_loss": 0.5945894122123718,
201
+ "eval_runtime": 7.6974,
202
+ "eval_samples_per_second": 754.283,
203
+ "eval_steps_per_second": 47.159,
204
+ "step": 6500
205
+ },
206
+ {
207
+ "epoch": 2.14,
208
+ "learning_rate": 4.3e-05,
209
+ "loss": 0.6817,
210
+ "step": 7000
211
+ },
212
+ {
213
+ "epoch": 2.14,
214
+ "eval_accuracy": 0.8654563297350344,
215
+ "eval_loss": 0.6126104593276978,
216
+ "eval_runtime": 8.509,
217
+ "eval_samples_per_second": 682.334,
218
+ "eval_steps_per_second": 42.661,
219
+ "step": 7000
220
+ },
221
+ {
222
+ "epoch": 2.3,
223
+ "learning_rate": 4.25e-05,
224
+ "loss": 0.6392,
225
+ "step": 7500
226
+ },
227
+ {
228
+ "epoch": 2.3,
229
+ "eval_accuracy": 0.8613216715257531,
230
+ "eval_loss": 0.613590657711029,
231
+ "eval_runtime": 8.1378,
232
+ "eval_samples_per_second": 713.458,
233
+ "eval_steps_per_second": 44.606,
234
+ "step": 7500
235
+ },
236
+ {
237
+ "epoch": 2.45,
238
+ "learning_rate": 4.2e-05,
239
+ "loss": 0.6394,
240
+ "step": 8000
241
+ },
242
+ {
243
+ "epoch": 2.45,
244
+ "eval_accuracy": 0.8620723749258453,
245
+ "eval_loss": 0.6320650577545166,
246
+ "eval_runtime": 7.7697,
247
+ "eval_samples_per_second": 747.26,
248
+ "eval_steps_per_second": 46.72,
249
+ "step": 8000
250
+ },
251
+ {
252
+ "epoch": 2.6,
253
+ "learning_rate": 4.15e-05,
254
+ "loss": 0.6273,
255
+ "step": 8500
256
+ },
257
+ {
258
+ "epoch": 2.6,
259
+ "eval_accuracy": 0.8629402009560043,
260
+ "eval_loss": 0.5997043251991272,
261
+ "eval_runtime": 7.9947,
262
+ "eval_samples_per_second": 726.232,
263
+ "eval_steps_per_second": 45.405,
264
+ "step": 8500
265
+ },
266
+ {
267
+ "epoch": 2.76,
268
+ "learning_rate": 4.1e-05,
269
+ "loss": 0.5993,
270
+ "step": 9000
271
+ },
272
+ {
273
+ "epoch": 2.76,
274
+ "eval_accuracy": 0.8645569620253165,
275
+ "eval_loss": 0.6027613282203674,
276
+ "eval_runtime": 8.0195,
277
+ "eval_samples_per_second": 723.989,
278
+ "eval_steps_per_second": 45.265,
279
+ "step": 9000
280
+ },
281
+ {
282
+ "epoch": 2.91,
283
+ "learning_rate": 4.05e-05,
284
+ "loss": 0.6527,
285
+ "step": 9500
286
+ },
287
+ {
288
+ "epoch": 2.91,
289
+ "eval_accuracy": 0.8510214250124564,
290
+ "eval_loss": 0.6583752632141113,
291
+ "eval_runtime": 7.9195,
292
+ "eval_samples_per_second": 733.131,
293
+ "eval_steps_per_second": 45.836,
294
+ "step": 9500
295
+ },
296
+ {
297
+ "epoch": 3.06,
298
+ "learning_rate": 4e-05,
299
+ "loss": 0.5897,
300
+ "step": 10000
301
+ },
302
+ {
303
+ "epoch": 3.06,
304
+ "eval_accuracy": 0.8676120587068623,
305
+ "eval_loss": 0.5727556943893433,
306
+ "eval_runtime": 7.9746,
307
+ "eval_samples_per_second": 728.065,
308
+ "eval_steps_per_second": 45.52,
309
+ "step": 10000
310
+ },
311
+ {
312
+ "epoch": 3.21,
313
+ "learning_rate": 3.9500000000000005e-05,
314
+ "loss": 0.574,
315
+ "step": 10500
316
+ },
317
+ {
318
+ "epoch": 3.21,
319
+ "eval_accuracy": 0.8670824400701618,
320
+ "eval_loss": 0.5869864821434021,
321
+ "eval_runtime": 7.8716,
322
+ "eval_samples_per_second": 737.59,
323
+ "eval_steps_per_second": 46.115,
324
+ "step": 10500
325
+ },
326
+ {
327
+ "epoch": 3.37,
328
+ "learning_rate": 3.9000000000000006e-05,
329
+ "loss": 0.6026,
330
+ "step": 11000
331
+ },
332
+ {
333
+ "epoch": 3.37,
334
+ "eval_accuracy": 0.8676513458361675,
335
+ "eval_loss": 0.6066599488258362,
336
+ "eval_runtime": 7.8242,
337
+ "eval_samples_per_second": 742.057,
338
+ "eval_steps_per_second": 46.395,
339
+ "step": 11000
340
+ },
341
+ {
342
+ "epoch": 3.52,
343
+ "learning_rate": 3.85e-05,
344
+ "loss": 0.5896,
345
+ "step": 11500
346
+ },
347
+ {
348
+ "epoch": 3.52,
349
+ "eval_accuracy": 0.8638327806250629,
350
+ "eval_loss": 0.6000019311904907,
351
+ "eval_runtime": 8.0139,
352
+ "eval_samples_per_second": 724.49,
353
+ "eval_steps_per_second": 45.296,
354
+ "step": 11500
355
+ },
356
+ {
357
+ "epoch": 3.67,
358
+ "learning_rate": 3.8e-05,
359
+ "loss": 0.566,
360
+ "step": 12000
361
+ },
362
+ {
363
+ "epoch": 3.67,
364
+ "eval_accuracy": 0.8711821948164563,
365
+ "eval_loss": 0.5566375851631165,
366
+ "eval_runtime": 7.8868,
367
+ "eval_samples_per_second": 736.17,
368
+ "eval_steps_per_second": 46.026,
369
+ "step": 12000
370
+ },
371
+ {
372
+ "epoch": 3.83,
373
+ "learning_rate": 3.7500000000000003e-05,
374
+ "loss": 0.5928,
375
+ "step": 12500
376
+ },
377
+ {
378
+ "epoch": 3.83,
379
+ "eval_accuracy": 0.8675352877307275,
380
+ "eval_loss": 0.5621004700660706,
381
+ "eval_runtime": 7.9912,
382
+ "eval_samples_per_second": 726.553,
383
+ "eval_steps_per_second": 45.425,
384
+ "step": 12500
385
+ },
386
+ {
387
+ "epoch": 3.98,
388
+ "learning_rate": 3.7e-05,
389
+ "loss": 0.597,
390
+ "step": 13000
391
+ },
392
+ {
393
+ "epoch": 3.98,
394
+ "eval_accuracy": 0.8771320904403015,
395
+ "eval_loss": 0.5161893963813782,
396
+ "eval_runtime": 7.9771,
397
+ "eval_samples_per_second": 727.833,
398
+ "eval_steps_per_second": 45.505,
399
+ "step": 13000
400
+ },
401
+ {
402
+ "epoch": 4.13,
403
+ "learning_rate": 3.65e-05,
404
+ "loss": 0.5836,
405
+ "step": 13500
406
+ },
407
+ {
408
+ "epoch": 4.13,
409
+ "eval_accuracy": 0.8696463654223968,
410
+ "eval_loss": 0.5498046278953552,
411
+ "eval_runtime": 7.8463,
412
+ "eval_samples_per_second": 739.966,
413
+ "eval_steps_per_second": 46.264,
414
+ "step": 13500
415
+ },
416
+ {
417
+ "epoch": 4.29,
418
+ "learning_rate": 3.6e-05,
419
+ "loss": 0.5864,
420
+ "step": 14000
421
+ },
422
+ {
423
+ "epoch": 4.29,
424
+ "eval_accuracy": 0.8639773945240183,
425
+ "eval_loss": 0.5728442072868347,
426
+ "eval_runtime": 7.8404,
427
+ "eval_samples_per_second": 740.524,
428
+ "eval_steps_per_second": 46.299,
429
+ "step": 14000
430
+ },
431
+ {
432
+ "epoch": 4.44,
433
+ "learning_rate": 3.55e-05,
434
+ "loss": 0.5562,
435
+ "step": 14500
436
+ },
437
+ {
438
+ "epoch": 4.44,
439
+ "eval_accuracy": 0.8623497479643273,
440
+ "eval_loss": 0.6000498533248901,
441
+ "eval_runtime": 7.8135,
442
+ "eval_samples_per_second": 743.077,
443
+ "eval_steps_per_second": 46.458,
444
+ "step": 14500
445
+ },
446
+ {
447
+ "epoch": 4.59,
448
+ "learning_rate": 3.5e-05,
449
+ "loss": 0.5999,
450
+ "step": 15000
451
+ },
452
+ {
453
+ "epoch": 4.59,
454
+ "eval_accuracy": 0.8679152291769344,
455
+ "eval_loss": 0.5589025020599365,
456
+ "eval_runtime": 7.7959,
457
+ "eval_samples_per_second": 744.749,
458
+ "eval_steps_per_second": 46.563,
459
+ "step": 15000
460
+ },
461
+ {
462
+ "epoch": 4.75,
463
+ "learning_rate": 3.45e-05,
464
+ "loss": 0.5767,
465
+ "step": 15500
466
+ },
467
+ {
468
+ "epoch": 4.75,
469
+ "eval_accuracy": 0.8680821783151479,
470
+ "eval_loss": 0.5713112354278564,
471
+ "eval_runtime": 8.9874,
472
+ "eval_samples_per_second": 646.014,
473
+ "eval_steps_per_second": 40.39,
474
+ "step": 15500
475
+ },
476
+ {
477
+ "epoch": 4.9,
478
+ "learning_rate": 3.4000000000000007e-05,
479
+ "loss": 0.5574,
480
+ "step": 16000
481
+ },
482
+ {
483
+ "epoch": 4.9,
484
+ "eval_accuracy": 0.8739122026687295,
485
+ "eval_loss": 0.5337920784950256,
486
+ "eval_runtime": 10.8383,
487
+ "eval_samples_per_second": 535.691,
488
+ "eval_steps_per_second": 33.492,
489
+ "step": 16000
490
+ },
491
+ {
492
+ "epoch": 5.05,
493
+ "learning_rate": 3.35e-05,
494
+ "loss": 0.568,
495
+ "step": 16500
496
+ },
497
+ {
498
+ "epoch": 5.05,
499
+ "eval_accuracy": 0.87250098000784,
500
+ "eval_loss": 0.552727222442627,
501
+ "eval_runtime": 7.8124,
502
+ "eval_samples_per_second": 743.18,
503
+ "eval_steps_per_second": 46.465,
504
+ "step": 16500
505
+ },
506
+ {
507
+ "epoch": 5.21,
508
+ "learning_rate": 3.3e-05,
509
+ "loss": 0.5568,
510
+ "step": 17000
511
+ },
512
+ {
513
+ "epoch": 5.21,
514
+ "eval_accuracy": 0.8776927722971612,
515
+ "eval_loss": 0.5058096051216125,
516
+ "eval_runtime": 7.8143,
517
+ "eval_samples_per_second": 742.993,
518
+ "eval_steps_per_second": 46.453,
519
+ "step": 17000
520
+ },
521
+ {
522
+ "epoch": 5.36,
523
+ "learning_rate": 3.2500000000000004e-05,
524
+ "loss": 0.5369,
525
+ "step": 17500
526
+ },
527
+ {
528
+ "epoch": 5.36,
529
+ "eval_accuracy": 0.8719769673704415,
530
+ "eval_loss": 0.5599194169044495,
531
+ "eval_runtime": 7.8287,
532
+ "eval_samples_per_second": 741.628,
533
+ "eval_steps_per_second": 46.368,
534
+ "step": 17500
535
+ },
536
+ {
537
+ "epoch": 5.51,
538
+ "learning_rate": 3.2000000000000005e-05,
539
+ "loss": 0.518,
540
+ "step": 18000
541
+ },
542
+ {
543
+ "epoch": 5.51,
544
+ "eval_accuracy": 0.8720388349514563,
545
+ "eval_loss": 0.561033308506012,
546
+ "eval_runtime": 7.8241,
547
+ "eval_samples_per_second": 742.071,
548
+ "eval_steps_per_second": 46.395,
549
+ "step": 18000
550
+ },
551
+ {
552
+ "epoch": 5.66,
553
+ "learning_rate": 3.15e-05,
554
+ "loss": 0.5637,
555
+ "step": 18500
556
+ },
557
+ {
558
+ "epoch": 5.66,
559
+ "eval_accuracy": 0.8727518855153742,
560
+ "eval_loss": 0.5467284917831421,
561
+ "eval_runtime": 8.0155,
562
+ "eval_samples_per_second": 724.347,
563
+ "eval_steps_per_second": 45.287,
564
+ "step": 18500
565
+ },
566
+ {
567
+ "epoch": 5.82,
568
+ "learning_rate": 3.1e-05,
569
+ "loss": 0.557,
570
+ "step": 19000
571
+ },
572
+ {
573
+ "epoch": 5.82,
574
+ "eval_accuracy": 0.8713813872158539,
575
+ "eval_loss": 0.5348953604698181,
576
+ "eval_runtime": 8.0121,
577
+ "eval_samples_per_second": 724.653,
578
+ "eval_steps_per_second": 45.306,
579
+ "step": 19000
580
+ },
581
+ {
582
+ "epoch": 5.97,
583
+ "learning_rate": 3.05e-05,
584
+ "loss": 0.5499,
585
+ "step": 19500
586
+ },
587
+ {
588
+ "epoch": 5.97,
589
+ "eval_accuracy": 0.8724001160878398,
590
+ "eval_loss": 0.5467893481254578,
591
+ "eval_runtime": 7.7511,
592
+ "eval_samples_per_second": 749.05,
593
+ "eval_steps_per_second": 46.832,
594
+ "step": 19500
595
+ },
596
+ {
597
+ "epoch": 6.12,
598
+ "learning_rate": 3e-05,
599
+ "loss": 0.5304,
600
+ "step": 20000
601
+ },
602
+ {
603
+ "epoch": 6.12,
604
+ "eval_accuracy": 0.8740521910388971,
605
+ "eval_loss": 0.5243064761161804,
606
+ "eval_runtime": 7.8201,
607
+ "eval_samples_per_second": 742.45,
608
+ "eval_steps_per_second": 46.419,
609
+ "step": 20000
610
+ },
611
+ {
612
+ "epoch": 6.28,
613
+ "learning_rate": 2.95e-05,
614
+ "loss": 0.5431,
615
+ "step": 20500
616
+ },
617
+ {
618
+ "epoch": 6.28,
619
+ "eval_accuracy": 0.8783942176206291,
620
+ "eval_loss": 0.4997641146183014,
621
+ "eval_runtime": 8.0018,
622
+ "eval_samples_per_second": 725.585,
623
+ "eval_steps_per_second": 45.365,
624
+ "step": 20500
625
+ },
626
+ {
627
+ "epoch": 6.43,
628
+ "learning_rate": 2.9e-05,
629
+ "loss": 0.5508,
630
+ "step": 21000
631
+ },
632
+ {
633
+ "epoch": 6.43,
634
+ "eval_accuracy": 0.8763812154696132,
635
+ "eval_loss": 0.5366745591163635,
636
+ "eval_runtime": 7.8074,
637
+ "eval_samples_per_second": 743.654,
638
+ "eval_steps_per_second": 46.494,
639
+ "step": 21000
640
+ },
641
+ {
642
+ "epoch": 6.58,
643
+ "learning_rate": 2.8499999999999998e-05,
644
+ "loss": 0.5701,
645
+ "step": 21500
646
+ },
647
+ {
648
+ "epoch": 6.58,
649
+ "eval_accuracy": 0.8734250823803063,
650
+ "eval_loss": 0.5364522337913513,
651
+ "eval_runtime": 7.9868,
652
+ "eval_samples_per_second": 726.947,
653
+ "eval_steps_per_second": 45.45,
654
+ "step": 21500
655
+ },
656
+ {
657
+ "epoch": 6.74,
658
+ "learning_rate": 2.8000000000000003e-05,
659
+ "loss": 0.521,
660
+ "step": 22000
661
+ },
662
+ {
663
+ "epoch": 6.74,
664
+ "eval_accuracy": 0.8818635607321131,
665
+ "eval_loss": 0.4879148006439209,
666
+ "eval_runtime": 7.9938,
667
+ "eval_samples_per_second": 726.31,
668
+ "eval_steps_per_second": 45.41,
669
+ "step": 22000
670
+ },
671
+ {
672
+ "epoch": 6.89,
673
+ "learning_rate": 2.7500000000000004e-05,
674
+ "loss": 0.5514,
675
+ "step": 22500
676
+ },
677
+ {
678
+ "epoch": 6.89,
679
+ "eval_accuracy": 0.8786950074147306,
680
+ "eval_loss": 0.5105842351913452,
681
+ "eval_runtime": 7.8325,
682
+ "eval_samples_per_second": 741.269,
683
+ "eval_steps_per_second": 46.345,
684
+ "step": 22500
685
+ },
686
+ {
687
+ "epoch": 7.04,
688
+ "learning_rate": 2.7000000000000002e-05,
689
+ "loss": 0.547,
690
+ "step": 23000
691
+ },
692
+ {
693
+ "epoch": 7.04,
694
+ "eval_accuracy": 0.8747058823529412,
695
+ "eval_loss": 0.5258113741874695,
696
+ "eval_runtime": 7.8237,
697
+ "eval_samples_per_second": 742.1,
698
+ "eval_steps_per_second": 46.397,
699
+ "step": 23000
700
+ },
701
+ {
702
+ "epoch": 7.2,
703
+ "learning_rate": 2.6500000000000004e-05,
704
+ "loss": 0.5512,
705
+ "step": 23500
706
+ },
707
+ {
708
+ "epoch": 7.2,
709
+ "eval_accuracy": 0.877830692973078,
710
+ "eval_loss": 0.49750423431396484,
711
+ "eval_runtime": 7.9086,
712
+ "eval_samples_per_second": 734.135,
713
+ "eval_steps_per_second": 45.899,
714
+ "step": 23500
715
+ },
716
+ {
717
+ "epoch": 7.35,
718
+ "learning_rate": 2.6000000000000002e-05,
719
+ "loss": 0.5407,
720
+ "step": 24000
721
+ },
722
+ {
723
+ "epoch": 7.35,
724
+ "eval_accuracy": 0.8785601265822784,
725
+ "eval_loss": 0.494391530752182,
726
+ "eval_runtime": 8.2168,
727
+ "eval_samples_per_second": 706.599,
728
+ "eval_steps_per_second": 44.178,
729
+ "step": 24000
730
+ },
731
+ {
732
+ "epoch": 7.5,
733
+ "learning_rate": 2.5500000000000003e-05,
734
+ "loss": 0.5181,
735
+ "step": 24500
736
+ },
737
+ {
738
+ "epoch": 7.5,
739
+ "eval_accuracy": 0.8794734275962945,
740
+ "eval_loss": 0.4911736845970154,
741
+ "eval_runtime": 8.2044,
742
+ "eval_samples_per_second": 707.673,
743
+ "eval_steps_per_second": 44.245,
744
+ "step": 24500
745
+ },
746
+ {
747
+ "epoch": 7.65,
748
+ "learning_rate": 2.5e-05,
749
+ "loss": 0.5493,
750
+ "step": 25000
751
+ },
752
+ {
753
+ "epoch": 7.65,
754
+ "eval_accuracy": 0.87302207462395,
755
+ "eval_loss": 0.5187950730323792,
756
+ "eval_runtime": 8.0486,
757
+ "eval_samples_per_second": 721.366,
758
+ "eval_steps_per_second": 45.101,
759
+ "step": 25000
760
+ },
761
+ {
762
+ "epoch": 7.81,
763
+ "learning_rate": 2.45e-05,
764
+ "loss": 0.5388,
765
+ "step": 25500
766
+ },
767
+ {
768
+ "epoch": 7.81,
769
+ "eval_accuracy": 0.8831105473704751,
770
+ "eval_loss": 0.5000073313713074,
771
+ "eval_runtime": 8.0362,
772
+ "eval_samples_per_second": 722.481,
773
+ "eval_steps_per_second": 45.171,
774
+ "step": 25500
775
+ },
776
+ {
777
+ "epoch": 7.96,
778
+ "learning_rate": 2.4e-05,
779
+ "loss": 0.5284,
780
+ "step": 26000
781
+ },
782
+ {
783
+ "epoch": 7.96,
784
+ "eval_accuracy": 0.8737309019221291,
785
+ "eval_loss": 0.5161213278770447,
786
+ "eval_runtime": 8.1271,
787
+ "eval_samples_per_second": 714.401,
788
+ "eval_steps_per_second": 44.665,
789
+ "step": 26000
790
+ },
791
+ {
792
+ "epoch": 8.11,
793
+ "learning_rate": 2.35e-05,
794
+ "loss": 0.5116,
795
+ "step": 26500
796
+ },
797
+ {
798
+ "epoch": 8.11,
799
+ "eval_accuracy": 0.8759842519685039,
800
+ "eval_loss": 0.5262829065322876,
801
+ "eval_runtime": 8.1731,
802
+ "eval_samples_per_second": 710.381,
803
+ "eval_steps_per_second": 44.414,
804
+ "step": 26500
805
+ },
806
+ {
807
+ "epoch": 8.27,
808
+ "learning_rate": 2.3000000000000003e-05,
809
+ "loss": 0.5161,
810
+ "step": 27000
811
+ },
812
+ {
813
+ "epoch": 8.27,
814
+ "eval_accuracy": 0.8786888397577621,
815
+ "eval_loss": 0.500228762626648,
816
+ "eval_runtime": 8.034,
817
+ "eval_samples_per_second": 722.681,
818
+ "eval_steps_per_second": 45.183,
819
+ "step": 27000
820
+ },
821
+ {
822
+ "epoch": 8.42,
823
+ "learning_rate": 2.25e-05,
824
+ "loss": 0.5185,
825
+ "step": 27500
826
+ },
827
+ {
828
+ "epoch": 8.42,
829
+ "eval_accuracy": 0.8744550138723741,
830
+ "eval_loss": 0.5127227902412415,
831
+ "eval_runtime": 7.7182,
832
+ "eval_samples_per_second": 752.252,
833
+ "eval_steps_per_second": 47.032,
834
+ "step": 27500
835
+ },
836
+ {
837
+ "epoch": 8.57,
838
+ "learning_rate": 2.2000000000000003e-05,
839
+ "loss": 0.5291,
840
+ "step": 28000
841
+ },
842
+ {
843
+ "epoch": 8.57,
844
+ "eval_accuracy": 0.8782496527088708,
845
+ "eval_loss": 0.5115563273429871,
846
+ "eval_runtime": 8.0802,
847
+ "eval_samples_per_second": 718.543,
848
+ "eval_steps_per_second": 44.924,
849
+ "step": 28000
850
+ },
851
+ {
852
+ "epoch": 8.73,
853
+ "learning_rate": 2.15e-05,
854
+ "loss": 0.5061,
855
+ "step": 28500
856
+ },
857
+ {
858
+ "epoch": 8.73,
859
+ "eval_accuracy": 0.8773942634905202,
860
+ "eval_loss": 0.4972003996372223,
861
+ "eval_runtime": 7.7937,
862
+ "eval_samples_per_second": 744.959,
863
+ "eval_steps_per_second": 46.576,
864
+ "step": 28500
865
+ },
866
+ {
867
+ "epoch": 8.88,
868
+ "learning_rate": 2.1e-05,
869
+ "loss": 0.479,
870
+ "step": 29000
871
+ },
872
+ {
873
+ "epoch": 8.88,
874
+ "eval_accuracy": 0.8797814207650273,
875
+ "eval_loss": 0.49780747294425964,
876
+ "eval_runtime": 7.8838,
877
+ "eval_samples_per_second": 736.449,
878
+ "eval_steps_per_second": 46.044,
879
+ "step": 29000
880
+ },
881
+ {
882
+ "epoch": 9.03,
883
+ "learning_rate": 2.05e-05,
884
+ "loss": 0.5154,
885
+ "step": 29500
886
+ },
887
+ {
888
+ "epoch": 9.03,
889
+ "eval_accuracy": 0.877119904790241,
890
+ "eval_loss": 0.5088150501251221,
891
+ "eval_runtime": 7.7019,
892
+ "eval_samples_per_second": 753.843,
893
+ "eval_steps_per_second": 47.131,
894
+ "step": 29500
895
+ },
896
+ {
897
+ "epoch": 9.19,
898
+ "learning_rate": 2e-05,
899
+ "loss": 0.4989,
900
+ "step": 30000
901
+ },
902
+ {
903
+ "epoch": 9.19,
904
+ "eval_accuracy": 0.8744332741967278,
905
+ "eval_loss": 0.5118668079376221,
906
+ "eval_runtime": 7.7316,
907
+ "eval_samples_per_second": 750.942,
908
+ "eval_steps_per_second": 46.95,
909
+ "step": 30000
910
+ },
911
+ {
912
+ "epoch": 9.34,
913
+ "learning_rate": 1.9500000000000003e-05,
914
+ "loss": 0.5098,
915
+ "step": 30500
916
+ },
917
+ {
918
+ "epoch": 9.34,
919
+ "eval_accuracy": 0.8825979176995538,
920
+ "eval_loss": 0.4915599524974823,
921
+ "eval_runtime": 7.7036,
922
+ "eval_samples_per_second": 753.676,
923
+ "eval_steps_per_second": 47.121,
924
+ "step": 30500
925
+ },
926
+ {
927
+ "epoch": 9.49,
928
+ "learning_rate": 1.9e-05,
929
+ "loss": 0.4777,
930
+ "step": 31000
931
+ },
932
+ {
933
+ "epoch": 9.49,
934
+ "eval_accuracy": 0.882445449184195,
935
+ "eval_loss": 0.49568039178848267,
936
+ "eval_runtime": 7.7025,
937
+ "eval_samples_per_second": 753.779,
938
+ "eval_steps_per_second": 47.127,
939
+ "step": 31000
940
+ },
941
+ {
942
+ "epoch": 9.64,
943
+ "learning_rate": 1.85e-05,
944
+ "loss": 0.5462,
945
+ "step": 31500
946
+ },
947
+ {
948
+ "epoch": 9.64,
949
+ "eval_accuracy": 0.8778625954198473,
950
+ "eval_loss": 0.48457399010658264,
951
+ "eval_runtime": 7.7115,
952
+ "eval_samples_per_second": 752.903,
953
+ "eval_steps_per_second": 47.073,
954
+ "step": 31500
955
+ },
956
+ {
957
+ "epoch": 9.8,
958
+ "learning_rate": 1.8e-05,
959
+ "loss": 0.509,
960
+ "step": 32000
961
+ },
962
+ {
963
+ "epoch": 9.8,
964
+ "eval_accuracy": 0.8810146190337884,
965
+ "eval_loss": 0.48734790086746216,
966
+ "eval_runtime": 7.7302,
967
+ "eval_samples_per_second": 751.078,
968
+ "eval_steps_per_second": 46.959,
969
+ "step": 32000
970
+ },
971
+ {
972
+ "epoch": 9.95,
973
+ "learning_rate": 1.75e-05,
974
+ "loss": 0.5181,
975
+ "step": 32500
976
+ },
977
+ {
978
+ "epoch": 9.95,
979
+ "eval_accuracy": 0.8710217755443886,
980
+ "eval_loss": 0.5227355360984802,
981
+ "eval_runtime": 7.7073,
982
+ "eval_samples_per_second": 753.31,
983
+ "eval_steps_per_second": 47.098,
984
+ "step": 32500
985
+ },
986
+ {
987
+ "epoch": 10.1,
988
+ "learning_rate": 1.7000000000000003e-05,
989
+ "loss": 0.5269,
990
+ "step": 33000
991
+ },
992
+ {
993
+ "epoch": 10.1,
994
+ "eval_accuracy": 0.8802636757182212,
995
+ "eval_loss": 0.49287834763526917,
996
+ "eval_runtime": 8.3473,
997
+ "eval_samples_per_second": 695.551,
998
+ "eval_steps_per_second": 43.487,
999
+ "step": 33000
1000
+ },
1001
+ {
1002
+ "epoch": 10.26,
1003
+ "learning_rate": 1.65e-05,
1004
+ "loss": 0.5094,
1005
+ "step": 33500
1006
+ },
1007
+ {
1008
+ "epoch": 10.26,
1009
+ "eval_accuracy": 0.8877481840193705,
1010
+ "eval_loss": 0.4840761125087738,
1011
+ "eval_runtime": 8.4693,
1012
+ "eval_samples_per_second": 685.535,
1013
+ "eval_steps_per_second": 42.861,
1014
+ "step": 33500
1015
+ },
1016
+ {
1017
+ "epoch": 10.41,
1018
+ "learning_rate": 1.6000000000000003e-05,
1019
+ "loss": 0.5033,
1020
+ "step": 34000
1021
+ },
1022
+ {
1023
+ "epoch": 10.41,
1024
+ "eval_accuracy": 0.8805490654205608,
1025
+ "eval_loss": 0.5128547549247742,
1026
+ "eval_runtime": 8.1006,
1027
+ "eval_samples_per_second": 716.736,
1028
+ "eval_steps_per_second": 44.811,
1029
+ "step": 34000
1030
+ },
1031
+ {
1032
+ "epoch": 10.56,
1033
+ "learning_rate": 1.55e-05,
1034
+ "loss": 0.4913,
1035
+ "step": 34500
1036
+ },
1037
+ {
1038
+ "epoch": 10.56,
1039
+ "eval_accuracy": 0.8789432939810334,
1040
+ "eval_loss": 0.4978225529193878,
1041
+ "eval_runtime": 8.4845,
1042
+ "eval_samples_per_second": 684.304,
1043
+ "eval_steps_per_second": 42.784,
1044
+ "step": 34500
1045
+ },
1046
+ {
1047
+ "epoch": 10.72,
1048
+ "learning_rate": 1.5e-05,
1049
+ "loss": 0.4938,
1050
+ "step": 35000
1051
+ },
1052
+ {
1053
+ "epoch": 10.72,
1054
+ "eval_accuracy": 0.8838202465301368,
1055
+ "eval_loss": 0.46402791142463684,
1056
+ "eval_runtime": 8.2894,
1057
+ "eval_samples_per_second": 700.417,
1058
+ "eval_steps_per_second": 43.791,
1059
+ "step": 35000
1060
+ },
1061
+ {
1062
+ "epoch": 10.87,
1063
+ "learning_rate": 1.45e-05,
1064
+ "loss": 0.4954,
1065
+ "step": 35500
1066
+ },
1067
+ {
1068
+ "epoch": 10.87,
1069
+ "eval_accuracy": 0.8793576184880533,
1070
+ "eval_loss": 0.4990694522857666,
1071
+ "eval_runtime": 8.2824,
1072
+ "eval_samples_per_second": 701.003,
1073
+ "eval_steps_per_second": 43.828,
1074
+ "step": 35500
1075
+ },
1076
+ {
1077
+ "epoch": 11.02,
1078
+ "learning_rate": 1.4000000000000001e-05,
1079
+ "loss": 0.458,
1080
+ "step": 36000
1081
+ },
1082
+ {
1083
+ "epoch": 11.02,
1084
+ "eval_accuracy": 0.8885711468297012,
1085
+ "eval_loss": 0.4452793300151825,
1086
+ "eval_runtime": 8.4212,
1087
+ "eval_samples_per_second": 689.451,
1088
+ "eval_steps_per_second": 43.106,
1089
+ "step": 36000
1090
+ }
1091
+ ],
1092
+ "logging_steps": 500,
1093
+ "max_steps": 50000,
1094
+ "num_train_epochs": 16,
1095
+ "save_steps": 500,
1096
+ "total_flos": 5426535158775808.0,
1097
+ "trial_name": null,
1098
+ "trial_params": null
1099
+ }
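The header fields of trainer_state.json record the best checkpoint seen so far (eval loss 0.4453 at step 36000). A small sketch, assuming a local copy of the file, of pulling out that summary and recomputing it from log_history:

```python
import json

with open("trainer_state.json") as f:
    state = json.load(f)

print(state["best_metric"])            # 0.4452793300151825 (eval loss)
print(state["best_model_checkpoint"])  # ./models/adapters_mlm_cn/bg/checkpoint-36000
print(state["global_step"], "of", state["max_steps"], "steps logged")

# log_history interleaves training-loss and evaluation entries; keep the eval ones.
evals = [e for e in state["log_history"] if "eval_loss" in e]
best = min(evals, key=lambda e: e["eval_loss"])
print(best["step"], best["eval_loss"], best["eval_accuracy"])
```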
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:11847a8f7a409c8f6722c932668136e0c8a7072d934ba9c10d1f2aff07bd3ba4
+ size 4091
vocab.txt ADDED
The diff for this file is too large to render. See raw diff