zlucia committed on
Commit 9c27837
1 Parent(s): 5fa7ddf

Training in progress, step 100
README.md CHANGED
@@ -4,19 +4,28 @@ library_name: peft
  tags:
  - generated_from_trainer
  base_model: mistralai/Mistral-7B-v0.1
+ metrics:
+ - accuracy
  model-index:
- - name: Mistral-7B-v0.1_caselaw
+ - name: Mistral-7B-v0.1_district-court-db
  results: []
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

- # Mistral-7B-v0.1_caselaw
+ # Mistral-7B-v0.1_district-court-db

  This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the None dataset.
  It achieves the following results on the evaluation set:
- - Loss: 1.2884
+ - Loss: 0.0358
+ - Precision Micro: 0.8142
+ - Precision Macro: 0.7222
+ - Recall Micro: 0.8142
+ - Recall Macro: 0.7126
+ - F1 Micro: 0.8142
+ - F1 Macro: 0.7098
+ - Accuracy: 0.8142

  ## Model description

@@ -35,34 +44,50 @@ More information needed
  ### Training hyperparameters

  The following hyperparameters were used during training:
- - learning_rate: 5e-05
+ - learning_rate: 3e-05
  - train_batch_size: 4
  - eval_batch_size: 4
  - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 4
  - gradient_accumulation_steps: 4
- - total_train_batch_size: 64
- - total_eval_batch_size: 16
+ - total_train_batch_size: 16
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: constant
  - lr_scheduler_warmup_ratio: 0.03
- - num_epochs: 2.0
+ - training_steps: 1450

  ### Training results

- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:-----:|:----:|:---------------:|
- | 1.309 | 0.19 | 50 | 1.3631 |
- | 1.2966 | 0.39 | 100 | 1.3340 |
- | 1.2914 | 0.58 | 150 | 1.3168 |
- | 1.298 | 0.77 | 200 | 1.3039 |
- | 1.2678 | 0.96 | 250 | 1.2991 |
- | 1.216 | 1.16 | 300 | 1.3008 |
- | 1.2467 | 1.35 | 350 | 1.2945 |
- | 1.223 | 1.54 | 400 | 1.2940 |
- | 1.2 | 1.74 | 450 | 1.2924 |
- | 1.2406 | 1.93 | 500 | 1.2884 |
+ | Training Loss | Epoch | Step | Validation Loss | Precision Micro | Precision Macro | Recall Micro | Recall Macro | F1 Micro | F1 Macro | Accuracy |
+ |:-------------:|:-----:|:----:|:---------------:|:---------------:|:---------------:|:------------:|:------------:|:--------:|:--------:|:--------:|
+ | 0.1255 | 0.04 | 50 | 0.2459 | 0.2330 | 0.0980 | 0.2330 | 0.0939 | 0.2330 | 0.0773 | 0.2330 |
+ | 0.1076 | 0.08 | 100 | 0.1451 | 0.4075 | 0.1951 | 0.4075 | 0.1846 | 0.4075 | 0.1681 | 0.4075 |
+ | 0.066 | 0.12 | 150 | 0.1095 | 0.5387 | 0.3493 | 0.5387 | 0.2872 | 0.5387 | 0.2780 | 0.5387 |
+ | 0.0699 | 0.16 | 200 | 0.0901 | 0.6208 | 0.3837 | 0.6208 | 0.3992 | 0.6208 | 0.3798 | 0.6208 |
+ | 0.066 | 0.2 | 250 | 0.0883 | 0.6104 | 0.4544 | 0.6104 | 0.4312 | 0.6104 | 0.4135 | 0.6104 |
+ | 0.0452 | 0.24 | 300 | 0.0879 | 0.6877 | 0.5649 | 0.6877 | 0.5135 | 0.6877 | 0.5092 | 0.6877 |
+ | 0.0545 | 0.28 | 350 | 0.0761 | 0.6764 | 0.5194 | 0.6764 | 0.5288 | 0.6764 | 0.5040 | 0.6764 |
+ | 0.0647 | 0.32 | 400 | 0.0665 | 0.7340 | 0.6193 | 0.7340 | 0.5252 | 0.7340 | 0.5493 | 0.7340 |
+ | 0.056 | 0.36 | 450 | 0.0514 | 0.7396 | 0.6097 | 0.7396 | 0.5767 | 0.7396 | 0.5672 | 0.7396 |
+ | 0.0513 | 0.4 | 500 | 0.0479 | 0.7613 | 0.6384 | 0.7613 | 0.6145 | 0.7613 | 0.6020 | 0.7613 |
+ | 0.0501 | 0.44 | 550 | 0.0502 | 0.7509 | 0.6245 | 0.7509 | 0.6167 | 0.7509 | 0.6075 | 0.7509 |
+ | 0.0533 | 0.48 | 600 | 0.0481 | 0.7642 | 0.6500 | 0.7642 | 0.6139 | 0.7642 | 0.6073 | 0.7642 |
+ | 0.0462 | 0.52 | 650 | 0.0473 | 0.7481 | 0.5942 | 0.7481 | 0.5740 | 0.7481 | 0.5679 | 0.7481 |
+ | 0.0496 | 0.56 | 700 | 0.0419 | 0.7972 | 0.6678 | 0.7972 | 0.6480 | 0.7972 | 0.6518 | 0.7972 |
+ | 0.0614 | 0.6 | 750 | 0.0489 | 0.7774 | 0.6678 | 0.7774 | 0.6360 | 0.7774 | 0.6308 | 0.7774 |
+ | 0.0468 | 0.64 | 800 | 0.0443 | 0.7830 | 0.6435 | 0.7830 | 0.6816 | 0.7830 | 0.6494 | 0.7830 |
+ | 0.0477 | 0.68 | 850 | 0.0420 | 0.7972 | 0.7040 | 0.7972 | 0.6567 | 0.7972 | 0.6663 | 0.7972 |
+ | 0.0519 | 0.72 | 900 | 0.0463 | 0.7632 | 0.6519 | 0.7632 | 0.6291 | 0.7632 | 0.6292 | 0.7632 |
+ | 0.0453 | 0.76 | 950 | 0.0429 | 0.7802 | 0.6757 | 0.7802 | 0.6698 | 0.7802 | 0.6564 | 0.7802 |
+ | 0.0452 | 0.79 | 1000 | 0.0471 | 0.7377 | 0.6182 | 0.7377 | 0.6300 | 0.7377 | 0.6049 | 0.7377 |
+ | 0.0367 | 0.83 | 1050 | 0.0388 | 0.7981 | 0.6857 | 0.7981 | 0.6992 | 0.7981 | 0.6801 | 0.7981 |
+ | 0.0377 | 0.87 | 1100 | 0.0382 | 0.8 | 0.6636 | 0.8 | 0.6698 | 0.8000 | 0.6591 | 0.8 |
+ | 0.0429 | 0.91 | 1150 | 0.0398 | 0.7953 | 0.6924 | 0.7953 | 0.6441 | 0.7953 | 0.6466 | 0.7953 |
+ | 0.0451 | 0.95 | 1200 | 0.0378 | 0.7943 | 0.6713 | 0.7943 | 0.6538 | 0.7943 | 0.6535 | 0.7943 |
+ | 0.0347 | 0.99 | 1250 | 0.0413 | 0.7840 | 0.6735 | 0.7840 | 0.6450 | 0.7840 | 0.6331 | 0.7840 |
+ | 0.0378 | 1.03 | 1300 | 0.0377 | 0.8047 | 0.7109 | 0.8047 | 0.6387 | 0.8047 | 0.6489 | 0.8047 |
+ | 0.0357 | 1.07 | 1350 | 0.0386 | 0.8028 | 0.6899 | 0.8028 | 0.6559 | 0.8028 | 0.6649 | 0.8028 |
+ | 0.0418 | 1.11 | 1400 | 0.0368 | 0.7962 | 0.7114 | 0.7962 | 0.6942 | 0.7962 | 0.6910 | 0.7962 |
+ | 0.0293 | 1.15 | 1450 | 0.0358 | 0.8142 | 0.7222 | 0.8142 | 0.7126 | 0.8142 | 0.7098 | 0.8142 |


  ### Framework versions
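The hyperparameter changes in this README diff can be cross-checked with a little arithmetic. This is a minimal sketch, not Trainer code: the helper names are hypothetical, it assumes total_train_batch_size = train_batch_size × gradient_accumulation_steps × number of devices, and it assumes the new run used a single device (the diff drops `distributed_type` and `num_devices`). It also assumes warmup steps would be ceil(warmup_ratio × training_steps) and that the `constant` scheduler ignores `lr_scheduler_warmup_ratio`, which would be consistent with `trainer_state.json` already logging `learning_rate: 3e-05` at step 10.

```python
import math


def total_train_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Examples contributing to one optimizer update (hypothetical helper)."""
    return per_device * grad_accum * num_devices


def lr_multiplier(step: int, schedule: str, warmup_steps: int) -> float:
    """Learning-rate multiplier for two Trainer-style schedule names.

    "constant" never warms up; "constant_with_warmup" ramps linearly
    from 0 to 1 over `warmup_steps`, then stays at 1.
    """
    if schedule == "constant":
        return 1.0
    if schedule == "constant_with_warmup":
        return min(1.0, step / max(1, warmup_steps))
    raise ValueError(f"unknown schedule: {schedule}")


old_total = total_train_batch_size(4, 4, num_devices=4)  # old run: 4 GPUs
new_total = total_train_batch_size(4, 4)  # new run: assumed single device

# Assumed rule: warmup steps = ceil(warmup_ratio * total optimizer steps).
warmup_steps = math.ceil(0.03 * 1450)
base_lr = 3e-05
lr_at_step_10 = {
    name: base_lr * lr_multiplier(10, name, warmup_steps)
    for name in ("constant", "constant_with_warmup")
}
print(old_total, new_total, warmup_steps, lr_at_step_10)
```

Under these assumptions, a warmup schedule would have logged a learning rate well below 3e-05 at step 10, so the flat 3e-05 in the log points to the warmup ratio having no effect here.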
adapter_config.json CHANGED
@@ -10,22 +10,22 @@
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 16,
- "lora_dropout": 0.05,
+ "lora_dropout": 0.1,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
- "r": 16,
+ "r": 64,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
- "k_proj",
- "gate_proj",
+ "down_proj",
+ "q_proj",
  "up_proj",
+ "gate_proj",
  "v_proj",
- "o_proj",
- "q_proj",
- "down_proj"
+ "k_proj",
+ "o_proj"
  ],
  "task_type": "CAUSAL_LM"
  }
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f8e5ecbf1b1c5f51d16a600bc209e8495214ac7d5f2a492bb2e9feae430537fc
- size 83946192
+ oid sha256:de172eb1895ddad38b31177541cc496db1488c1268ec9191549df799bb83dab1
+ size 335605144
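The adapter file grows roughly fourfold because `r` goes from 16 to 64 while `target_modules` keeps the same seven projections (only their order changes). The sketch below estimates LoRA parameter counts, assuming Mistral-7B-v0.1's shapes (hidden size 4096, intermediate size 14336, 32 layers, 8 key/value heads of dimension 128) and 16-bit tensor storage; the few leftover kilobytes would then be metadata such as the safetensors header, which is an inference rather than something this commit states. It also shows the classic LoRA scaling `lora_alpha / r`, which drops from 1.0 to 0.25 because `lora_alpha` stays at 16 (assuming rsLoRA is not enabled; that setting is outside the visible diff).

```python
# Assumed Mistral-7B-v0.1 shapes.
HIDDEN = 4096
INTERMEDIATE = 14336
NUM_LAYERS = 32
KV_DIM = 8 * 128  # num_key_value_heads * head_dim (grouped-query attention)

# (in_features, out_features) of each projection in one decoder layer.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(r, modules):
    """Each adapted Linear gets A with shape (r, in) and B with shape (out, r)."""
    per_layer = sum(r * (SHAPES[m][0] + SHAPES[m][1]) for m in modules)
    return per_layer * NUM_LAYERS


def lora_scaling(lora_alpha, r):
    """Classic LoRA scaling of the B @ A update (rsLoRA would divide by sqrt(r))."""
    return lora_alpha / r


old_modules = ["k_proj", "gate_proj", "up_proj", "v_proj", "o_proj", "q_proj", "down_proj"]
new_modules = ["down_proj", "q_proj", "up_proj", "gate_proj", "v_proj", "k_proj", "o_proj"]
assert set(old_modules) == set(new_modules)  # same modules, different order

report = {}
for r, file_size in ((16, 83946192), (64, 335605144)):
    params = lora_param_count(r, new_modules)
    tensor_bytes = 2 * params  # assumes fp16/bf16 weights
    report[r] = (params, tensor_bytes, file_size - tensor_bytes, lora_scaling(16, r))
    print(f"r={r}: params={params:,} tensor_bytes={tensor_bytes:,} "
          f"other_bytes={file_size - tensor_bytes:,} scaling={lora_scaling(16, r)}")
```

Under these assumptions the LFS sizes in this diff are matched to within about 60 KB for both ranks, which supports reading the size change as purely a consequence of the larger rank.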
all_results.json CHANGED
@@ -1,7 +1,18 @@
  {
- "epoch": 2.0,
- "train_loss": 1.2557248572124937,
- "train_runtime": 2763.6306,
- "train_samples_per_second": 12.007,
- "train_steps_per_second": 0.187
+ "epoch": 1.15,
+ "eval_accuracy": 0.8141509433962264,
+ "eval_f1_macro": 0.7097996478763092,
+ "eval_f1_micro": 0.8141509433962264,
+ "eval_loss": 0.035770244896411896,
+ "eval_precision_macro": 0.7222302630120379,
+ "eval_precision_micro": 0.8141509433962264,
+ "eval_recall_macro": 0.7125706602249756,
+ "eval_recall_micro": 0.8141509433962264,
+ "eval_runtime": 66.7734,
+ "eval_samples_per_second": 15.875,
+ "eval_steps_per_second": 3.969,
+ "train_loss": 0.07879953698865298,
+ "train_runtime": 5948.326,
+ "train_samples_per_second": 3.9,
+ "train_steps_per_second": 0.244
  }
eval_results.json ADDED
@@ -0,0 +1,14 @@
+ {
+ "epoch": 1.15,
+ "eval_accuracy": 0.8141509433962264,
+ "eval_f1_macro": 0.7097996478763092,
+ "eval_f1_micro": 0.8141509433962264,
+ "eval_loss": 0.035770244896411896,
+ "eval_precision_macro": 0.7222302630120379,
+ "eval_precision_micro": 0.8141509433962264,
+ "eval_recall_macro": 0.7125706602249756,
+ "eval_recall_micro": 0.8141509433962264,
+ "eval_runtime": 66.7734,
+ "eval_samples_per_second": 15.875,
+ "eval_steps_per_second": 3.969
+ }
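In these results, micro precision, micro recall, micro F1, and accuracy are all 0.8141509433962264. That is expected when every example has exactly one gold label and one predicted label: each wrong prediction is at once a false positive for one class and a false negative for another, so the micro averages collapse to accuracy. Macro F1 (0.7098) is also below the harmonic mean of macro precision and macro recall (about 0.7174), which is what happens when macro F1 is the unweighted mean of per-class F1 scores, as in scikit-learn's `average="macro"`. The run's actual metric code is not part of this commit, so the pure-Python sketch below only illustrates these definitions on made-up toy labels.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Micro/macro precision, recall, F1 and accuracy for single-label data."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n_pred = Counter(y_pred)  # predictions per class (TP + FP)
    n_true = Counter(y_true)  # gold labels per class (TP + FN)

    def div(a, b):
        return a / b if b else 0.0

    per_class = []
    for c in labels:
        p = div(tp[c], n_pred[c])
        r = div(tp[c], n_true[c])
        per_class.append((p, r, div(2 * p * r, p + r)))

    total_tp = sum(tp.values())
    p_micro = div(total_tp, sum(n_pred.values()))
    r_micro = div(total_tp, sum(n_true.values()))
    k = len(labels)
    return {
        "accuracy": div(total_tp, len(y_true)),
        "precision_micro": p_micro,
        "recall_micro": r_micro,
        "f1_micro": div(2 * p_micro * r_micro, p_micro + r_micro),
        "precision_macro": sum(p for p, _, _ in per_class) / k,
        "recall_macro": sum(r for _, r, _ in per_class) / k,
        # Mean of per-class F1, not the F1 of the macro-averaged P and R.
        "f1_macro": sum(f for _, _, f in per_class) / k,
    }


# Toy labels, purely illustrative.
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "c", "c"]
m = classification_metrics(y_true, y_pred)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```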
metrics.json ADDED
@@ -0,0 +1 @@
+ {"run_name": "./output", "train_runtime": 5948.326, "train_samples_per_second": 3.9, "train_steps_per_second": 0.244, "train_loss": 0.07879953698865298, "epoch": 1.15, "eval_loss": 0.035770244896411896, "eval_precision_micro": 0.8141509433962264, "eval_precision_macro": 0.7222302630120379, "eval_recall_micro": 0.8141509433962264, "eval_recall_macro": 0.7125706602249756, "eval_f1_micro": 0.8141509433962264, "eval_f1_macro": 0.7097996478763092, "eval_accuracy": 0.8141509433962264, "eval_runtime": 66.7734, "eval_samples_per_second": 15.875, "eval_steps_per_second": 3.969}
train_results.json CHANGED
@@ -1,7 +1,7 @@
  {
- "epoch": 2.0,
- "train_loss": 1.2557248572124937,
- "train_runtime": 2763.6306,
- "train_samples_per_second": 12.007,
- "train_steps_per_second": 0.187
+ "epoch": 1.15,
+ "train_loss": 0.07879953698865298,
+ "train_runtime": 5948.326,
+ "train_samples_per_second": 3.9,
+ "train_steps_per_second": 0.244
  }
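The throughput fields can be checked against each other. Assuming that, for a run bounded by training steps (1450 here), the samples figure is `global_step × total_train_batch_size`, the new values are reproduced from `train_runtime` alone. The same rate × runtime arithmetic on the eval fields suggests an evaluation set of about 1060 examples, of which `eval_accuracy` corresponds to 863 correct. These are derived estimates under stated assumptions, not values stored in the commit.

```python
# Values copied from the new all_results.json / trainer_state.json in this commit.
train_runtime = 5948.326
global_step = 1450
total_train_batch_size = 16

eval_runtime = 66.7734
eval_samples_per_second = 15.875
eval_steps_per_second = 3.969
eval_batch_size = 4
eval_accuracy = 0.8141509433962264

# Assumption: for a step-bounded run, samples processed = steps x total batch size.
train_steps_per_second = round(global_step / train_runtime, 3)
train_samples_per_second = round(global_step * total_train_batch_size / train_runtime, 3)

# Estimate the eval-set size two ways; both should land on the same integer.
eval_samples_est = round(eval_samples_per_second * eval_runtime)
eval_steps_est = round(eval_steps_per_second * eval_runtime)
correct_est = eval_accuracy * eval_samples_est

# The same formula applied to the old, epoch-based run (518 steps, batch 64).
old_formula = round(518 * 64 / 2763.6306, 3)

print(train_steps_per_second, train_samples_per_second)
print(eval_samples_est, eval_steps_est * eval_batch_size, correct_est)
print(old_formula)
```

The formula gives about 11.996 samples/s for the old run rather than the logged 12.007, which would be consistent with the samples count being based on dataset size × epochs for epoch-based runs; treat that explanation as an inference as well.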
trainer_state.json CHANGED
@@ -1,415 +1,1334 @@
1
  {
2
  "best_metric": null,
3
  "best_model_checkpoint": null,
4
- "epoch": 1.9980713596914175,
5
  "eval_steps": 50,
6
- "global_step": 518,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
10
  "log_history": [
11
  {
12
- "epoch": 0.04,
13
- "learning_rate": 5e-05,
14
- "loss": 1.5247,
15
  "step": 10
16
  },
17
  {
18
- "epoch": 0.08,
19
- "learning_rate": 5e-05,
20
- "loss": 1.4114,
21
  "step": 20
22
  },
23
  {
24
- "epoch": 0.12,
25
- "learning_rate": 5e-05,
26
- "loss": 1.3203,
27
  "step": 30
28
  },
29
  {
30
- "epoch": 0.15,
31
- "learning_rate": 5e-05,
32
- "loss": 1.3566,
33
  "step": 40
34
  },
35
  {
36
- "epoch": 0.19,
37
- "learning_rate": 5e-05,
38
- "loss": 1.309,
39
  "step": 50
40
  },
41
  {
42
- "epoch": 0.19,
43
- "eval_loss": 1.3630590438842773,
44
- "eval_runtime": 26.2031,
45
- "eval_samples_per_second": 33.355,
46
- "eval_steps_per_second": 2.099,
47
  "step": 50
48
  },
49
  {
50
- "epoch": 0.23,
51
- "learning_rate": 5e-05,
52
- "loss": 1.2856,
53
  "step": 60
54
  },
55
  {
56
- "epoch": 0.27,
57
- "learning_rate": 5e-05,
58
- "loss": 1.3524,
59
  "step": 70
60
  },
61
  {
62
- "epoch": 0.31,
63
- "learning_rate": 5e-05,
64
- "loss": 1.317,
65
  "step": 80
66
  },
67
  {
68
- "epoch": 0.35,
69
- "learning_rate": 5e-05,
70
- "loss": 1.324,
71
  "step": 90
72
  },
73
  {
74
- "epoch": 0.39,
75
- "learning_rate": 5e-05,
76
- "loss": 1.2966,
77
  "step": 100
78
  },
79
  {
80
- "epoch": 0.39,
81
- "eval_loss": 1.3339532613754272,
82
- "eval_runtime": 25.949,
83
- "eval_samples_per_second": 33.681,
84
- "eval_steps_per_second": 2.12,
85
  "step": 100
86
  },
87
  {
88
- "epoch": 0.42,
89
- "learning_rate": 5e-05,
90
- "loss": 1.3073,
91
  "step": 110
92
  },
93
  {
94
- "epoch": 0.46,
95
- "learning_rate": 5e-05,
96
- "loss": 1.3162,
97
  "step": 120
98
  },
99
  {
100
- "epoch": 0.5,
101
- "learning_rate": 5e-05,
102
- "loss": 1.3299,
103
  "step": 130
104
  },
105
  {
106
- "epoch": 0.54,
107
- "learning_rate": 5e-05,
108
- "loss": 1.3271,
109
  "step": 140
110
  },
111
  {
112
- "epoch": 0.58,
113
- "learning_rate": 5e-05,
114
- "loss": 1.2914,
115
  "step": 150
116
  },
117
  {
118
- "epoch": 0.58,
119
- "eval_loss": 1.3168174028396606,
120
- "eval_runtime": 26.0065,
121
- "eval_samples_per_second": 33.607,
122
- "eval_steps_per_second": 2.115,
123
  "step": 150
124
  },
125
  {
126
- "epoch": 0.62,
127
- "learning_rate": 5e-05,
128
- "loss": 1.2526,
129
  "step": 160
130
  },
131
  {
132
- "epoch": 0.66,
133
- "learning_rate": 5e-05,
134
- "loss": 1.2924,
135
  "step": 170
136
  },
137
  {
138
- "epoch": 0.69,
139
- "learning_rate": 5e-05,
140
- "loss": 1.3303,
141
  "step": 180
142
  },
143
  {
144
- "epoch": 0.73,
145
- "learning_rate": 5e-05,
146
- "loss": 1.3173,
147
  "step": 190
148
  },
149
  {
150
- "epoch": 0.77,
151
- "learning_rate": 5e-05,
152
- "loss": 1.298,
153
  "step": 200
154
  },
155
  {
156
- "epoch": 0.77,
157
- "eval_loss": 1.303916096687317,
158
- "eval_runtime": 26.2236,
159
- "eval_samples_per_second": 33.329,
160
- "eval_steps_per_second": 2.097,
161
  "step": 200
162
  },
163
  {
164
- "epoch": 0.81,
165
- "learning_rate": 5e-05,
166
- "loss": 1.2402,
167
  "step": 210
168
  },
169
  {
170
- "epoch": 0.85,
171
- "learning_rate": 5e-05,
172
- "loss": 1.2768,
173
  "step": 220
174
  },
175
  {
176
- "epoch": 0.89,
177
- "learning_rate": 5e-05,
178
- "loss": 1.2929,
179
  "step": 230
180
  },
181
  {
182
- "epoch": 0.93,
183
- "learning_rate": 5e-05,
184
- "loss": 1.2744,
185
  "step": 240
186
  },
187
  {
188
- "epoch": 0.96,
189
- "learning_rate": 5e-05,
190
- "loss": 1.2678,
191
  "step": 250
192
  },
193
  {
194
- "epoch": 0.96,
195
- "eval_loss": 1.2991052865982056,
196
- "eval_runtime": 26.0876,
197
- "eval_samples_per_second": 33.502,
198
- "eval_steps_per_second": 2.108,
199
  "step": 250
200
  },
201
  {
202
- "epoch": 1.0,
203
- "learning_rate": 5e-05,
204
- "loss": 1.2506,
205
  "step": 260
206
  },
207
  {
208
- "epoch": 1.04,
209
- "learning_rate": 5e-05,
210
- "loss": 1.1717,
211
  "step": 270
212
  },
213
  {
214
- "epoch": 1.08,
215
- "learning_rate": 5e-05,
216
- "loss": 1.2022,
217
  "step": 280
218
  },
219
  {
220
- "epoch": 1.12,
221
- "learning_rate": 5e-05,
222
- "loss": 1.2237,
223
  "step": 290
224
  },
225
  {
226
- "epoch": 1.16,
227
- "learning_rate": 5e-05,
228
- "loss": 1.216,
229
  "step": 300
230
  },
231
  {
232
- "epoch": 1.16,
233
- "eval_loss": 1.3007503747940063,
234
- "eval_runtime": 28.1383,
235
- "eval_samples_per_second": 31.061,
236
- "eval_steps_per_second": 1.955,
237
  "step": 300
238
  },
239
  {
240
- "epoch": 1.2,
241
- "learning_rate": 5e-05,
242
- "loss": 1.2264,
243
  "step": 310
244
  },
245
  {
246
- "epoch": 1.23,
247
- "learning_rate": 5e-05,
248
- "loss": 1.1461,
249
  "step": 320
250
  },
251
  {
252
- "epoch": 1.27,
253
- "learning_rate": 5e-05,
254
- "loss": 1.1855,
255
  "step": 330
256
  },
257
  {
258
- "epoch": 1.31,
259
- "learning_rate": 5e-05,
260
- "loss": 1.2014,
261
  "step": 340
262
  },
263
  {
264
- "epoch": 1.35,
265
- "learning_rate": 5e-05,
266
- "loss": 1.2467,
267
  "step": 350
268
  },
269
  {
270
- "epoch": 1.35,
271
- "eval_loss": 1.2945101261138916,
272
- "eval_runtime": 25.9821,
273
- "eval_samples_per_second": 33.639,
274
- "eval_steps_per_second": 2.117,
275
  "step": 350
276
  },
277
  {
278
- "epoch": 1.39,
279
- "learning_rate": 5e-05,
280
- "loss": 1.2136,
281
  "step": 360
282
  },
283
  {
284
- "epoch": 1.43,
285
- "learning_rate": 5e-05,
286
- "loss": 1.1727,
287
  "step": 370
288
  },
289
  {
290
- "epoch": 1.47,
291
- "learning_rate": 5e-05,
292
- "loss": 1.192,
293
  "step": 380
294
  },
295
  {
296
- "epoch": 1.5,
297
- "learning_rate": 5e-05,
298
- "loss": 1.1963,
299
  "step": 390
300
  },
301
  {
302
- "epoch": 1.54,
303
- "learning_rate": 5e-05,
304
- "loss": 1.223,
305
  "step": 400
306
  },
307
  {
308
- "epoch": 1.54,
309
- "eval_loss": 1.293986201286316,
310
- "eval_runtime": 26.8556,
311
- "eval_samples_per_second": 32.544,
312
- "eval_steps_per_second": 2.048,
313
  "step": 400
314
  },
315
  {
316
- "epoch": 1.58,
317
- "learning_rate": 5e-05,
318
- "loss": 1.2076,
319
  "step": 410
320
  },
321
  {
322
- "epoch": 1.62,
323
- "learning_rate": 5e-05,
324
- "loss": 1.1373,
325
  "step": 420
326
  },
327
  {
328
- "epoch": 1.66,
329
- "learning_rate": 5e-05,
330
- "loss": 1.224,
331
  "step": 430
332
  },
333
  {
334
- "epoch": 1.7,
335
- "learning_rate": 5e-05,
336
- "loss": 1.1841,
337
  "step": 440
338
  },
339
  {
340
- "epoch": 1.74,
341
- "learning_rate": 5e-05,
342
- "loss": 1.2,
343
  "step": 450
344
  },
345
  {
346
- "epoch": 1.74,
347
- "eval_loss": 1.2924402952194214,
348
- "eval_runtime": 29.7855,
349
- "eval_samples_per_second": 29.343,
350
- "eval_steps_per_second": 1.847,
351
  "step": 450
352
  },
353
  {
354
- "epoch": 1.77,
355
- "learning_rate": 5e-05,
356
- "loss": 1.1896,
357
  "step": 460
358
  },
359
  {
360
- "epoch": 1.81,
361
- "learning_rate": 5e-05,
362
- "loss": 1.1476,
363
  "step": 470
364
  },
365
  {
366
- "epoch": 1.85,
367
- "learning_rate": 5e-05,
368
- "loss": 1.1986,
369
  "step": 480
370
  },
371
  {
372
- "epoch": 1.89,
373
- "learning_rate": 5e-05,
374
- "loss": 1.2185,
375
  "step": 490
376
  },
377
  {
378
- "epoch": 1.93,
379
- "learning_rate": 5e-05,
380
- "loss": 1.2406,
381
  "step": 500
382
  },
383
  {
384
- "epoch": 1.93,
385
- "eval_loss": 1.2884066104888916,
386
- "eval_runtime": 26.2052,
387
- "eval_samples_per_second": 33.352,
388
- "eval_steps_per_second": 2.099,
389
  "step": 500
390
  },
391
  {
392
- "epoch": 1.97,
393
- "learning_rate": 5e-05,
394
- "loss": 1.1962,
395
  "step": 510
396
  },
397
  {
398
- "epoch": 2.0,
399
- "step": 518,
400
- "total_flos": 5.99394721244119e+17,
401
- "train_loss": 1.2557248572124937,
402
- "train_runtime": 2763.6306,
403
- "train_samples_per_second": 12.007,
404
- "train_steps_per_second": 0.187
405
  }
406
  ],
407
  "logging_steps": 10,
408
- "max_steps": 518,
409
  "num_input_tokens_seen": 0,
410
  "num_train_epochs": 2,
411
- "save_steps": 100,
412
- "total_flos": 5.99394721244119e+17,
413
  "train_batch_size": 4,
414
  "trial_name": null,
415
  "trial_params": null
 
1
  {
2
  "best_metric": null,
3
  "best_model_checkpoint": null,
4
+ "epoch": 1.1523941982912775,
5
  "eval_steps": 50,
6
+ "global_step": 1450,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
10
  "log_history": [
11
  {
12
+ "epoch": 0.01,
13
+ "learning_rate": 3e-05,
14
+ "loss": 2.0927,
15
  "step": 10
16
  },
17
  {
18
+ "epoch": 0.02,
19
+ "learning_rate": 3e-05,
20
+ "loss": 0.267,
21
  "step": 20
22
  },
23
  {
24
+ "epoch": 0.02,
25
+ "learning_rate": 3e-05,
26
+ "loss": 0.1926,
27
  "step": 30
28
  },
29
  {
30
+ "epoch": 0.03,
31
+ "learning_rate": 3e-05,
32
+ "loss": 0.1601,
33
  "step": 40
34
  },
35
  {
36
+ "epoch": 0.04,
37
+ "learning_rate": 3e-05,
38
+ "loss": 0.1255,
39
  "step": 50
40
  },
41
  {
42
+ "epoch": 0.04,
43
+ "eval_accuracy": 0.2330188679245283,
44
+ "eval_f1_macro": 0.07731831394851697,
45
+ "eval_f1_micro": 0.2330188679245283,
46
+ "eval_loss": 0.24590551853179932,
47
+ "eval_precision_macro": 0.09801465635567799,
48
+ "eval_precision_micro": 0.2330188679245283,
49
+ "eval_recall_macro": 0.0939265996231733,
50
+ "eval_recall_micro": 0.2330188679245283,
51
+ "eval_runtime": 67.1714,
52
+ "eval_samples_per_second": 15.781,
53
+ "eval_steps_per_second": 3.945,
54
  "step": 50
55
  },
56
  {
57
+ "epoch": 0.05,
58
+ "learning_rate": 3e-05,
59
+ "loss": 0.6981,
60
  "step": 60
61
  },
62
  {
63
+ "epoch": 0.06,
64
+ "learning_rate": 3e-05,
65
+ "loss": 0.1356,
66
  "step": 70
67
  },
68
  {
69
+ "epoch": 0.06,
70
+ "learning_rate": 3e-05,
71
+ "loss": 0.0993,
72
  "step": 80
73
  },
74
  {
75
+ "epoch": 0.07,
76
+ "learning_rate": 3e-05,
77
+ "loss": 0.1038,
78
  "step": 90
79
  },
80
  {
81
+ "epoch": 0.08,
82
+ "learning_rate": 3e-05,
83
+ "loss": 0.1076,
84
  "step": 100
85
  },
86
  {
87
+ "epoch": 0.08,
88
+ "eval_accuracy": 0.4075471698113208,
89
+ "eval_f1_macro": 0.1681261191284492,
90
+ "eval_f1_micro": 0.4075471698113208,
91
+ "eval_loss": 0.14505280554294586,
92
+ "eval_precision_macro": 0.19505399860785297,
93
+ "eval_precision_micro": 0.4075471698113208,
94
+ "eval_recall_macro": 0.18462138174503467,
95
+ "eval_recall_micro": 0.4075471698113208,
96
+ "eval_runtime": 67.1009,
97
+ "eval_samples_per_second": 15.797,
98
+ "eval_steps_per_second": 3.949,
99
  "step": 100
100
  },
101
  {
102
+ "epoch": 0.09,
103
+ "learning_rate": 3e-05,
104
+ "loss": 0.331,
105
  "step": 110
106
  },
107
  {
108
+ "epoch": 0.1,
109
+ "learning_rate": 3e-05,
110
+ "loss": 0.0809,
111
  "step": 120
112
  },
113
  {
114
+ "epoch": 0.1,
115
+ "learning_rate": 3e-05,
116
+ "loss": 0.0812,
117
  "step": 130
118
  },
119
  {
120
+ "epoch": 0.11,
121
+ "learning_rate": 3e-05,
122
+ "loss": 0.0601,
123
  "step": 140
124
  },
125
  {
126
+ "epoch": 0.12,
127
+ "learning_rate": 3e-05,
128
+ "loss": 0.066,
129
  "step": 150
130
  },
131
  {
132
+ "epoch": 0.12,
133
+ "eval_accuracy": 0.5386792452830189,
134
+ "eval_f1_macro": 0.2780127225833117,
135
+ "eval_f1_micro": 0.5386792452830189,
136
+ "eval_loss": 0.10953618586063385,
137
+ "eval_precision_macro": 0.3493311966119182,
138
+ "eval_precision_micro": 0.5386792452830189,
139
+ "eval_recall_macro": 0.2871523900319283,
140
+ "eval_recall_micro": 0.5386792452830189,
141
+ "eval_runtime": 67.0903,
142
+ "eval_samples_per_second": 15.8,
143
+ "eval_steps_per_second": 3.95,
144
  "step": 150
145
  },
146
  {
147
+ "epoch": 0.13,
148
+ "learning_rate": 3e-05,
149
+ "loss": 0.2732,
150
  "step": 160
151
  },
152
  {
153
+ "epoch": 0.14,
154
+ "learning_rate": 3e-05,
155
+ "loss": 0.0754,
156
  "step": 170
157
  },
158
  {
159
+ "epoch": 0.14,
160
+ "learning_rate": 3e-05,
161
+ "loss": 0.0649,
162
  "step": 180
163
  },
164
  {
165
+ "epoch": 0.15,
166
+ "learning_rate": 3e-05,
167
+ "loss": 0.0674,
168
  "step": 190
169
  },
170
  {
171
+ "epoch": 0.16,
172
+ "learning_rate": 3e-05,
173
+ "loss": 0.0699,
174
  "step": 200
175
  },
176
  {
177
+ "epoch": 0.16,
178
+ "eval_accuracy": 0.620754716981132,
179
+ "eval_f1_macro": 0.3797608124202816,
180
+ "eval_f1_micro": 0.620754716981132,
181
+ "eval_loss": 0.09009388834238052,
182
+ "eval_precision_macro": 0.3837197141355178,
183
+ "eval_precision_micro": 0.620754716981132,
184
+ "eval_recall_macro": 0.39915842112719735,
185
+ "eval_recall_micro": 0.620754716981132,
186
+ "eval_runtime": 67.0671,
187
+ "eval_samples_per_second": 15.805,
188
+ "eval_steps_per_second": 3.951,
189
  "step": 200
190
  },
191
  {
192
+ "epoch": 0.17,
193
+ "learning_rate": 3e-05,
194
+ "loss": 0.1946,
195
  "step": 210
196
  },
197
  {
198
+ "epoch": 0.17,
199
+ "learning_rate": 3e-05,
200
+ "loss": 0.0657,
201
  "step": 220
202
  },
203
  {
204
+ "epoch": 0.18,
205
+ "learning_rate": 3e-05,
206
+ "loss": 0.0547,
207
  "step": 230
208
  },
209
  {
210
+ "epoch": 0.19,
211
+ "learning_rate": 3e-05,
212
+ "loss": 0.0615,
213
  "step": 240
214
  },
215
  {
216
+ "epoch": 0.2,
217
+ "learning_rate": 3e-05,
218
+ "loss": 0.066,
219
  "step": 250
220
  },
221
  {
222
+ "epoch": 0.2,
223
+ "eval_accuracy": 0.6103773584905661,
224
+ "eval_f1_macro": 0.41348516498355786,
225
+ "eval_f1_micro": 0.6103773584905661,
226
+ "eval_loss": 0.08832413703203201,
227
+ "eval_precision_macro": 0.45439839834135715,
228
+ "eval_precision_micro": 0.6103773584905661,
229
+ "eval_recall_macro": 0.4312111435526721,
230
+ "eval_recall_micro": 0.6103773584905661,
231
+ "eval_runtime": 66.9345,
232
+ "eval_samples_per_second": 15.836,
233
+ "eval_steps_per_second": 3.959,
234
  "step": 250
235
  },
236
  {
237
+ "epoch": 0.21,
238
+ "learning_rate": 3e-05,
239
+ "loss": 0.1494,
240
  "step": 260
241
  },
242
  {
243
+ "epoch": 0.21,
244
+ "learning_rate": 3e-05,
245
+ "loss": 0.0655,
246
  "step": 270
247
  },
248
  {
249
+ "epoch": 0.22,
250
+ "learning_rate": 3e-05,
251
+ "loss": 0.06,
252
  "step": 280
253
  },
254
  {
255
+ "epoch": 0.23,
256
+ "learning_rate": 3e-05,
257
+ "loss": 0.0616,
258
  "step": 290
259
  },
260
  {
261
+ "epoch": 0.24,
262
+ "learning_rate": 3e-05,
263
+ "loss": 0.0452,
264
  "step": 300
265
  },
266
  {
267
+ "epoch": 0.24,
268
+ "eval_accuracy": 0.6877358490566038,
269
+ "eval_f1_macro": 0.5091555575082085,
270
+ "eval_f1_micro": 0.6877358490566038,
271
+ "eval_loss": 0.08789286762475967,
272
+ "eval_precision_macro": 0.5649217974276287,
273
+ "eval_precision_micro": 0.6877358490566038,
274
+ "eval_recall_macro": 0.513496327466451,
275
+ "eval_recall_micro": 0.6877358490566038,
276
+ "eval_runtime": 67.1619,
277
+ "eval_samples_per_second": 15.783,
278
+ "eval_steps_per_second": 3.946,
279
  "step": 300
280
  },
281
  {
282
+ "epoch": 0.25,
283
+ "learning_rate": 3e-05,
284
+ "loss": 0.1535,
285
  "step": 310
286
  },
287
  {
288
+ "epoch": 0.25,
289
+ "learning_rate": 3e-05,
290
+ "loss": 0.0731,
291
  "step": 320
292
  },
293
  {
294
+ "epoch": 0.26,
295
+ "learning_rate": 3e-05,
296
+ "loss": 0.044,
297
  "step": 330
298
  },
299
  {
300
+ "epoch": 0.27,
301
+ "learning_rate": 3e-05,
302
+ "loss": 0.053,
303
  "step": 340
304
  },
305
  {
306
+ "epoch": 0.28,
307
+ "learning_rate": 3e-05,
308
+ "loss": 0.0545,
309
  "step": 350
310
  },
311
  {
312
+ "epoch": 0.28,
313
+ "eval_accuracy": 0.6764150943396227,
314
+ "eval_f1_macro": 0.503999030020007,
315
+ "eval_f1_micro": 0.6764150943396227,
316
+ "eval_loss": 0.07607663422822952,
317
+ "eval_precision_macro": 0.5194445629359009,
318
+ "eval_precision_micro": 0.6764150943396227,
319
+ "eval_recall_macro": 0.5287937722322651,
320
+ "eval_recall_micro": 0.6764150943396227,
321
+ "eval_runtime": 67.2353,
322
+ "eval_samples_per_second": 15.766,
323
+ "eval_steps_per_second": 3.941,
324
  "step": 350
325
  },
326
  {
327
+ "epoch": 0.29,
328
+ "learning_rate": 3e-05,
329
+ "loss": 0.1543,
330
  "step": 360
331
  },
332
  {
333
+ "epoch": 0.29,
334
+ "learning_rate": 3e-05,
335
+ "loss": 0.0609,
336
  "step": 370
337
  },
338
  {
339
+ "epoch": 0.3,
340
+ "learning_rate": 3e-05,
341
+ "loss": 0.0479,
342
  "step": 380
343
  },
344
  {
345
+ "epoch": 0.31,
346
+ "learning_rate": 3e-05,
347
+ "loss": 0.0532,
348
  "step": 390
349
  },
350
  {
351
+ "epoch": 0.32,
352
+ "learning_rate": 3e-05,
353
+ "loss": 0.0647,
354
  "step": 400
355
  },
356
  {
357
+ "epoch": 0.32,
358
+ "eval_accuracy": 0.7339622641509433,
359
+ "eval_f1_macro": 0.5492932704438783,
360
+ "eval_f1_micro": 0.7339622641509433,
361
+ "eval_loss": 0.06653406471014023,
362
+ "eval_precision_macro": 0.6193164476598846,
363
+ "eval_precision_micro": 0.7339622641509433,
364
+ "eval_recall_macro": 0.5252411264940735,
365
+ "eval_recall_micro": 0.7339622641509433,
366
+ "eval_runtime": 67.4334,
367
+ "eval_samples_per_second": 15.719,
368
+ "eval_steps_per_second": 3.93,
369
  "step": 400
370
  },
371
  {
372
+ "epoch": 0.33,
373
+ "learning_rate": 3e-05,
374
+ "loss": 0.104,
375
  "step": 410
376
  },
377
  {
378
+ "epoch": 0.33,
379
+ "learning_rate": 3e-05,
380
+ "loss": 0.0458,
381
  "step": 420
382
  },
383
  {
384
+ "epoch": 0.34,
385
+ "learning_rate": 3e-05,
386
+ "loss": 0.0552,
387
  "step": 430
388
  },
389
  {
390
+ "epoch": 0.35,
391
+ "learning_rate": 3e-05,
392
+ "loss": 0.0512,
393
  "step": 440
394
  },
395
  {
396
+ "epoch": 0.36,
397
+ "learning_rate": 3e-05,
398
+ "loss": 0.056,
399
  "step": 450
400
  },
401
  {
402
+ "epoch": 0.36,
403
+ "eval_accuracy": 0.7396226415094339,
404
+ "eval_f1_macro": 0.5671730153967399,
405
+ "eval_f1_micro": 0.7396226415094339,
406
+ "eval_loss": 0.05136344954371452,
407
+ "eval_precision_macro": 0.6096698581228938,
408
+ "eval_precision_micro": 0.7396226415094339,
409
+ "eval_recall_macro": 0.5767264087198709,
410
+ "eval_recall_micro": 0.7396226415094339,
411
+ "eval_runtime": 66.8962,
412
+ "eval_samples_per_second": 15.845,
413
+ "eval_steps_per_second": 3.961,
414
  "step": 450
415
  },
416
  {
417
+ "epoch": 0.37,
418
+ "learning_rate": 3e-05,
419
+ "loss": 0.0773,
420
  "step": 460
421
  },
422
  {
423
+ "epoch": 0.37,
424
+ "learning_rate": 3e-05,
425
+ "loss": 0.0474,
426
  "step": 470
427
  },
428
  {
429
+ "epoch": 0.38,
430
+ "learning_rate": 3e-05,
431
+ "loss": 0.0405,
432
  "step": 480
433
  },
434
  {
435
+ "epoch": 0.39,
436
+ "learning_rate": 3e-05,
437
+ "loss": 0.0461,
438
  "step": 490
439
  },
440
  {
441
+ "epoch": 0.4,
442
+ "learning_rate": 3e-05,
443
+ "loss": 0.0513,
444
  "step": 500
445
  },
446
  {
447
+ "epoch": 0.4,
448
+ "eval_accuracy": 0.7613207547169811,
449
+ "eval_f1_macro": 0.601977568492687,
450
+ "eval_f1_micro": 0.761320754716981,
451
+ "eval_loss": 0.047934673726558685,
452
+ "eval_precision_macro": 0.638418606498986,
453
+ "eval_precision_micro": 0.7613207547169811,
454
+ "eval_recall_macro": 0.6145296570629574,
455
+ "eval_recall_micro": 0.7613207547169811,
456
+ "eval_runtime": 66.9411,
457
+ "eval_samples_per_second": 15.835,
458
+ "eval_steps_per_second": 3.959,
459
  "step": 500
460
  },
461
  {
462
+ "epoch": 0.41,
463
+ "learning_rate": 3e-05,
464
+ "loss": 0.0788,
465
  "step": 510
466
  },
467
  {
468
+ "epoch": 0.41,
469
+ "learning_rate": 3e-05,
470
+ "loss": 0.0495,
471
+ "step": 520
472
+ },
473
+ {
474
+ "epoch": 0.42,
475
+ "learning_rate": 3e-05,
476
+ "loss": 0.0552,
477
+ "step": 530
478
+ },
479
+ {
480
+ "epoch": 0.43,
481
+ "learning_rate": 3e-05,
482
+ "loss": 0.0415,
483
+ "step": 540
484
+ },
485
+ {
486
+ "epoch": 0.44,
487
+ "learning_rate": 3e-05,
488
+ "loss": 0.0501,
489
+ "step": 550
490
+ },
491
+ {
492
+ "epoch": 0.44,
493
+ "eval_accuracy": 0.7509433962264151,
494
+ "eval_f1_macro": 0.6074975120648255,
495
+ "eval_f1_micro": 0.7509433962264151,
496
+ "eval_loss": 0.05019384250044823,
497
+ "eval_precision_macro": 0.624502704252128,
498
+ "eval_precision_micro": 0.7509433962264151,
499
+ "eval_recall_macro": 0.6167049341328479,
500
+ "eval_recall_micro": 0.7509433962264151,
501
+ "eval_runtime": 67.3498,
502
+ "eval_samples_per_second": 15.739,
503
+ "eval_steps_per_second": 3.935,
504
+ "step": 550
505
+ },
506
+ {
507
+ "epoch": 0.45,
508
+ "learning_rate": 3e-05,
509
+ "loss": 0.0633,
510
+ "step": 560
511
+ },
512
+ {
513
+ "epoch": 0.45,
514
+ "learning_rate": 3e-05,
515
+ "loss": 0.0484,
516
+ "step": 570
517
+ },
518
+ {
519
+ "epoch": 0.46,
520
+ "learning_rate": 3e-05,
521
+ "loss": 0.0418,
522
+ "step": 580
523
+ },
524
+ {
525
+ "epoch": 0.47,
526
+ "learning_rate": 3e-05,
527
+ "loss": 0.0524,
528
+ "step": 590
529
+ },
530
+ {
531
+ "epoch": 0.48,
532
+ "learning_rate": 3e-05,
533
+ "loss": 0.0533,
534
+ "step": 600
535
+ },
536
+ {
537
+ "epoch": 0.48,
538
+ "eval_accuracy": 0.7641509433962265,
539
+ "eval_f1_macro": 0.607265930345707,
540
+ "eval_f1_micro": 0.7641509433962265,
541
+ "eval_loss": 0.048058342188596725,
542
+ "eval_precision_macro": 0.6499724898555727,
543
+ "eval_precision_micro": 0.7641509433962265,
544
+ "eval_recall_macro": 0.6139175086252339,
545
+ "eval_recall_micro": 0.7641509433962265,
546
+ "eval_runtime": 66.897,
547
+ "eval_samples_per_second": 15.845,
548
+ "eval_steps_per_second": 3.961,
549
+ "step": 600
550
+ },
551
+ {
552
+ "epoch": 0.48,
553
+ "learning_rate": 3e-05,
554
+ "loss": 0.0418,
555
+ "step": 610
556
+ },
557
+ {
558
+ "epoch": 0.49,
559
+ "learning_rate": 3e-05,
560
+ "loss": 0.0482,
561
+ "step": 620
562
+ },
563
+ {
564
+ "epoch": 0.5,
565
+ "learning_rate": 3e-05,
566
+ "loss": 0.0458,
567
+ "step": 630
568
+ },
569
+ {
570
+ "epoch": 0.51,
571
+ "learning_rate": 3e-05,
572
+ "loss": 0.0432,
573
+ "step": 640
574
+ },
575
+ {
576
+ "epoch": 0.52,
577
+ "learning_rate": 3e-05,
578
+ "loss": 0.0462,
579
+ "step": 650
580
+ },
581
+ {
582
+ "epoch": 0.52,
583
+ "eval_accuracy": 0.7481132075471698,
584
+ "eval_f1_macro": 0.5679477471859753,
585
+ "eval_f1_micro": 0.7481132075471698,
586
+ "eval_loss": 0.047320980578660965,
587
+ "eval_precision_macro": 0.5941670973495327,
588
+ "eval_precision_micro": 0.7481132075471698,
589
+ "eval_recall_macro": 0.5739727328111488,
590
+ "eval_recall_micro": 0.7481132075471698,
591
+ "eval_runtime": 67.2106,
592
+ "eval_samples_per_second": 15.771,
593
+ "eval_steps_per_second": 3.943,
594
+ "step": 650
595
+ },
596
+ {
597
+ "epoch": 0.52,
598
+ "learning_rate": 3e-05,
599
+ "loss": 0.0668,
600
+ "step": 660
601
+ },
602
+ {
603
+ "epoch": 0.53,
604
+ "learning_rate": 3e-05,
605
+ "loss": 0.0501,
606
+ "step": 670
607
+ },
608
+ {
609
+ "epoch": 0.54,
610
+ "learning_rate": 3e-05,
611
+ "loss": 0.0366,
612
+ "step": 680
613
+ },
614
+ {
615
+ "epoch": 0.55,
616
+ "learning_rate": 3e-05,
617
+ "loss": 0.0374,
618
+ "step": 690
619
+ },
620
+ {
621
+ "epoch": 0.56,
622
+ "learning_rate": 3e-05,
623
+ "loss": 0.0496,
624
+ "step": 700
625
+ },
626
+ {
627
+ "epoch": 0.56,
628
+ "eval_accuracy": 0.7971698113207547,
629
+ "eval_f1_macro": 0.6517694520426227,
630
+ "eval_f1_micro": 0.7971698113207546,
631
+ "eval_loss": 0.04193812981247902,
632
+ "eval_precision_macro": 0.6678204026981202,
633
+ "eval_precision_micro": 0.7971698113207547,
634
+ "eval_recall_macro": 0.6480125227888868,
635
+ "eval_recall_micro": 0.7971698113207547,
636
+ "eval_runtime": 67.3982,
637
+ "eval_samples_per_second": 15.727,
638
+ "eval_steps_per_second": 3.932,
639
+ "step": 700
640
+ },
641
+ {
642
+ "epoch": 0.56,
643
+ "learning_rate": 3e-05,
644
+ "loss": 0.0649,
645
+ "step": 710
646
+ },
647
+ {
648
+ "epoch": 0.57,
649
+ "learning_rate": 3e-05,
650
+ "loss": 0.0447,
651
+ "step": 720
652
+ },
653
+ {
654
+ "epoch": 0.58,
655
+ "learning_rate": 3e-05,
656
+ "loss": 0.0442,
657
+ "step": 730
658
+ },
659
+ {
660
+ "epoch": 0.59,
661
+ "learning_rate": 3e-05,
662
+ "loss": 0.037,
663
+ "step": 740
664
+ },
665
+ {
666
+ "epoch": 0.6,
667
+ "learning_rate": 3e-05,
668
+ "loss": 0.0614,
669
+ "step": 750
670
+ },
671
+ {
672
+ "epoch": 0.6,
673
+ "eval_accuracy": 0.7773584905660378,
674
+ "eval_f1_macro": 0.6308119664331103,
675
+ "eval_f1_micro": 0.7773584905660378,
676
+ "eval_loss": 0.04885416477918625,
677
+ "eval_precision_macro": 0.6677975283624125,
678
+ "eval_precision_micro": 0.7773584905660378,
679
+ "eval_recall_macro": 0.6360471775658058,
680
+ "eval_recall_micro": 0.7773584905660378,
681
+ "eval_runtime": 67.7832,
682
+ "eval_samples_per_second": 15.638,
683
+ "eval_steps_per_second": 3.91,
684
+ "step": 750
685
+ },
686
+ {
687
+ "epoch": 0.6,
688
+ "learning_rate": 3e-05,
689
+ "loss": 0.0649,
690
+ "step": 760
691
+ },
692
+ {
693
+ "epoch": 0.61,
694
+ "learning_rate": 3e-05,
695
+ "loss": 0.0426,
696
+ "step": 770
697
+ },
698
+ {
699
+ "epoch": 0.62,
700
+ "learning_rate": 3e-05,
701
+ "loss": 0.0347,
702
+ "step": 780
703
+ },
704
+ {
705
+ "epoch": 0.63,
706
+ "learning_rate": 3e-05,
707
+ "loss": 0.0414,
708
+ "step": 790
709
+ },
710
+ {
711
+ "epoch": 0.64,
712
+ "learning_rate": 3e-05,
713
+ "loss": 0.0468,
714
+ "step": 800
715
+ },
716
+ {
717
+ "epoch": 0.64,
718
+ "eval_accuracy": 0.7830188679245284,
719
+ "eval_f1_macro": 0.6493890925237205,
720
+ "eval_f1_micro": 0.7830188679245284,
721
+ "eval_loss": 0.044340912252664566,
722
+ "eval_precision_macro": 0.6435014283226803,
723
+ "eval_precision_micro": 0.7830188679245284,
724
+ "eval_recall_macro": 0.6816157451405587,
725
+ "eval_recall_micro": 0.7830188679245284,
726
+ "eval_runtime": 67.2351,
727
+ "eval_samples_per_second": 15.766,
728
+ "eval_steps_per_second": 3.941,
729
+ "step": 800
730
+ },
731
+ {
732
+ "epoch": 0.64,
733
+ "learning_rate": 3e-05,
734
+ "loss": 0.052,
735
+ "step": 810
736
+ },
737
+ {
738
+ "epoch": 0.65,
739
+ "learning_rate": 3e-05,
740
+ "loss": 0.0414,
741
+ "step": 820
742
+ },
743
+ {
744
+ "epoch": 0.66,
745
+ "learning_rate": 3e-05,
746
+ "loss": 0.0342,
747
+ "step": 830
748
+ },
749
+ {
750
+ "epoch": 0.67,
751
+ "learning_rate": 3e-05,
752
+ "loss": 0.0451,
753
+ "step": 840
754
+ },
755
+ {
756
+ "epoch": 0.68,
757
+ "learning_rate": 3e-05,
758
+ "loss": 0.0477,
759
+ "step": 850
760
+ },
761
+ {
762
+ "epoch": 0.68,
763
+ "eval_accuracy": 0.7971698113207547,
764
+ "eval_f1_macro": 0.6662808099368048,
765
+ "eval_f1_micro": 0.7971698113207546,
766
+ "eval_loss": 0.041995830833911896,
767
+ "eval_precision_macro": 0.7040157648486967,
768
+ "eval_precision_micro": 0.7971698113207547,
769
+ "eval_recall_macro": 0.6567342355863813,
770
+ "eval_recall_micro": 0.7971698113207547,
771
+ "eval_runtime": 67.3249,
772
+ "eval_samples_per_second": 15.745,
773
+ "eval_steps_per_second": 3.936,
774
+ "step": 850
775
+ },
776
+ {
777
+ "epoch": 0.68,
778
+ "learning_rate": 3e-05,
779
+ "loss": 0.0468,
780
+ "step": 860
781
+ },
782
+ {
783
+ "epoch": 0.69,
784
+ "learning_rate": 3e-05,
785
+ "loss": 0.0461,
786
+ "step": 870
787
+ },
788
+ {
789
+ "epoch": 0.7,
790
+ "learning_rate": 3e-05,
791
+ "loss": 0.0436,
792
+ "step": 880
793
+ },
794
+ {
795
+ "epoch": 0.71,
796
+ "learning_rate": 3e-05,
797
+ "loss": 0.0369,
798
+ "step": 890
799
+ },
800
+ {
801
+ "epoch": 0.72,
802
+ "learning_rate": 3e-05,
803
+ "loss": 0.0519,
804
+ "step": 900
805
+ },
806
+ {
807
+ "epoch": 0.72,
808
+ "eval_accuracy": 0.7632075471698113,
809
+ "eval_f1_macro": 0.6291599323302522,
810
+ "eval_f1_micro": 0.7632075471698113,
811
+ "eval_loss": 0.04627140238881111,
812
+ "eval_precision_macro": 0.6519385252086033,
813
+ "eval_precision_micro": 0.7632075471698113,
814
+ "eval_recall_macro": 0.6290591814696965,
815
+ "eval_recall_micro": 0.7632075471698113,
816
+ "eval_runtime": 67.0228,
817
+ "eval_samples_per_second": 15.816,
818
+ "eval_steps_per_second": 3.954,
819
+ "step": 900
820
+ },
821
+ {
822
+ "epoch": 0.72,
823
+ "learning_rate": 3e-05,
824
+ "loss": 0.0543,
825
+ "step": 910
826
+ },
827
+ {
828
+ "epoch": 0.73,
829
+ "learning_rate": 3e-05,
830
+ "loss": 0.0426,
831
+ "step": 920
832
+ },
833
+ {
834
+ "epoch": 0.74,
835
+ "learning_rate": 3e-05,
836
+ "loss": 0.0421,
837
+ "step": 930
838
+ },
839
+ {
840
+ "epoch": 0.75,
841
+ "learning_rate": 3e-05,
842
+ "loss": 0.0338,
843
+ "step": 940
844
+ },
845
+ {
846
+ "epoch": 0.76,
847
+ "learning_rate": 3e-05,
848
+ "loss": 0.0453,
849
+ "step": 950
850
+ },
851
+ {
852
+ "epoch": 0.76,
853
+ "eval_accuracy": 0.780188679245283,
854
+ "eval_f1_macro": 0.6564187596520696,
855
+ "eval_f1_micro": 0.780188679245283,
856
+ "eval_loss": 0.042860858142375946,
857
+ "eval_precision_macro": 0.67574812222591,
858
+ "eval_precision_micro": 0.780188679245283,
859
+ "eval_recall_macro": 0.6697872775950671,
860
+ "eval_recall_micro": 0.780188679245283,
861
+ "eval_runtime": 67.3483,
862
+ "eval_samples_per_second": 15.739,
863
+ "eval_steps_per_second": 3.935,
864
+ "step": 950
865
+ },
866
+ {
867
+ "epoch": 0.76,
868
+ "learning_rate": 3e-05,
869
+ "loss": 0.0554,
870
+ "step": 960
871
+ },
872
+ {
873
+ "epoch": 0.77,
874
+ "learning_rate": 3e-05,
875
+ "loss": 0.0397,
876
+ "step": 970
877
+ },
878
+ {
879
+ "epoch": 0.78,
880
+ "learning_rate": 3e-05,
881
+ "loss": 0.0407,
882
+ "step": 980
883
+ },
884
+ {
885
+ "epoch": 0.79,
886
+ "learning_rate": 3e-05,
887
+ "loss": 0.0361,
888
+ "step": 990
889
+ },
890
+ {
891
+ "epoch": 0.79,
892
+ "learning_rate": 3e-05,
893
+ "loss": 0.0452,
894
+ "step": 1000
895
+ },
896
+ {
897
+ "epoch": 0.79,
898
+ "eval_accuracy": 0.7377358490566037,
899
+ "eval_f1_macro": 0.6049285124615932,
900
+ "eval_f1_micro": 0.7377358490566037,
901
+ "eval_loss": 0.047125279903411865,
902
+ "eval_precision_macro": 0.6181852032037266,
903
+ "eval_precision_micro": 0.7377358490566037,
904
+ "eval_recall_macro": 0.6300074429793591,
905
+ "eval_recall_micro": 0.7377358490566037,
906
+ "eval_runtime": 66.8035,
907
+ "eval_samples_per_second": 15.867,
908
+ "eval_steps_per_second": 3.967,
909
+ "step": 1000
910
+ },
911
+ {
912
+ "epoch": 0.8,
913
+ "learning_rate": 3e-05,
914
+ "loss": 0.0482,
915
+ "step": 1010
916
+ },
917
+ {
918
+ "epoch": 0.81,
919
+ "learning_rate": 3e-05,
920
+ "loss": 0.0379,
921
+ "step": 1020
922
+ },
923
+ {
924
+ "epoch": 0.82,
925
+ "learning_rate": 3e-05,
926
+ "loss": 0.0403,
927
+ "step": 1030
928
+ },
929
+ {
930
+ "epoch": 0.83,
931
+ "learning_rate": 3e-05,
932
+ "loss": 0.0471,
933
+ "step": 1040
934
+ },
935
+ {
936
+ "epoch": 0.83,
937
+ "learning_rate": 3e-05,
938
+ "loss": 0.0367,
939
+ "step": 1050
940
+ },
941
+ {
942
+ "epoch": 0.83,
943
+ "eval_accuracy": 0.7981132075471699,
944
+ "eval_f1_macro": 0.6800660818700823,
945
+ "eval_f1_micro": 0.79811320754717,
946
+ "eval_loss": 0.03875497728586197,
947
+ "eval_precision_macro": 0.6856812225733196,
948
+ "eval_precision_micro": 0.7981132075471699,
949
+ "eval_recall_macro": 0.6992476720564776,
950
+ "eval_recall_micro": 0.7981132075471699,
951
+ "eval_runtime": 66.8444,
952
+ "eval_samples_per_second": 15.858,
953
+ "eval_steps_per_second": 3.964,
954
+ "step": 1050
955
+ },
956
+ {
957
+ "epoch": 0.84,
958
+ "learning_rate": 3e-05,
959
+ "loss": 0.0351,
960
+ "step": 1060
961
+ },
962
+ {
963
+ "epoch": 0.85,
964
+ "learning_rate": 3e-05,
965
+ "loss": 0.0479,
966
+ "step": 1070
967
+ },
968
+ {
969
+ "epoch": 0.86,
970
+ "learning_rate": 3e-05,
971
+ "loss": 0.0421,
972
+ "step": 1080
973
+ },
974
+ {
975
+ "epoch": 0.87,
976
+ "learning_rate": 3e-05,
977
+ "loss": 0.0406,
978
+ "step": 1090
979
+ },
980
+ {
981
+ "epoch": 0.87,
982
+ "learning_rate": 3e-05,
983
+ "loss": 0.0377,
984
+ "step": 1100
985
+ },
986
+ {
987
+ "epoch": 0.87,
988
+ "eval_accuracy": 0.8,
989
+ "eval_f1_macro": 0.6590911576508658,
990
+ "eval_f1_micro": 0.8000000000000002,
991
+ "eval_loss": 0.03815627098083496,
992
+ "eval_precision_macro": 0.6636349851737382,
993
+ "eval_precision_micro": 0.8,
994
+ "eval_recall_macro": 0.6697553358712118,
995
+ "eval_recall_micro": 0.8,
996
+ "eval_runtime": 66.9434,
997
+ "eval_samples_per_second": 15.834,
998
+ "eval_steps_per_second": 3.959,
999
+ "step": 1100
1000
+ },
1001
+ {
1002
+ "epoch": 0.88,
1003
+ "learning_rate": 3e-05,
1004
+ "loss": 0.0365,
1005
+ "step": 1110
1006
+ },
1007
+ {
1008
+ "epoch": 0.89,
1009
+ "learning_rate": 3e-05,
1010
+ "loss": 0.0353,
1011
+ "step": 1120
1012
+ },
1013
+ {
1014
+ "epoch": 0.9,
1015
+ "learning_rate": 3e-05,
1016
+ "loss": 0.0388,
1017
+ "step": 1130
1018
+ },
1019
+ {
1020
+ "epoch": 0.91,
1021
+ "learning_rate": 3e-05,
1022
+ "loss": 0.0358,
1023
+ "step": 1140
1024
+ },
1025
+ {
1026
+ "epoch": 0.91,
1027
+ "learning_rate": 3e-05,
1028
+ "loss": 0.0429,
1029
+ "step": 1150
1030
+ },
1031
+ {
1032
+ "epoch": 0.91,
1033
+ "eval_accuracy": 0.7952830188679245,
1034
+ "eval_f1_macro": 0.6465609013784224,
1035
+ "eval_f1_micro": 0.7952830188679245,
1036
+ "eval_loss": 0.03976297378540039,
1037
+ "eval_precision_macro": 0.6923924758215005,
1038
+ "eval_precision_micro": 0.7952830188679245,
1039
+ "eval_recall_macro": 0.6441492192889419,
1040
+ "eval_recall_micro": 0.7952830188679245,
1041
+ "eval_runtime": 67.1705,
1042
+ "eval_samples_per_second": 15.781,
1043
+ "eval_steps_per_second": 3.945,
1044
+ "step": 1150
1045
+ },
1046
+ {
1047
+ "epoch": 0.92,
1048
+ "learning_rate": 3e-05,
1049
+ "loss": 0.0461,
1050
+ "step": 1160
1051
+ },
1052
+ {
1053
+ "epoch": 0.93,
1054
+ "learning_rate": 3e-05,
1055
+ "loss": 0.0434,
1056
+ "step": 1170
1057
+ },
1058
+ {
1059
+ "epoch": 0.94,
1060
+ "learning_rate": 3e-05,
1061
+ "loss": 0.0524,
1062
+ "step": 1180
1063
+ },
1064
+ {
1065
+ "epoch": 0.95,
1066
+ "learning_rate": 3e-05,
1067
+ "loss": 0.0362,
1068
+ "step": 1190
1069
+ },
1070
+ {
1071
+ "epoch": 0.95,
1072
+ "learning_rate": 3e-05,
1073
+ "loss": 0.0451,
1074
+ "step": 1200
1075
+ },
1076
+ {
1077
+ "epoch": 0.95,
1078
+ "eval_accuracy": 0.7943396226415095,
1079
+ "eval_f1_macro": 0.6535399936575059,
1080
+ "eval_f1_micro": 0.7943396226415095,
1081
+ "eval_loss": 0.037755727767944336,
1082
+ "eval_precision_macro": 0.6712905678869693,
1083
+ "eval_precision_micro": 0.7943396226415095,
1084
+ "eval_recall_macro": 0.6537773538776073,
1085
+ "eval_recall_micro": 0.7943396226415095,
1086
+ "eval_runtime": 66.9611,
1087
+ "eval_samples_per_second": 15.83,
1088
+ "eval_steps_per_second": 3.958,
1089
+ "step": 1200
1090
+ },
1091
+ {
1092
+ "epoch": 0.96,
1093
+ "learning_rate": 3e-05,
1094
+ "loss": 0.0456,
1095
+ "step": 1210
1096
+ },
1097
+ {
1098
+ "epoch": 0.97,
1099
+ "learning_rate": 3e-05,
1100
+ "loss": 0.0455,
1101
+ "step": 1220
1102
+ },
1103
+ {
1104
+ "epoch": 0.98,
1105
+ "learning_rate": 3e-05,
1106
+ "loss": 0.0409,
1107
+ "step": 1230
1108
+ },
1109
+ {
1110
+ "epoch": 0.99,
1111
+ "learning_rate": 3e-05,
1112
+ "loss": 0.037,
1113
+ "step": 1240
1114
+ },
1115
+ {
1116
+ "epoch": 0.99,
1117
+ "learning_rate": 3e-05,
1118
+ "loss": 0.0347,
1119
+ "step": 1250
1120
+ },
1121
+ {
1122
+ "epoch": 0.99,
1123
+ "eval_accuracy": 0.7839622641509434,
1124
+ "eval_f1_macro": 0.6330944207402169,
1125
+ "eval_f1_micro": 0.7839622641509434,
1126
+ "eval_loss": 0.041340529918670654,
1127
+ "eval_precision_macro": 0.6735372413807635,
1128
+ "eval_precision_micro": 0.7839622641509434,
1129
+ "eval_recall_macro": 0.6450299050285588,
1130
+ "eval_recall_micro": 0.7839622641509434,
1131
+ "eval_runtime": 66.9053,
1132
+ "eval_samples_per_second": 15.843,
1133
+ "eval_steps_per_second": 3.961,
1134
+ "step": 1250
1135
+ },
1136
+ {
1137
+ "epoch": 1.0,
1138
+ "learning_rate": 3e-05,
1139
+ "loss": 0.0421,
1140
+ "step": 1260
1141
+ },
1142
+ {
1143
+ "epoch": 1.01,
1144
+ "learning_rate": 3e-05,
1145
+ "loss": 0.041,
1146
+ "step": 1270
1147
+ },
1148
+ {
1149
+ "epoch": 1.02,
1150
+ "learning_rate": 3e-05,
1151
+ "loss": 0.033,
1152
+ "step": 1280
1153
+ },
1154
+ {
1155
+ "epoch": 1.03,
1156
+ "learning_rate": 3e-05,
1157
+ "loss": 0.036,
1158
+ "step": 1290
1159
+ },
1160
+ {
1161
+ "epoch": 1.03,
1162
+ "learning_rate": 3e-05,
1163
+ "loss": 0.0378,
1164
+ "step": 1300
1165
+ },
1166
+ {
1167
+ "epoch": 1.03,
1168
+ "eval_accuracy": 0.8047169811320755,
1169
+ "eval_f1_macro": 0.6488791804614907,
1170
+ "eval_f1_micro": 0.8047169811320755,
1171
+ "eval_loss": 0.037683386355638504,
1172
+ "eval_precision_macro": 0.7109359814450084,
1173
+ "eval_precision_micro": 0.8047169811320755,
1174
+ "eval_recall_macro": 0.6387082579227776,
1175
+ "eval_recall_micro": 0.8047169811320755,
1176
+ "eval_runtime": 67.3206,
1177
+ "eval_samples_per_second": 15.746,
1178
+ "eval_steps_per_second": 3.936,
1179
+ "step": 1300
1180
+ },
1181
+ {
1182
+ "epoch": 1.04,
1183
+ "learning_rate": 3e-05,
1184
+ "loss": 0.0343,
1185
+ "step": 1310
1186
+ },
1187
+ {
1188
+ "epoch": 1.05,
1189
+ "learning_rate": 3e-05,
1190
+ "loss": 0.0321,
1191
+ "step": 1320
1192
+ },
1193
+ {
1194
+ "epoch": 1.06,
1195
+ "learning_rate": 3e-05,
1196
+ "loss": 0.031,
1197
+ "step": 1330
1198
+ },
1199
+ {
1200
+ "epoch": 1.06,
1201
+ "learning_rate": 3e-05,
1202
+ "loss": 0.039,
1203
+ "step": 1340
1204
+ },
1205
+ {
1206
+ "epoch": 1.07,
1207
+ "learning_rate": 3e-05,
1208
+ "loss": 0.0357,
1209
+ "step": 1350
1210
+ },
1211
+ {
1212
+ "epoch": 1.07,
1213
+ "eval_accuracy": 0.8028301886792453,
1214
+ "eval_f1_macro": 0.6648963473667772,
1215
+ "eval_f1_micro": 0.8028301886792453,
1216
+ "eval_loss": 0.03860827535390854,
1217
+ "eval_precision_macro": 0.6898539099210392,
1218
+ "eval_precision_micro": 0.8028301886792453,
1219
+ "eval_recall_macro": 0.6558796396655843,
1220
+ "eval_recall_micro": 0.8028301886792453,
1221
+ "eval_runtime": 67.0656,
1222
+ "eval_samples_per_second": 15.805,
1223
+ "eval_steps_per_second": 3.951,
1224
+ "step": 1350
1225
+ },
1226
+ {
1227
+ "epoch": 1.08,
1228
+ "learning_rate": 3e-05,
1229
+ "loss": 0.0445,
1230
+ "step": 1360
1231
+ },
1232
+ {
1233
+ "epoch": 1.09,
1234
+ "learning_rate": 3e-05,
1235
+ "loss": 0.0375,
1236
+ "step": 1370
1237
+ },
1238
+ {
1239
+ "epoch": 1.1,
1240
+ "learning_rate": 3e-05,
1241
+ "loss": 0.0375,
1242
+ "step": 1380
1243
+ },
1244
+ {
1245
+ "epoch": 1.1,
1246
+ "learning_rate": 3e-05,
1247
+ "loss": 0.0333,
1248
+ "step": 1390
1249
+ },
1250
+ {
1251
+ "epoch": 1.11,
1252
+ "learning_rate": 3e-05,
1253
+ "loss": 0.0418,
1254
+ "step": 1400
1255
+ },
1256
+ {
1257
+ "epoch": 1.11,
1258
+ "eval_accuracy": 0.7962264150943397,
1259
+ "eval_f1_macro": 0.6910242491250081,
1260
+ "eval_f1_micro": 0.7962264150943396,
1261
+ "eval_loss": 0.0368194542825222,
1262
+ "eval_precision_macro": 0.7114033533579757,
1263
+ "eval_precision_micro": 0.7962264150943397,
1264
+ "eval_recall_macro": 0.6942176996685531,
1265
+ "eval_recall_micro": 0.7962264150943397,
1266
+ "eval_runtime": 66.8832,
1267
+ "eval_samples_per_second": 15.849,
1268
+ "eval_steps_per_second": 3.962,
1269
+ "step": 1400
1270
+ },
1271
+ {
1272
+ "epoch": 1.12,
1273
+ "learning_rate": 3e-05,
1274
+ "loss": 0.0414,
1275
+ "step": 1410
1276
+ },
1277
+ {
1278
+ "epoch": 1.13,
1279
+ "learning_rate": 3e-05,
1280
+ "loss": 0.0357,
1281
+ "step": 1420
1282
+ },
1283
+ {
1284
+ "epoch": 1.14,
1285
+ "learning_rate": 3e-05,
1286
+ "loss": 0.0272,
1287
+ "step": 1430
1288
+ },
1289
+ {
1290
+ "epoch": 1.14,
1291
+ "learning_rate": 3e-05,
1292
+ "loss": 0.0323,
1293
+ "step": 1440
1294
+ },
1295
+ {
1296
+ "epoch": 1.15,
1297
+ "learning_rate": 3e-05,
1298
+ "loss": 0.0293,
1299
+ "step": 1450
1300
+ },
1301
+ {
1302
+ "epoch": 1.15,
1303
+ "eval_accuracy": 0.8141509433962264,
1304
+ "eval_f1_macro": 0.7097996478763092,
1305
+ "eval_f1_micro": 0.8141509433962264,
1306
+ "eval_loss": 0.035770244896411896,
1307
+ "eval_precision_macro": 0.7222302630120379,
1308
+ "eval_precision_micro": 0.8141509433962264,
1309
+ "eval_recall_macro": 0.7125706602249756,
1310
+ "eval_recall_micro": 0.8141509433962264,
1311
+ "eval_runtime": 67.0694,
1312
+ "eval_samples_per_second": 15.805,
1313
+ "eval_steps_per_second": 3.951,
1314
+ "step": 1450
1315
+ },
1316
+ {
1317
+ "epoch": 1.15,
1318
+ "step": 1450,
1319
+ "total_flos": 3.612646182806976e+17,
1320
+ "train_loss": 0.07879953698865298,
1321
+ "train_runtime": 5948.326,
1322
+ "train_samples_per_second": 3.9,
1323
+ "train_steps_per_second": 0.244
1324
  }
1325
  ],
1326
  "logging_steps": 10,
1327
+ "max_steps": 1450,
1328
  "num_input_tokens_seen": 0,
1329
  "num_train_epochs": 2,
1330
+ "save_steps": 250,
1331
+ "total_flos": 3.612646182806976e+17,
1332
  "train_batch_size": 4,
1333
  "trial_name": null,
1334
  "trial_params": null
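The `log_history` list above mixes two kinds of entries. Training-loss logs appear every 10 steps (`logging_steps`), and evaluation results appear every 50 steps. The Python sketch below separates the two kinds and finds the evaluation with the lowest `eval_loss`. It relies only on the key names visible in this diff (`loss`, `eval_loss`, `eval_f1_macro`, `step`, `log_history`). `load_state` and `summarize_log_history` are hypothetical helpers and are not part of the `transformers` API.

```python
import json
from typing import Any


def load_state(path: str) -> dict[str, Any]:
    """Read a trainer_state.json file into a dict."""
    with open(path) as f:
        return json.load(f)


def summarize_log_history(state: dict[str, Any]) -> dict[str, Any]:
    """Split log_history into training and evaluation entries and report
    the evaluation step with the lowest eval_loss."""
    history = state["log_history"]
    # Training logs carry "loss". The final summary entry carries "train_loss"
    # instead, so it is excluded here.
    train_entries = [e for e in history if "loss" in e]
    # Evaluation entries are the ones that carry "eval_loss".
    eval_entries = [e for e in history if "eval_loss" in e]
    best = min(eval_entries, key=lambda e: e["eval_loss"])
    return {
        "num_train_logs": len(train_entries),
        "num_evals": len(eval_entries),
        "best_step": best["step"],
        "best_eval_loss": best["eval_loss"],
        "best_eval_f1_macro": best.get("eval_f1_macro"),
    }
```

To run it, call `summarize_log_history(load_state("path/to/trainer_state.json"))`, where the path is a placeholder. Among the evaluation entries visible in this diff, step 1450 has both the lowest `eval_loss` (about 0.0358) and the highest `eval_f1_macro` (about 0.710).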
training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:da0882b299f95e3275abb6a23f877e7a3f1b30b0a7a738ee7aa9fdae78a96b6c
3
  size 6648
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:add2669eaaf689daa0607cb945e0a514be11a77fcad05794f011caf4efc79995
3
  size 6648
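In every evaluation entry of the trainer_state.json diff, `eval_precision_micro` and `eval_recall_micro` equal `eval_accuracy`. `eval_f1_micro` agrees with them up to the last floating-point digit, for example 0.7971698113207546 versus 0.7971698113207547 at step 700. This is expected when each example has exactly one true label and exactly one predicted label. Under that condition, every mistake counts once as a false positive (for the predicted class) and once as a false negative (for the true class). The diff does not say how the metrics were computed, so the sketch below simply assumes that single-label setup. It recomputes micro-averaged scores from per-class counts to show the equality.

```python
def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of examples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def micro_scores(y_true: list[int], y_pred: list[int]) -> tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 from per-class TP/FP/FN counts,
    assuming one true and one predicted label per example."""
    tp = fp = fn = 0
    for c in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            if p == c and t == c:
                tp += 1
            elif p == c:
                fp += 1  # predicted c, but the true label is different
            elif t == c:
                fn += 1  # true label is c, but another class was predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_true = [0, 1, 2, 2, 1, 0, 1]
y_pred = [0, 2, 2, 1, 1, 0, 0]
p, r, f1 = micro_scores(y_true, y_pred)
print(accuracy(y_true, y_pred), p, r, f1)
```

Because F1 is computed as `2 * p * r / (p + r)`, it can differ from accuracy in the final digit through floating-point rounding, which is consistent with the small mismatches in the logs. The same logs also suggest an evaluation set of roughly 1,060 examples, since 15.845 samples/s × 66.8962 s ≈ 1060. The reported accuracies match that count: 0.7396226415094339 = 784/1060 and 0.8 = 848/1060.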