zlucia committed on
Commit
5b984da
1 Parent(s): ae91113

End of training

README.md CHANGED
@@ -18,14 +18,14 @@ should probably proofread and complete it, then remove this comment. -->
18
 
19
  This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the None dataset.
20
  It achieves the following results on the evaluation set:
21
- - Loss: 0.0353
22
- - Precision Micro: 0.8057
23
- - Precision Macro: 0.7027
24
- - Recall Micro: 0.8057
25
- - Recall Macro: 0.6940
26
- - F1 Micro: 0.8057
27
- - F1 Macro: 0.6867
28
- - Accuracy: 0.8057
29
 
30
  ## Model description
31
 
@@ -45,70 +45,34 @@ More information needed
45
 
46
  The following hyperparameters were used during training:
47
  - learning_rate: 3e-05
48
- - train_batch_size: 4
49
  - eval_batch_size: 4
50
  - seed: 42
51
  - gradient_accumulation_steps: 4
52
- - total_train_batch_size: 16
53
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
54
  - lr_scheduler_type: constant
55
  - lr_scheduler_warmup_ratio: 0.03
56
- - num_epochs: 2.0
57
 
58
  ### Training results
59
 
60
  | Training Loss | Epoch | Step | Validation Loss | Precision Micro | Precision Macro | Recall Micro | Recall Macro | F1 Micro | F1 Macro | Accuracy |
61
  |:-------------:|:-----:|:----:|:---------------:|:---------------:|:---------------:|:------------:|:------------:|:--------:|:--------:|:--------:|
62
- | 0.1114 | 0.04 | 50 | 0.2315 | 0.3613 | 0.1236 | 0.3613 | 0.1194 | 0.3613 | 0.0986 | 0.3613 |
63
- | 0.1009 | 0.08 | 100 | 0.1300 | 0.4868 | 0.2602 | 0.4868 | 0.2927 | 0.4868 | 0.2574 | 0.4868 |
64
- | 0.0655 | 0.12 | 150 | 0.1111 | 0.5821 | 0.4592 | 0.5821 | 0.3260 | 0.5821 | 0.3407 | 0.5821 |
65
- | 0.0675 | 0.16 | 200 | 0.0980 | 0.6104 | 0.4309 | 0.6104 | 0.4116 | 0.6104 | 0.3994 | 0.6104 |
66
- | 0.0613 | 0.2 | 250 | 0.0868 | 0.6349 | 0.5027 | 0.6349 | 0.4238 | 0.6349 | 0.4328 | 0.6349 |
67
- | 0.0423 | 0.24 | 300 | 0.0829 | 0.6406 | 0.4971 | 0.6406 | 0.5150 | 0.6406 | 0.4838 | 0.6406 |
68
- | 0.0495 | 0.28 | 350 | 0.0647 | 0.6840 | 0.5621 | 0.6840 | 0.5110 | 0.6840 | 0.5118 | 0.6840 |
69
- | 0.0696 | 0.32 | 400 | 0.0583 | 0.7236 | 0.5854 | 0.7236 | 0.5476 | 0.7236 | 0.5523 | 0.7236 |
70
- | 0.0551 | 0.36 | 450 | 0.0470 | 0.7538 | 0.6037 | 0.7538 | 0.5804 | 0.7538 | 0.5801 | 0.7538 |
71
- | 0.0485 | 0.4 | 500 | 0.0467 | 0.7632 | 0.6244 | 0.7632 | 0.6093 | 0.7632 | 0.5976 | 0.7632 |
72
- | 0.0514 | 0.44 | 550 | 0.0491 | 0.7453 | 0.6624 | 0.7453 | 0.6055 | 0.7453 | 0.6149 | 0.7453 |
73
- | 0.0537 | 0.48 | 600 | 0.0469 | 0.7547 | 0.6565 | 0.7547 | 0.6140 | 0.7547 | 0.5956 | 0.7547 |
74
- | 0.0503 | 0.52 | 650 | 0.0473 | 0.7434 | 0.6365 | 0.7434 | 0.5711 | 0.7434 | 0.5711 | 0.7434 |
75
- | 0.0502 | 0.56 | 700 | 0.0429 | 0.7991 | 0.6675 | 0.7991 | 0.6430 | 0.7991 | 0.6487 | 0.7991 |
76
- | 0.0568 | 0.6 | 750 | 0.0421 | 0.7830 | 0.6400 | 0.7830 | 0.6197 | 0.7830 | 0.6035 | 0.7830 |
77
- | 0.0456 | 0.64 | 800 | 0.0385 | 0.8038 | 0.6660 | 0.8038 | 0.7100 | 0.8038 | 0.6795 | 0.8038 |
78
- | 0.0465 | 0.68 | 850 | 0.0423 | 0.7868 | 0.7080 | 0.7868 | 0.6536 | 0.7868 | 0.6638 | 0.7868 |
79
- | 0.0517 | 0.72 | 900 | 0.0405 | 0.7830 | 0.6482 | 0.7830 | 0.5953 | 0.7830 | 0.6044 | 0.7830 |
80
- | 0.0449 | 0.76 | 950 | 0.0395 | 0.7962 | 0.6783 | 0.7962 | 0.6782 | 0.7962 | 0.6595 | 0.7962 |
81
- | 0.0438 | 0.79 | 1000 | 0.0415 | 0.7651 | 0.6310 | 0.7651 | 0.6519 | 0.7651 | 0.6270 | 0.7651 |
82
- | 0.0368 | 0.83 | 1050 | 0.0367 | 0.8142 | 0.7077 | 0.8142 | 0.6998 | 0.8142 | 0.6885 | 0.8142 |
83
- | 0.0351 | 0.87 | 1100 | 0.0350 | 0.8151 | 0.6864 | 0.8151 | 0.6838 | 0.8151 | 0.6796 | 0.8151 |
84
- | 0.042 | 0.91 | 1150 | 0.0362 | 0.8066 | 0.6895 | 0.8066 | 0.6593 | 0.8066 | 0.6627 | 0.8066 |
85
- | 0.0449 | 0.95 | 1200 | 0.0367 | 0.7925 | 0.6685 | 0.7925 | 0.6671 | 0.7925 | 0.6583 | 0.7925 |
86
- | 0.0331 | 0.99 | 1250 | 0.0382 | 0.8019 | 0.6760 | 0.8019 | 0.6848 | 0.8019 | 0.6661 | 0.8019 |
87
- | 0.0367 | 1.03 | 1300 | 0.0372 | 0.8038 | 0.7119 | 0.8038 | 0.6501 | 0.8038 | 0.6590 | 0.8038 |
88
- | 0.0357 | 1.07 | 1350 | 0.0375 | 0.7991 | 0.6822 | 0.7991 | 0.6657 | 0.7991 | 0.6639 | 0.7991 |
89
- | 0.0405 | 1.11 | 1400 | 0.0354 | 0.8104 | 0.6735 | 0.8104 | 0.7011 | 0.8104 | 0.6823 | 0.8104 |
90
- | 0.0281 | 1.15 | 1450 | 0.0338 | 0.8302 | 0.6881 | 0.8302 | 0.7082 | 0.8302 | 0.6937 | 0.8302 |
91
- | 0.0362 | 1.19 | 1500 | 0.0351 | 0.8123 | 0.7044 | 0.8123 | 0.6559 | 0.8123 | 0.6607 | 0.8123 |
92
- | 0.0214 | 1.23 | 1550 | 0.0350 | 0.8104 | 0.7081 | 0.8104 | 0.6749 | 0.8104 | 0.6779 | 0.8104 |
93
- | 0.0321 | 1.27 | 1600 | 0.0368 | 0.8094 | 0.7541 | 0.8094 | 0.7254 | 0.8094 | 0.7278 | 0.8094 |
94
- | 0.0332 | 1.31 | 1650 | 0.0339 | 0.8255 | 0.7291 | 0.8255 | 0.7104 | 0.8255 | 0.7081 | 0.8255 |
95
- | 0.0306 | 1.35 | 1700 | 0.0339 | 0.8179 | 0.6816 | 0.8179 | 0.6804 | 0.8179 | 0.6770 | 0.8179 |
96
- | 0.0231 | 1.39 | 1750 | 0.0373 | 0.8179 | 0.6983 | 0.8179 | 0.6881 | 0.8179 | 0.6890 | 0.8179 |
97
- | 0.0351 | 1.43 | 1800 | 0.0356 | 0.8217 | 0.6989 | 0.8217 | 0.6917 | 0.8217 | 0.6893 | 0.8217 |
98
- | 0.0259 | 1.47 | 1850 | 0.0335 | 0.8208 | 0.6999 | 0.8208 | 0.6823 | 0.8208 | 0.6885 | 0.8208 |
99
- | 0.0371 | 1.51 | 1900 | 0.0367 | 0.8123 | 0.7412 | 0.8123 | 0.6617 | 0.8123 | 0.6817 | 0.8123 |
100
- | 0.0288 | 1.55 | 1950 | 0.0347 | 0.8179 | 0.6758 | 0.8179 | 0.6916 | 0.8179 | 0.6808 | 0.8179 |
101
- | 0.0252 | 1.59 | 2000 | 0.0357 | 0.8113 | 0.7003 | 0.8113 | 0.6714 | 0.8113 | 0.6787 | 0.8113 |
102
- | 0.0374 | 1.63 | 2050 | 0.0332 | 0.8208 | 0.7747 | 0.8208 | 0.7233 | 0.8208 | 0.7379 | 0.8208 |
103
- | 0.0356 | 1.67 | 2100 | 0.0323 | 0.8283 | 0.7425 | 0.8283 | 0.7046 | 0.8283 | 0.7162 | 0.8283 |
104
- | 0.0294 | 1.71 | 2150 | 0.0346 | 0.8113 | 0.7173 | 0.8113 | 0.7227 | 0.8113 | 0.7101 | 0.8113 |
105
- | 0.035 | 1.75 | 2200 | 0.0338 | 0.8236 | 0.7591 | 0.8236 | 0.7307 | 0.8236 | 0.7390 | 0.8236 |
106
- | 0.0432 | 1.79 | 2250 | 0.0348 | 0.8217 | 0.7694 | 0.8217 | 0.7204 | 0.8217 | 0.7295 | 0.8217 |
107
- | 0.0325 | 1.83 | 2300 | 0.0324 | 0.8330 | 0.7441 | 0.8330 | 0.7231 | 0.8330 | 0.7261 | 0.8330 |
108
- | 0.0318 | 1.87 | 2350 | 0.0321 | 0.8311 | 0.7397 | 0.8311 | 0.7241 | 0.8311 | 0.7248 | 0.8311 |
109
- | 0.0315 | 1.91 | 2400 | 0.0335 | 0.8179 | 0.6793 | 0.8179 | 0.7035 | 0.8179 | 0.6858 | 0.8179 |
110
- | 0.0331 | 1.95 | 2450 | 0.0335 | 0.8179 | 0.7295 | 0.8179 | 0.6879 | 0.8179 | 0.6956 | 0.8179 |
111
- | 0.0293 | 1.99 | 2500 | 0.0353 | 0.8057 | 0.7027 | 0.8057 | 0.6940 | 0.8057 | 0.6867 | 0.8057 |
112
 
113
 
114
  ### Framework versions
 
18
 
19
  This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the None dataset.
20
  It achieves the following results on the evaluation set:
21
+ - Loss: 0.0357
22
+ - Precision Micro: 0.8047
23
+ - Precision Macro: 0.6995
24
+ - Recall Micro: 0.8047
25
+ - Recall Macro: 0.6609
26
+ - F1 Micro: 0.8047
27
+ - F1 Macro: 0.6661
28
+ - Accuracy: 0.8047
29
 
30
  ## Model description
31
 
 
45
 
46
  The following hyperparameters were used during training:
47
  - learning_rate: 3e-05
48
+ - train_batch_size: 8
49
  - eval_batch_size: 4
50
  - seed: 42
51
  - gradient_accumulation_steps: 4
52
+ - total_train_batch_size: 32
53
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
54
  - lr_scheduler_type: constant
55
  - lr_scheduler_warmup_ratio: 0.03
56
+ - training_steps: 725
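
As a sanity check on the updated hyperparameters, the effective batch size is the per-device batch size multiplied by the gradient accumulation steps and the number of devices. The sketch below is illustrative only: the variable names are hypothetical, and the single-device setting is an assumption, since the model card does not state the device count.

```python
# Values copied from the updated README hyperparameters.
train_batch_size = 8
gradient_accumulation_steps = 4
training_steps = 725

# Assumption: one device; the model card does not state the device count.
num_devices = 1

# Samples consumed per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Samples consumed over the whole run.
samples_seen = total_train_batch_size * training_steps

print(total_train_batch_size)  # 32, matching total_train_batch_size in the card
print(samples_seen)            # 23200
```

Under that assumption, doubling `train_batch_size` from 4 to 8 while keeping `gradient_accumulation_steps` at 4 accounts for the change in `total_train_batch_size` from 16 to 32.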
57
 
58
  ### Training results
59
 
60
  | Training Loss | Epoch | Step | Validation Loss | Precision Micro | Precision Macro | Recall Micro | Recall Macro | F1 Micro | F1 Macro | Accuracy |
61
  |:-------------:|:-----:|:----:|:---------------:|:---------------:|:---------------:|:------------:|:------------:|:--------:|:--------:|:--------:|
62
+ | 0.0886 | 0.08 | 50 | 0.1082 | 0.5774 | 0.3988 | 0.5774 | 0.3124 | 0.5774 | 0.3222 | 0.5774 |
63
+ | 0.0572 | 0.16 | 100 | 0.0832 | 0.5877 | 0.4716 | 0.5877 | 0.3681 | 0.5877 | 0.3797 | 0.5877 |
64
+ | 0.0496 | 0.24 | 150 | 0.0525 | 0.7311 | 0.5911 | 0.7311 | 0.5747 | 0.7311 | 0.5703 | 0.7311 |
65
+ | 0.0541 | 0.32 | 200 | 0.0464 | 0.7566 | 0.6151 | 0.7566 | 0.5606 | 0.7566 | 0.5584 | 0.7566 |
66
+ | 0.0481 | 0.4 | 250 | 0.0433 | 0.7811 | 0.6636 | 0.7811 | 0.6514 | 0.7811 | 0.6369 | 0.7811 |
67
+ | 0.053 | 0.48 | 300 | 0.0452 | 0.7632 | 0.6936 | 0.7632 | 0.6461 | 0.7632 | 0.6338 | 0.7632 |
68
+ | 0.0401 | 0.56 | 350 | 0.0399 | 0.7943 | 0.7381 | 0.7943 | 0.6604 | 0.7943 | 0.6697 | 0.7943 |
69
+ | 0.0509 | 0.64 | 400 | 0.0393 | 0.8009 | 0.6546 | 0.8009 | 0.6612 | 0.8009 | 0.6501 | 0.8009 |
70
+ | 0.0474 | 0.72 | 450 | 0.0401 | 0.8019 | 0.7255 | 0.8019 | 0.6927 | 0.8019 | 0.6865 | 0.8019 |
71
+ | 0.045 | 0.79 | 500 | 0.0379 | 0.8009 | 0.7147 | 0.8009 | 0.7108 | 0.8009 | 0.6977 | 0.8009 |
72
+ | 0.0335 | 0.87 | 550 | 0.0369 | 0.8151 | 0.7046 | 0.8151 | 0.7335 | 0.8151 | 0.7135 | 0.8151 |
73
+ | 0.0429 | 0.95 | 600 | 0.0367 | 0.7962 | 0.7081 | 0.7962 | 0.6959 | 0.7962 | 0.6878 | 0.7962 |
74
+ | 0.0253 | 1.03 | 650 | 0.0342 | 0.8255 | 0.7370 | 0.8255 | 0.6975 | 0.8255 | 0.7098 | 0.8255 |
75
+ | 0.0311 | 1.11 | 700 | 0.0357 | 0.8047 | 0.6995 | 0.8047 | 0.6609 | 0.8047 | 0.6661 | 0.8047 |
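
In every row of the table, Precision Micro, Recall Micro, F1 Micro, and Accuracy are identical. This is expected for single-label multiclass classification: each misclassified sample adds exactly one false positive (for the predicted class) and one false negative (for the true class), so the micro-averaged totals coincide. A minimal sketch follows, using toy labels invented for illustration (they are not from this run) and a hypothetical helper `micro_metrics`.

```python
from collections import Counter


def micro_metrics(y_true, y_pred):
    """Return (precision, recall, f1, accuracy) using micro averaging."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # the predicted class gains a false positive
            fn[t] += 1  # the true class gains a false negative
    total_tp = sum(tp.values())
    total_fp = sum(fp.values())
    total_fn = sum(fn.values())
    precision = total_tp / (total_tp + total_fp)
    recall = total_tp / (total_tp + total_fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = total_tp / len(y_true)
    return precision, recall, f1, accuracy


# Toy labels, invented for illustration only.
y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 2, 2, 2, 1, 0, 1, 1]
print(micro_metrics(y_true, y_pred))  # (0.75, 0.75, 0.75, 0.75)
```

The macro-averaged columns differ because they average per-class scores with equal weight, so rare classes influence them more.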
76
 
77
 
78
  ### Framework versions
adapter_config.json CHANGED
@@ -9,24 +9,24 @@
9
  "layers_pattern": null,
10
  "layers_to_transform": null,
11
  "loftq_config": {},
12
- "lora_alpha": 16,
13
  "lora_dropout": 0.1,
14
  "megatron_config": null,
15
  "megatron_core": "megatron.core",
16
  "modules_to_save": null,
17
  "peft_type": "LORA",
18
- "r": 64,
19
  "rank_pattern": {},
20
  "revision": null,
21
  "target_modules": [
22
- "k_proj",
23
- "gate_proj",
24
- "score",
25
  "v_proj",
26
  "o_proj",
27
  "q_proj",
28
- "down_proj",
29
- "up_proj"
 
 
 
30
  ],
31
  "task_type": "SEQ_CLS"
32
  }
 
9
  "layers_pattern": null,
10
  "layers_to_transform": null,
11
  "loftq_config": {},
12
+ "lora_alpha": 128.0,
13
  "lora_dropout": 0.1,
14
  "megatron_config": null,
15
  "megatron_core": "megatron.core",
16
  "modules_to_save": null,
17
  "peft_type": "LORA",
18
+ "r": 256,
19
  "rank_pattern": {},
20
  "revision": null,
21
  "target_modules": [
 
 
 
22
  "v_proj",
23
  "o_proj",
24
  "q_proj",
25
+ "score",
26
+ "up_proj",
27
+ "gate_proj",
28
+ "k_proj",
29
+ "down_proj"
30
  ],
31
  "task_type": "SEQ_CLS"
32
  }
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:5a422c5174be4eb7630594a16f33d5bc3e20cdf0c0d2abdbf1a70dd0cd05a2b8
3
- size 337444704
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b4588a44476ad4d20a67980ab65272547e28c871b645204568d0aee1091e5f12
3
+ size 1348832888
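
The adapter changes above can be cross-checked with a little arithmetic. In standard LoRA the low-rank update is scaled by `lora_alpha / r`, and the number of adapter parameters grows linearly with `r`. The sketch below assumes standard scaling (not the rank-stabilized variant, which this diff excerpt does not show either way); the dictionary layout and function name are hypothetical.

```python
# Values copied from the adapter_config.json and adapter_model.safetensors diffs.
old = {"r": 64, "lora_alpha": 16, "size_bytes": 337444704}
new = {"r": 256, "lora_alpha": 128.0, "size_bytes": 1348832888}


def lora_scaling(cfg):
    # Standard LoRA scales the low-rank update by lora_alpha / r.
    # Assumption: rank-stabilized scaling (alpha / sqrt(r)) is not enabled.
    return cfg["lora_alpha"] / cfg["r"]


print(lora_scaling(old))  # 0.25
print(lora_scaling(new))  # 0.5

# LoRA parameter count is linear in r, so file size should grow by about r_new / r_old.
rank_ratio = new["r"] / old["r"]
size_ratio = new["size_bytes"] / old["size_bytes"]
print(rank_ratio, round(size_ratio, 3))
```

The size ratio comes out just under 4, in line with the fourfold increase in rank. The `target_modules` list holds the same eight entries before and after; only their order changed.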
all_results.json CHANGED
@@ -1,7 +1,18 @@
1
  {
2
- "epoch": 2.0,
3
- "train_loss": 0.05790289470982191,
4
- "train_runtime": 10296.0519,
5
- "train_samples_per_second": 3.91,
6
- "train_steps_per_second": 0.244
7
  }
 
1
  {
2
+ "epoch": 1.15,
3
+ "eval_accuracy": 0.8207547169811321,
4
+ "eval_f1_macro": 0.728031211883242,
5
+ "eval_f1_micro": 0.8207547169811321,
6
+ "eval_loss": 0.03309142589569092,
7
+ "eval_precision_macro": 0.7489147130312945,
8
+ "eval_precision_micro": 0.8207547169811321,
9
+ "eval_recall_macro": 0.7312128559829479,
10
+ "eval_recall_micro": 0.8207547169811321,
11
+ "eval_runtime": 66.9147,
12
+ "eval_samples_per_second": 15.841,
13
+ "eval_steps_per_second": 3.96,
14
+ "train_loss": 0.06415341473858932,
15
+ "train_runtime": 4786.626,
16
+ "train_samples_per_second": 4.847,
17
+ "train_steps_per_second": 0.151
18
  }
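
The throughput fields in the new `all_results.json` can be checked against the README hyperparameters (`training_steps: 725`, `total_train_batch_size: 32`). The sketch below multiplies the rounded rates by the runtime, so the results are approximate; the variable names are hypothetical.

```python
# Values copied from the updated all_results.json.
train_runtime = 4786.626            # seconds
train_samples_per_second = 4.847
train_steps_per_second = 0.151

# Both rates are rounded in the log, so these are estimates.
approx_steps = train_runtime * train_steps_per_second
approx_samples = train_runtime * train_samples_per_second

print(round(approx_steps))                   # close to the 725 configured steps
print(round(approx_samples))                 # close to 725 * 32 = 23200
print(round(approx_samples / approx_steps))  # 32 samples per optimizer step
```

The small gap between the estimated and configured step counts comes from `train_steps_per_second` being logged to only three decimal places.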
eval_results.json ADDED
@@ -0,0 +1,14 @@
1
+ {
2
+ "epoch": 2.0,
3
+ "eval_accuracy": 0.8207547169811321,
4
+ "eval_f1_macro": 0.728031211883242,
5
+ "eval_f1_micro": 0.8207547169811321,
6
+ "eval_loss": 0.03309142589569092,
7
+ "eval_precision_macro": 0.7489147130312945,
8
+ "eval_precision_micro": 0.8207547169811321,
9
+ "eval_recall_macro": 0.7312128559829479,
10
+ "eval_recall_micro": 0.8207547169811321,
11
+ "eval_runtime": 66.9147,
12
+ "eval_samples_per_second": 15.841,
13
+ "eval_steps_per_second": 3.96
14
+ }
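
The evaluation set size is not stated in this commit, but it can be estimated from the logged throughput. The sketch below is an inference from the numbers in `eval_results.json` together with `eval_batch_size: 4` from the README, not a documented fact; the variable names are hypothetical.

```python
# Values copied from eval_results.json and the README hyperparameters.
eval_runtime = 66.9147              # seconds
eval_samples_per_second = 15.841
eval_steps_per_second = 3.96
eval_accuracy = 0.8207547169811321
eval_batch_size = 4

# Estimate the number of evaluation samples and batches from throughput.
n_eval = round(eval_runtime * eval_samples_per_second)
n_batches = round(eval_runtime * eval_steps_per_second)

# Number of correct predictions implied by the accuracy.
n_correct = round(eval_accuracy * n_eval)

print(n_eval, n_batches, n_correct)  # 1060 265 870
print(n_batches * eval_batch_size)   # 1060, consistent with n_eval
```

With roughly 1060 evaluation samples, one additional correct prediction moves accuracy by about 0.0009, which helps put the step-to-step fluctuations in the training table in context.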
metrics.json ADDED
@@ -0,0 +1 @@
 
 
1
+ {"run_name": "./output", "train_runtime": 10296.0519, "train_samples_per_second": 3.91, "train_steps_per_second": 0.244, "train_loss": 0.05790289470982191, "epoch": 2.0, "eval_loss": 0.03309142589569092, "eval_precision_micro": 0.8207547169811321, "eval_precision_macro": 0.7489147130312945, "eval_recall_micro": 0.8207547169811321, "eval_recall_macro": 0.7312128559829479, "eval_f1_micro": 0.8207547169811321, "eval_f1_macro": 0.728031211883242, "eval_accuracy": 0.8207547169811321, "eval_runtime": 66.9147, "eval_samples_per_second": 15.841, "eval_steps_per_second": 3.96}
train_results.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
- "epoch": 2.0,
3
- "train_loss": 0.05790289470982191,
4
- "train_runtime": 10296.0519,
5
- "train_samples_per_second": 3.91,
6
- "train_steps_per_second": 0.244
7
  }
 
1
  {
2
+ "epoch": 1.15,
3
+ "train_loss": 0.06415341473858932,
4
+ "train_runtime": 4786.626,
5
+ "train_samples_per_second": 4.847,
6
+ "train_steps_per_second": 0.151
7
  }
trainer_state.json CHANGED
@@ -1,2286 +1,672 @@
1
  {
2
  "best_metric": null,
3
  "best_model_checkpoint": null,
4
- "epoch": 1.9996026226902444,
5
  "eval_steps": 50,
6
- "global_step": 2516,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
10
  "log_history": [
11
  {
12
- "epoch": 0.01,
13
  "learning_rate": 3e-05,
14
- "loss": 2.1025,
15
  "step": 10
16
  },
17
  {
18
- "epoch": 0.02,
19
  "learning_rate": 3e-05,
20
- "loss": 0.2678,
21
  "step": 20
22
  },
23
  {
24
- "epoch": 0.02,
25
  "learning_rate": 3e-05,
26
- "loss": 0.1686,
27
  "step": 30
28
  },
29
  {
30
- "epoch": 0.03,
31
  "learning_rate": 3e-05,
32
- "loss": 0.1283,
33
  "step": 40
34
  },
35
  {
36
- "epoch": 0.04,
37
  "learning_rate": 3e-05,
38
- "loss": 0.1114,
39
  "step": 50
40
  },
41
  {
42
- "epoch": 0.04,
43
- "eval_accuracy": 0.3613207547169811,
44
- "eval_f1_macro": 0.09860460155366327,
45
- "eval_f1_micro": 0.3613207547169811,
46
- "eval_loss": 0.2315492480993271,
47
- "eval_precision_macro": 0.12357052455449527,
48
- "eval_precision_micro": 0.3613207547169811,
49
- "eval_recall_macro": 0.11941290624193263,
50
- "eval_recall_micro": 0.3613207547169811,
51
- "eval_runtime": 66.9606,
52
- "eval_samples_per_second": 15.83,
53
- "eval_steps_per_second": 3.958,
54
  "step": 50
55
  },
56
  {
57
- "epoch": 0.05,
58
  "learning_rate": 3e-05,
59
- "loss": 0.6159,
60
  "step": 60
61
  },
62
  {
63
- "epoch": 0.06,
64
  "learning_rate": 3e-05,
65
- "loss": 0.1207,
66
  "step": 70
67
  },
68
  {
69
- "epoch": 0.06,
70
  "learning_rate": 3e-05,
71
- "loss": 0.0914,
72
  "step": 80
73
  },
74
  {
75
- "epoch": 0.07,
76
  "learning_rate": 3e-05,
77
- "loss": 0.0971,
78
  "step": 90
79
  },
80
  {
81
- "epoch": 0.08,
82
  "learning_rate": 3e-05,
83
- "loss": 0.1009,
84
  "step": 100
85
  },
86
  {
87
- "epoch": 0.08,
88
- "eval_accuracy": 0.4867924528301887,
89
- "eval_f1_macro": 0.2574441564495774,
90
- "eval_f1_micro": 0.4867924528301887,
91
- "eval_loss": 0.1300116926431656,
92
- "eval_precision_macro": 0.26018914773023644,
93
- "eval_precision_micro": 0.4867924528301887,
94
- "eval_recall_macro": 0.2927378841554505,
95
- "eval_recall_micro": 0.4867924528301887,
96
- "eval_runtime": 66.8969,
97
- "eval_samples_per_second": 15.845,
98
- "eval_steps_per_second": 3.961,
99
  "step": 100
100
  },
101
  {
102
- "epoch": 0.09,
103
  "learning_rate": 3e-05,
104
- "loss": 0.2813,
105
  "step": 110
106
  },
107
  {
108
- "epoch": 0.1,
109
  "learning_rate": 3e-05,
110
- "loss": 0.0721,
111
  "step": 120
112
  },
113
  {
114
- "epoch": 0.1,
115
  "learning_rate": 3e-05,
116
- "loss": 0.0789,
117
  "step": 130
118
  },
119
  {
120
- "epoch": 0.11,
121
  "learning_rate": 3e-05,
122
- "loss": 0.0559,
123
  "step": 140
124
  },
125
  {
126
- "epoch": 0.12,
127
  "learning_rate": 3e-05,
128
- "loss": 0.0655,
129
  "step": 150
130
  },
131
  {
132
- "epoch": 0.12,
133
- "eval_accuracy": 0.5820754716981132,
134
- "eval_f1_macro": 0.3406909384301982,
135
- "eval_f1_micro": 0.5820754716981132,
136
- "eval_loss": 0.11111029237508774,
137
- "eval_precision_macro": 0.4592469495612598,
138
- "eval_precision_micro": 0.5820754716981132,
139
- "eval_recall_macro": 0.32597718546465326,
140
- "eval_recall_micro": 0.5820754716981132,
141
- "eval_runtime": 67.3369,
142
- "eval_samples_per_second": 15.742,
143
- "eval_steps_per_second": 3.935,
144
  "step": 150
145
  },
146
  {
147
- "epoch": 0.13,
148
  "learning_rate": 3e-05,
149
- "loss": 0.2815,
150
  "step": 160
151
  },
152
  {
153
- "epoch": 0.14,
154
  "learning_rate": 3e-05,
155
- "loss": 0.0738,
156
  "step": 170
157
  },
158
  {
159
- "epoch": 0.14,
160
  "learning_rate": 3e-05,
161
- "loss": 0.06,
162
  "step": 180
163
  },
164
  {
165
- "epoch": 0.15,
166
  "learning_rate": 3e-05,
167
- "loss": 0.064,
168
  "step": 190
169
  },
170
  {
171
- "epoch": 0.16,
172
  "learning_rate": 3e-05,
173
- "loss": 0.0675,
174
  "step": 200
175
  },
176
  {
177
- "epoch": 0.16,
178
- "eval_accuracy": 0.6103773584905661,
179
- "eval_f1_macro": 0.39941812670769666,
180
- "eval_f1_micro": 0.6103773584905661,
181
- "eval_loss": 0.09801042824983597,
182
- "eval_precision_macro": 0.4309243547594424,
183
- "eval_precision_micro": 0.6103773584905661,
184
- "eval_recall_macro": 0.4115863977855726,
185
- "eval_recall_micro": 0.6103773584905661,
186
- "eval_runtime": 66.8379,
187
- "eval_samples_per_second": 15.859,
188
- "eval_steps_per_second": 3.965,
189
  "step": 200
190
  },
191
  {
192
- "epoch": 0.17,
193
  "learning_rate": 3e-05,
194
- "loss": 0.2238,
195
  "step": 210
196
  },
197
  {
198
- "epoch": 0.17,
199
  "learning_rate": 3e-05,
200
- "loss": 0.0665,
201
  "step": 220
202
  },
203
  {
204
- "epoch": 0.18,
205
  "learning_rate": 3e-05,
206
- "loss": 0.0526,
207
  "step": 230
208
  },
209
  {
210
- "epoch": 0.19,
211
  "learning_rate": 3e-05,
212
- "loss": 0.0622,
213
  "step": 240
214
  },
215
  {
216
- "epoch": 0.2,
217
  "learning_rate": 3e-05,
218
- "loss": 0.0613,
219
  "step": 250
220
  },
221
  {
222
- "epoch": 0.2,
223
- "eval_accuracy": 0.6349056603773585,
224
- "eval_f1_macro": 0.43281084183779084,
225
- "eval_f1_micro": 0.6349056603773585,
226
- "eval_loss": 0.0867743045091629,
227
- "eval_precision_macro": 0.5027159556787124,
228
- "eval_precision_micro": 0.6349056603773585,
229
- "eval_recall_macro": 0.42379042156244773,
230
- "eval_recall_micro": 0.6349056603773585,
231
- "eval_runtime": 66.7898,
232
- "eval_samples_per_second": 15.871,
233
- "eval_steps_per_second": 3.968,
234
  "step": 250
235
  },
236
  {
237
- "epoch": 0.21,
238
  "learning_rate": 3e-05,
239
- "loss": 0.211,
240
  "step": 260
241
  },
242
  {
243
- "epoch": 0.21,
244
  "learning_rate": 3e-05,
245
- "loss": 0.0642,
246
  "step": 270
247
  },
248
  {
249
- "epoch": 0.22,
250
  "learning_rate": 3e-05,
251
- "loss": 0.0571,
252
  "step": 280
253
  },
254
  {
255
- "epoch": 0.23,
256
  "learning_rate": 3e-05,
257
- "loss": 0.0595,
258
  "step": 290
259
  },
260
  {
261
- "epoch": 0.24,
262
  "learning_rate": 3e-05,
263
- "loss": 0.0423,
264
  "step": 300
265
  },
266
  {
267
- "epoch": 0.24,
268
- "eval_accuracy": 0.6405660377358491,
269
- "eval_f1_macro": 0.48377541137861874,
270
- "eval_f1_micro": 0.6405660377358491,
271
- "eval_loss": 0.08292412012815475,
272
- "eval_precision_macro": 0.4971153033333408,
273
- "eval_precision_micro": 0.6405660377358491,
274
- "eval_recall_macro": 0.5150079548728711,
275
- "eval_recall_micro": 0.6405660377358491,
276
- "eval_runtime": 66.8647,
277
- "eval_samples_per_second": 15.853,
278
- "eval_steps_per_second": 3.963,
279
  "step": 300
280
  },
281
  {
282
- "epoch": 0.25,
283
  "learning_rate": 3e-05,
284
- "loss": 0.1501,
285
  "step": 310
286
  },
287
  {
288
- "epoch": 0.25,
289
  "learning_rate": 3e-05,
290
- "loss": 0.0583,
291
  "step": 320
292
  },
293
  {
294
- "epoch": 0.26,
295
  "learning_rate": 3e-05,
296
- "loss": 0.0406,
297
  "step": 330
298
  },
299
  {
300
- "epoch": 0.27,
301
  "learning_rate": 3e-05,
302
- "loss": 0.0512,
303
  "step": 340
304
  },
305
  {
306
- "epoch": 0.28,
307
  "learning_rate": 3e-05,
308
- "loss": 0.0495,
309
  "step": 350
310
  },
311
  {
312
- "epoch": 0.28,
313
- "eval_accuracy": 0.6839622641509434,
314
- "eval_f1_macro": 0.5117892165590059,
315
- "eval_f1_micro": 0.6839622641509434,
316
- "eval_loss": 0.06472181528806686,
317
- "eval_precision_macro": 0.5620645130448471,
318
- "eval_precision_micro": 0.6839622641509434,
319
- "eval_recall_macro": 0.5110243944495119,
320
- "eval_recall_micro": 0.6839622641509434,
321
- "eval_runtime": 66.951,
322
- "eval_samples_per_second": 15.832,
323
- "eval_steps_per_second": 3.958,
324
  "step": 350
325
  },
326
  {
327
- "epoch": 0.29,
328
  "learning_rate": 3e-05,
329
- "loss": 0.126,
330
  "step": 360
331
  },
332
  {
333
- "epoch": 0.29,
334
  "learning_rate": 3e-05,
335
- "loss": 0.0659,
336
  "step": 370
337
  },
338
  {
339
- "epoch": 0.3,
340
  "learning_rate": 3e-05,
341
- "loss": 0.0521,
342
  "step": 380
343
  },
344
  {
345
- "epoch": 0.31,
346
  "learning_rate": 3e-05,
347
- "loss": 0.0558,
348
  "step": 390
349
  },
350
  {
351
- "epoch": 0.32,
352
  "learning_rate": 3e-05,
353
- "loss": 0.0696,
354
  "step": 400
355
  },
356
  {
357
- "epoch": 0.32,
358
- "eval_accuracy": 0.7235849056603774,
359
- "eval_f1_macro": 0.5523304099808589,
360
- "eval_f1_micro": 0.7235849056603774,
361
- "eval_loss": 0.05833260715007782,
362
- "eval_precision_macro": 0.5853706474287289,
363
- "eval_precision_micro": 0.7235849056603774,
364
- "eval_recall_macro": 0.5476004753387752,
365
- "eval_recall_micro": 0.7235849056603774,
366
- "eval_runtime": 67.3456,
367
- "eval_samples_per_second": 15.74,
368
- "eval_steps_per_second": 3.935,
369
  "step": 400
370
  },
371
  {
372
- "epoch": 0.33,
373
  "learning_rate": 3e-05,
374
- "loss": 0.0861,
375
  "step": 410
376
  },
377
  {
378
- "epoch": 0.33,
379
  "learning_rate": 3e-05,
380
- "loss": 0.0457,
381
  "step": 420
382
  },
383
  {
384
- "epoch": 0.34,
385
  "learning_rate": 3e-05,
386
- "loss": 0.0549,
387
  "step": 430
388
  },
389
  {
390
- "epoch": 0.35,
391
  "learning_rate": 3e-05,
392
- "loss": 0.0505,
393
  "step": 440
394
  },
395
  {
396
- "epoch": 0.36,
397
  "learning_rate": 3e-05,
398
- "loss": 0.0551,
399
  "step": 450
400
  },
401
  {
402
- "epoch": 0.36,
403
- "eval_accuracy": 0.7537735849056604,
404
- "eval_f1_macro": 0.580144082849279,
405
- "eval_f1_micro": 0.7537735849056603,
406
- "eval_loss": 0.04702736809849739,
407
- "eval_precision_macro": 0.6037061602706483,
408
- "eval_precision_micro": 0.7537735849056604,
409
- "eval_recall_macro": 0.5804359588026832,
410
- "eval_recall_micro": 0.7537735849056604,
411
- "eval_runtime": 66.8194,
412
- "eval_samples_per_second": 15.864,
413
- "eval_steps_per_second": 3.966,
414
  "step": 450
415
  },
416
  {
417
- "epoch": 0.37,
418
  "learning_rate": 3e-05,
419
- "loss": 0.0701,
420
  "step": 460
421
  },
422
  {
423
- "epoch": 0.37,
424
  "learning_rate": 3e-05,
425
- "loss": 0.0483,
426
  "step": 470
427
  },
428
  {
429
- "epoch": 0.38,
430
  "learning_rate": 3e-05,
431
- "loss": 0.0427,
432
  "step": 480
433
  },
434
  {
435
- "epoch": 0.39,
436
  "learning_rate": 3e-05,
437
- "loss": 0.0437,
438
  "step": 490
439
  },
440
  {
441
- "epoch": 0.4,
442
  "learning_rate": 3e-05,
443
- "loss": 0.0485,
444
  "step": 500
445
  },
446
  {
447
- "epoch": 0.4,
448
- "eval_accuracy": 0.7632075471698113,
449
- "eval_f1_macro": 0.5976025328641371,
450
- "eval_f1_micro": 0.7632075471698113,
451
- "eval_loss": 0.046745266765356064,
452
- "eval_precision_macro": 0.6244027962032025,
453
- "eval_precision_micro": 0.7632075471698113,
454
- "eval_recall_macro": 0.6092701629048938,
455
- "eval_recall_micro": 0.7632075471698113,
456
- "eval_runtime": 66.8345,
457
- "eval_samples_per_second": 15.86,
458
- "eval_steps_per_second": 3.965,
459
  "step": 500
460
  },
461
  {
462
- "epoch": 0.41,
463
  "learning_rate": 3e-05,
464
- "loss": 0.0676,
465
  "step": 510
466
  },
467
  {
468
- "epoch": 0.41,
469
  "learning_rate": 3e-05,
470
- "loss": 0.0424,
471
  "step": 520
472
  },
473
  {
474
- "epoch": 0.42,
475
  "learning_rate": 3e-05,
476
- "loss": 0.0533,
477
  "step": 530
478
  },
479
  {
480
- "epoch": 0.43,
481
  "learning_rate": 3e-05,
482
- "loss": 0.0405,
483
  "step": 540
484
  },
485
  {
486
- "epoch": 0.44,
487
  "learning_rate": 3e-05,
488
- "loss": 0.0514,
489
  "step": 550
490
  },
491
  {
492
- "epoch": 0.44,
493
- "eval_accuracy": 0.7452830188679245,
494
- "eval_f1_macro": 0.6148723643378806,
495
- "eval_f1_micro": 0.7452830188679244,
496
- "eval_loss": 0.0491117425262928,
497
- "eval_precision_macro": 0.6623956744108057,
498
- "eval_precision_micro": 0.7452830188679245,
499
- "eval_recall_macro": 0.6055074363524768,
500
- "eval_recall_micro": 0.7452830188679245,
501
- "eval_runtime": 66.8014,
502
- "eval_samples_per_second": 15.868,
503
- "eval_steps_per_second": 3.967,
504
  "step": 550
505
  },
506
  {
507
- "epoch": 0.45,
508
  "learning_rate": 3e-05,
509
- "loss": 0.0657,
510
  "step": 560
511
  },
512
  {
513
- "epoch": 0.45,
514
  "learning_rate": 3e-05,
515
- "loss": 0.0491,
516
  "step": 570
517
  },
518
  {
519
- "epoch": 0.46,
520
  "learning_rate": 3e-05,
521
- "loss": 0.0415,
522
  "step": 580
523
  },
524
  {
525
- "epoch": 0.47,
526
  "learning_rate": 3e-05,
527
- "loss": 0.0485,
528
  "step": 590
529
  },
530
  {
531
- "epoch": 0.48,
532
  "learning_rate": 3e-05,
533
- "loss": 0.0537,
534
  "step": 600
535
  },
536
  {
537
- "epoch": 0.48,
538
- "eval_accuracy": 0.7547169811320755,
539
- "eval_f1_macro": 0.5955912007854481,
540
- "eval_f1_micro": 0.7547169811320754,
541
- "eval_loss": 0.04687512293457985,
542
- "eval_precision_macro": 0.6564521374886103,
543
- "eval_precision_micro": 0.7547169811320755,
544
- "eval_recall_macro": 0.6140077817767989,
545
- "eval_recall_micro": 0.7547169811320755,
546
- "eval_runtime": 67.0823,
547
- "eval_samples_per_second": 15.801,
548
- "eval_steps_per_second": 3.95,
549
  "step": 600
550
  },
551
  {
552
- "epoch": 0.48,
553
  "learning_rate": 3e-05,
554
- "loss": 0.0494,
555
  "step": 610
556
  },
557
  {
558
- "epoch": 0.49,
559
  "learning_rate": 3e-05,
560
- "loss": 0.0472,
561
  "step": 620
562
  },
563
  {
564
- "epoch": 0.5,
565
  "learning_rate": 3e-05,
566
- "loss": 0.045,
567
  "step": 630
568
  },
569
  {
570
- "epoch": 0.51,
571
  "learning_rate": 3e-05,
572
- "loss": 0.0447,
573
  "step": 640
574
  },
575
  {
576
- "epoch": 0.52,
577
  "learning_rate": 3e-05,
578
- "loss": 0.0503,
579
  "step": 650
580
  },
581
  {
582
- "epoch": 0.52,
583
- "eval_accuracy": 0.7433962264150943,
584
- "eval_f1_macro": 0.5711369889957133,
585
- "eval_f1_micro": 0.7433962264150943,
586
- "eval_loss": 0.04730157181620598,
587
- "eval_precision_macro": 0.6365229300442473,
588
- "eval_precision_micro": 0.7433962264150943,
589
- "eval_recall_macro": 0.5711131524489298,
590
- "eval_recall_micro": 0.7433962264150943,
591
- "eval_runtime": 66.9933,
592
- "eval_samples_per_second": 15.822,
593
- "eval_steps_per_second": 3.956,
594
  "step": 650
595
  },
596
  {
597
- "epoch": 0.52,
598
  "learning_rate": 3e-05,
599
- "loss": 0.0632,
600
  "step": 660
601
  },
602
  {
603
- "epoch": 0.53,
604
  "learning_rate": 3e-05,
605
- "loss": 0.0525,
606
  "step": 670
607
  },
608
  {
609
- "epoch": 0.54,
610
  "learning_rate": 3e-05,
611
- "loss": 0.0369,
612
  "step": 680
613
  },
614
  {
615
- "epoch": 0.55,
616
  "learning_rate": 3e-05,
617
- "loss": 0.0392,
618
  "step": 690
619
  },
620
  {
621
- "epoch": 0.56,
622
  "learning_rate": 3e-05,
623
- "loss": 0.0502,
624
  "step": 700
625
  },
626
  {
627
- "epoch": 0.56,
628
- "eval_accuracy": 0.7990566037735849,
629
- "eval_f1_macro": 0.6486516495348912,
630
- "eval_f1_micro": 0.799056603773585,
631
- "eval_loss": 0.04286834970116615,
632
- "eval_precision_macro": 0.6674675949162269,
633
- "eval_precision_micro": 0.7990566037735849,
634
- "eval_recall_macro": 0.6430318401752134,
635
- "eval_recall_micro": 0.7990566037735849,
636
- "eval_runtime": 66.7816,
637
- "eval_samples_per_second": 15.873,
638
- "eval_steps_per_second": 3.968,
639
  "step": 700
640
  },
641
  {
642
- "epoch": 0.56,
643
  "learning_rate": 3e-05,
644
- "loss": 0.0562,
645
  "step": 710
646
  },
647
  {
648
- "epoch": 0.57,
649
  "learning_rate": 3e-05,
650
- "loss": 0.0417,
651
  "step": 720
652
  },
653
  {
654
- "epoch": 0.58,
655
- "learning_rate": 3e-05,
656
- "loss": 0.0384,
657
- "step": 730
658
- },
659
- {
660
- "epoch": 0.59,
661
- "learning_rate": 3e-05,
662
- "loss": 0.0386,
663
- "step": 740
664
- },
665
- {
666
- "epoch": 0.6,
667
- "learning_rate": 3e-05,
668
- "loss": 0.0568,
669
- "step": 750
670
- },
671
- {
672
- "epoch": 0.6,
673
- "eval_accuracy": 0.7830188679245284,
674
- "eval_f1_macro": 0.6035355055785452,
675
- "eval_f1_micro": 0.7830188679245284,
676
- "eval_loss": 0.04214347526431084,
677
- "eval_precision_macro": 0.6399716318087022,
678
- "eval_precision_micro": 0.7830188679245284,
679
- "eval_recall_macro": 0.6197061339803496,
680
- "eval_recall_micro": 0.7830188679245284,
681
- "eval_runtime": 66.9036,
682
- "eval_samples_per_second": 15.844,
683
- "eval_steps_per_second": 3.961,
684
- "step": 750
685
- },
686
- {
687
- "epoch": 0.6,
688
- "learning_rate": 3e-05,
689
- "loss": 0.0524,
690
- "step": 760
691
- },
692
- {
693
- "epoch": 0.61,
694
- "learning_rate": 3e-05,
695
- "loss": 0.0403,
696
- "step": 770
697
- },
698
- {
699
- "epoch": 0.62,
700
- "learning_rate": 3e-05,
701
- "loss": 0.0346,
702
- "step": 780
703
- },
704
- {
705
- "epoch": 0.63,
706
- "learning_rate": 3e-05,
707
- "loss": 0.0436,
708
- "step": 790
709
- },
710
- {
711
- "epoch": 0.64,
712
- "learning_rate": 3e-05,
713
- "loss": 0.0456,
714
- "step": 800
715
- },
716
- {
717
- "epoch": 0.64,
718
- "eval_accuracy": 0.8037735849056604,
719
- "eval_f1_macro": 0.6795100443217031,
720
- "eval_f1_micro": 0.8037735849056604,
721
- "eval_loss": 0.03851619362831116,
722
- "eval_precision_macro": 0.6660272950062351,
723
- "eval_precision_micro": 0.8037735849056604,
724
- "eval_recall_macro": 0.7100461955802515,
725
- "eval_recall_micro": 0.8037735849056604,
726
- "eval_runtime": 66.7565,
727
- "eval_samples_per_second": 15.879,
728
- "eval_steps_per_second": 3.97,
729
- "step": 800
730
- },
731
- {
732
- "epoch": 0.64,
733
- "learning_rate": 3e-05,
734
- "loss": 0.0404,
735
- "step": 810
736
- },
737
- {
738
- "epoch": 0.65,
739
- "learning_rate": 3e-05,
740
- "loss": 0.0415,
741
- "step": 820
742
- },
743
- {
744
- "epoch": 0.66,
745
- "learning_rate": 3e-05,
746
- "loss": 0.034,
747
- "step": 830
748
- },
749
- {
750
- "epoch": 0.67,
751
- "learning_rate": 3e-05,
752
- "loss": 0.0465,
753
- "step": 840
754
- },
755
- {
756
- "epoch": 0.68,
757
- "learning_rate": 3e-05,
758
- "loss": 0.0465,
759
- "step": 850
760
- },
761
- {
762
- "epoch": 0.68,
763
- "eval_accuracy": 0.7867924528301887,
764
- "eval_f1_macro": 0.6637790925123535,
765
- "eval_f1_micro": 0.7867924528301887,
766
- "eval_loss": 0.04226187616586685,
767
- "eval_precision_macro": 0.70799384577877,
768
- "eval_precision_micro": 0.7867924528301887,
769
- "eval_recall_macro": 0.6535685213398926,
770
- "eval_recall_micro": 0.7867924528301887,
771
- "eval_runtime": 66.8456,
772
- "eval_samples_per_second": 15.857,
773
- "eval_steps_per_second": 3.964,
774
- "step": 850
775
- },
776
- {
777
- "epoch": 0.68,
778
- "learning_rate": 3e-05,
779
- "loss": 0.0428,
780
- "step": 860
781
- },
782
- {
783
- "epoch": 0.69,
784
- "learning_rate": 3e-05,
785
- "loss": 0.0455,
786
- "step": 870
787
- },
788
- {
789
- "epoch": 0.7,
790
- "learning_rate": 3e-05,
791
- "loss": 0.0467,
792
- "step": 880
793
- },
794
- {
795
- "epoch": 0.71,
796
- "learning_rate": 3e-05,
797
- "loss": 0.0381,
798
- "step": 890
799
- },
800
- {
801
- "epoch": 0.72,
802
- "learning_rate": 3e-05,
803
- "loss": 0.0517,
804
- "step": 900
805
- },
806
- {
807
- "epoch": 0.72,
808
- "eval_accuracy": 0.7830188679245284,
809
- "eval_f1_macro": 0.604409711538349,
810
- "eval_f1_micro": 0.7830188679245284,
811
- "eval_loss": 0.04051998630166054,
812
- "eval_precision_macro": 0.6482245905845607,
813
- "eval_precision_micro": 0.7830188679245284,
814
- "eval_recall_macro": 0.5953272937433359,
815
- "eval_recall_micro": 0.7830188679245284,
816
- "eval_runtime": 66.9721,
817
- "eval_samples_per_second": 15.827,
818
- "eval_steps_per_second": 3.957,
819
- "step": 900
820
- },
821
- {
822
- "epoch": 0.72,
823
- "learning_rate": 3e-05,
824
- "loss": 0.0406,
825
- "step": 910
826
- },
827
- {
828
- "epoch": 0.73,
829
- "learning_rate": 3e-05,
830
- "loss": 0.037,
831
- "step": 920
832
- },
833
- {
834
- "epoch": 0.74,
835
- "learning_rate": 3e-05,
836
- "loss": 0.0445,
837
- "step": 930
838
- },
839
- {
840
- "epoch": 0.75,
841
- "learning_rate": 3e-05,
842
- "loss": 0.0359,
843
- "step": 940
844
- },
845
- {
846
- "epoch": 0.76,
847
- "learning_rate": 3e-05,
848
- "loss": 0.0449,
849
- "step": 950
850
- },
851
- {
852
- "epoch": 0.76,
853
- "eval_accuracy": 0.7962264150943397,
854
- "eval_f1_macro": 0.6595487480161657,
855
- "eval_f1_micro": 0.7962264150943396,
856
- "eval_loss": 0.03951858729124069,
857
- "eval_precision_macro": 0.678313535245044,
858
- "eval_precision_micro": 0.7962264150943397,
859
- "eval_recall_macro": 0.6782248779232171,
860
- "eval_recall_micro": 0.7962264150943397,
861
- "eval_runtime": 67.2586,
862
- "eval_samples_per_second": 15.76,
863
- "eval_steps_per_second": 3.94,
864
- "step": 950
865
- },
866
- {
867
- "epoch": 0.76,
868
- "learning_rate": 3e-05,
869
- "loss": 0.0473,
870
- "step": 960
871
- },
872
- {
873
- "epoch": 0.77,
874
- "learning_rate": 3e-05,
875
- "loss": 0.0387,
876
- "step": 970
877
- },
878
- {
879
- "epoch": 0.78,
880
- "learning_rate": 3e-05,
881
- "loss": 0.0393,
882
- "step": 980
883
- },
884
- {
885
- "epoch": 0.79,
886
- "learning_rate": 3e-05,
887
- "loss": 0.0344,
888
- "step": 990
889
- },
890
- {
891
- "epoch": 0.79,
892
- "learning_rate": 3e-05,
893
- "loss": 0.0438,
894
- "step": 1000
895
- },
896
- {
897
- "epoch": 0.79,
898
- "eval_accuracy": 0.7650943396226415,
899
- "eval_f1_macro": 0.6269730408930883,
900
- "eval_f1_micro": 0.7650943396226415,
901
- "eval_loss": 0.041479434818029404,
902
- "eval_precision_macro": 0.6310264924491513,
903
- "eval_precision_micro": 0.7650943396226415,
904
- "eval_recall_macro": 0.651893356526834,
905
- "eval_recall_micro": 0.7650943396226415,
906
- "eval_runtime": 66.8963,
907
- "eval_samples_per_second": 15.845,
908
- "eval_steps_per_second": 3.961,
909
- "step": 1000
910
- },
911
- {
912
- "epoch": 0.8,
913
- "learning_rate": 3e-05,
914
- "loss": 0.0454,
915
- "step": 1010
916
- },
917
- {
918
- "epoch": 0.81,
919
- "learning_rate": 3e-05,
920
- "loss": 0.0389,
921
- "step": 1020
922
- },
923
- {
924
- "epoch": 0.82,
925
- "learning_rate": 3e-05,
926
- "loss": 0.0385,
927
- "step": 1030
928
- },
929
- {
930
- "epoch": 0.83,
931
- "learning_rate": 3e-05,
932
- "loss": 0.0465,
933
- "step": 1040
934
- },
935
- {
936
- "epoch": 0.83,
937
- "learning_rate": 3e-05,
938
- "loss": 0.0368,
939
- "step": 1050
940
- },
941
- {
942
- "epoch": 0.83,
943
- "eval_accuracy": 0.8141509433962264,
944
- "eval_f1_macro": 0.6884941209926929,
945
- "eval_f1_micro": 0.8141509433962264,
946
- "eval_loss": 0.036739904433488846,
947
- "eval_precision_macro": 0.7076595104531683,
948
- "eval_precision_micro": 0.8141509433962264,
949
- "eval_recall_macro": 0.6998335623662951,
950
- "eval_recall_micro": 0.8141509433962264,
951
- "eval_runtime": 66.8429,
952
- "eval_samples_per_second": 15.858,
953
- "eval_steps_per_second": 3.965,
954
- "step": 1050
955
- },
956
- {
957
- "epoch": 0.84,
958
- "learning_rate": 3e-05,
959
- "loss": 0.0315,
960
- "step": 1060
961
- },
962
- {
963
- "epoch": 0.85,
964
- "learning_rate": 3e-05,
965
- "loss": 0.048,
966
- "step": 1070
967
- },
968
- {
969
- "epoch": 0.86,
970
- "learning_rate": 3e-05,
971
- "loss": 0.0423,
972
- "step": 1080
973
- },
974
- {
975
- "epoch": 0.87,
976
- "learning_rate": 3e-05,
977
- "loss": 0.0399,
978
- "step": 1090
979
- },
980
- {
981
- "epoch": 0.87,
982
- "learning_rate": 3e-05,
983
- "loss": 0.0351,
984
- "step": 1100
985
- },
986
- {
987
- "epoch": 0.87,
988
- "eval_accuracy": 0.8150943396226416,
989
- "eval_f1_macro": 0.6795761385716744,
990
- "eval_f1_micro": 0.8150943396226416,
991
- "eval_loss": 0.03497824817895889,
992
- "eval_precision_macro": 0.6863775670636837,
993
- "eval_precision_micro": 0.8150943396226416,
994
- "eval_recall_macro": 0.6837727133564548,
995
- "eval_recall_micro": 0.8150943396226416,
996
- "eval_runtime": 66.7682,
997
- "eval_samples_per_second": 15.876,
998
- "eval_steps_per_second": 3.969,
999
- "step": 1100
1000
- },
1001
- {
1002
- "epoch": 0.88,
1003
- "learning_rate": 3e-05,
1004
- "loss": 0.0356,
1005
- "step": 1110
1006
- },
1007
- {
1008
- "epoch": 0.89,
1009
- "learning_rate": 3e-05,
1010
- "loss": 0.034,
1011
- "step": 1120
1012
- },
1013
- {
1014
- "epoch": 0.9,
1015
- "learning_rate": 3e-05,
1016
- "loss": 0.0379,
1017
- "step": 1130
1018
- },
1019
- {
1020
- "epoch": 0.91,
1021
- "learning_rate": 3e-05,
1022
- "loss": 0.0354,
1023
- "step": 1140
1024
- },
1025
- {
1026
- "epoch": 0.91,
1027
- "learning_rate": 3e-05,
1028
- "loss": 0.042,
1029
- "step": 1150
1030
- },
1031
- {
1032
- "epoch": 0.91,
1033
- "eval_accuracy": 0.8066037735849056,
1034
- "eval_f1_macro": 0.662741723846436,
1035
- "eval_f1_micro": 0.8066037735849056,
1036
- "eval_loss": 0.036217570304870605,
1037
- "eval_precision_macro": 0.6895018843592504,
1038
- "eval_precision_micro": 0.8066037735849056,
1039
- "eval_recall_macro": 0.6592689442585865,
1040
- "eval_recall_micro": 0.8066037735849056,
1041
- "eval_runtime": 66.8597,
1042
- "eval_samples_per_second": 15.854,
1043
- "eval_steps_per_second": 3.964,
1044
- "step": 1150
1045
- },
1046
- {
1047
- "epoch": 0.92,
1048
- "learning_rate": 3e-05,
1049
- "loss": 0.0405,
1050
- "step": 1160
1051
- },
1052
- {
1053
- "epoch": 0.93,
1054
- "learning_rate": 3e-05,
1055
- "loss": 0.0408,
1056
- "step": 1170
1057
- },
1058
- {
1059
- "epoch": 0.94,
1060
- "learning_rate": 3e-05,
1061
- "loss": 0.0522,
1062
- "step": 1180
1063
- },
1064
- {
1065
- "epoch": 0.95,
1066
- "learning_rate": 3e-05,
1067
- "loss": 0.0356,
1068
- "step": 1190
1069
- },
1070
- {
1071
- "epoch": 0.95,
1072
- "learning_rate": 3e-05,
1073
- "loss": 0.0449,
1074
- "step": 1200
1075
- },
1076
- {
1077
- "epoch": 0.95,
1078
- "eval_accuracy": 0.7924528301886793,
1079
- "eval_f1_macro": 0.6582734622403671,
1080
- "eval_f1_micro": 0.7924528301886793,
1081
- "eval_loss": 0.036735132336616516,
1082
- "eval_precision_macro": 0.6685428560679947,
1083
- "eval_precision_micro": 0.7924528301886793,
1084
- "eval_recall_macro": 0.6671460190032963,
1085
- "eval_recall_micro": 0.7924528301886793,
1086
- "eval_runtime": 66.7753,
1087
- "eval_samples_per_second": 15.874,
1088
- "eval_steps_per_second": 3.969,
1089
- "step": 1200
1090
- },
1091
- {
1092
- "epoch": 0.96,
1093
- "learning_rate": 3e-05,
1094
- "loss": 0.0422,
1095
- "step": 1210
1096
- },
1097
- {
1098
- "epoch": 0.97,
1099
- "learning_rate": 3e-05,
1100
- "loss": 0.0469,
1101
- "step": 1220
1102
- },
1103
- {
1104
- "epoch": 0.98,
1105
- "learning_rate": 3e-05,
1106
- "loss": 0.0403,
1107
- "step": 1230
1108
- },
1109
- {
1110
- "epoch": 0.99,
1111
- "learning_rate": 3e-05,
1112
- "loss": 0.0401,
1113
- "step": 1240
1114
- },
1115
- {
1116
- "epoch": 0.99,
1117
- "learning_rate": 3e-05,
1118
- "loss": 0.0331,
1119
- "step": 1250
1120
- },
1121
- {
1122
- "epoch": 0.99,
1123
- "eval_accuracy": 0.8018867924528302,
1124
- "eval_f1_macro": 0.6660554002763479,
1125
- "eval_f1_micro": 0.8018867924528302,
1126
- "eval_loss": 0.038156915456056595,
1127
- "eval_precision_macro": 0.6760235498659594,
1128
- "eval_precision_micro": 0.8018867924528302,
1129
- "eval_recall_macro": 0.6847602869615839,
1130
- "eval_recall_micro": 0.8018867924528302,
1131
- "eval_runtime": 66.972,
1132
- "eval_samples_per_second": 15.828,
1133
- "eval_steps_per_second": 3.957,
1134
- "step": 1250
1135
- },
1136
- {
1137
- "epoch": 1.0,
1138
- "learning_rate": 3e-05,
1139
- "loss": 0.0403,
1140
- "step": 1260
1141
- },
1142
- {
1143
- "epoch": 1.01,
1144
- "learning_rate": 3e-05,
1145
- "loss": 0.0391,
1146
- "step": 1270
1147
- },
1148
- {
1149
- "epoch": 1.02,
1150
- "learning_rate": 3e-05,
1151
- "loss": 0.0315,
1152
- "step": 1280
1153
- },
1154
- {
1155
- "epoch": 1.03,
1156
- "learning_rate": 3e-05,
1157
- "loss": 0.0334,
1158
- "step": 1290
1159
- },
1160
- {
1161
- "epoch": 1.03,
1162
- "learning_rate": 3e-05,
1163
- "loss": 0.0367,
1164
- "step": 1300
1165
- },
1166
- {
1167
- "epoch": 1.03,
1168
- "eval_accuracy": 0.8037735849056604,
1169
- "eval_f1_macro": 0.6590298558707023,
1170
- "eval_f1_micro": 0.8037735849056604,
1171
- "eval_loss": 0.037248801440000534,
1172
- "eval_precision_macro": 0.711878576411288,
1173
- "eval_precision_micro": 0.8037735849056604,
1174
- "eval_recall_macro": 0.6500565322393169,
1175
- "eval_recall_micro": 0.8037735849056604,
1176
- "eval_runtime": 66.9977,
1177
- "eval_samples_per_second": 15.821,
1178
- "eval_steps_per_second": 3.955,
1179
- "step": 1300
1180
- },
1181
- {
1182
- "epoch": 1.04,
1183
- "learning_rate": 3e-05,
1184
- "loss": 0.0323,
1185
- "step": 1310
1186
- },
1187
- {
1188
- "epoch": 1.05,
1189
- "learning_rate": 3e-05,
1190
- "loss": 0.0283,
1191
- "step": 1320
1192
- },
1193
- {
1194
- "epoch": 1.06,
1195
- "learning_rate": 3e-05,
1196
- "loss": 0.0317,
1197
- "step": 1330
1198
- },
1199
- {
1200
- "epoch": 1.06,
1201
- "learning_rate": 3e-05,
1202
- "loss": 0.0368,
1203
- "step": 1340
1204
- },
1205
- {
1206
- "epoch": 1.07,
1207
- "learning_rate": 3e-05,
1208
- "loss": 0.0357,
1209
- "step": 1350
1210
- },
1211
- {
1212
- "epoch": 1.07,
1213
- "eval_accuracy": 0.7990566037735849,
1214
- "eval_f1_macro": 0.6639410239226647,
1215
- "eval_f1_micro": 0.799056603773585,
1216
- "eval_loss": 0.03749080002307892,
1217
- "eval_precision_macro": 0.68220871249212,
1218
- "eval_precision_micro": 0.7990566037735849,
1219
- "eval_recall_macro": 0.6657052159769387,
1220
- "eval_recall_micro": 0.7990566037735849,
1221
- "eval_runtime": 67.3114,
1222
- "eval_samples_per_second": 15.748,
1223
- "eval_steps_per_second": 3.937,
1224
- "step": 1350
1225
- },
1226
- {
1227
- "epoch": 1.08,
1228
- "learning_rate": 3e-05,
1229
- "loss": 0.0411,
1230
- "step": 1360
1231
- },
1232
- {
1233
- "epoch": 1.09,
1234
- "learning_rate": 3e-05,
1235
- "loss": 0.035,
1236
- "step": 1370
1237
- },
1238
- {
1239
- "epoch": 1.1,
1240
- "learning_rate": 3e-05,
1241
- "loss": 0.0365,
1242
- "step": 1380
1243
- },
1244
- {
1245
- "epoch": 1.1,
1246
- "learning_rate": 3e-05,
1247
- "loss": 0.0321,
1248
- "step": 1390
1249
- },
1250
- {
1251
- "epoch": 1.11,
1252
- "learning_rate": 3e-05,
1253
- "loss": 0.0405,
1254
- "step": 1400
1255
- },
1256
- {
1257
- "epoch": 1.11,
1258
- "eval_accuracy": 0.810377358490566,
1259
- "eval_f1_macro": 0.6823173521717408,
1260
- "eval_f1_micro": 0.8103773584905661,
1261
- "eval_loss": 0.03539792075753212,
1262
- "eval_precision_macro": 0.6735195406597105,
1263
- "eval_precision_micro": 0.810377358490566,
1264
- "eval_recall_macro": 0.7010849626749771,
1265
- "eval_recall_micro": 0.810377358490566,
1266
- "eval_runtime": 66.9573,
1267
- "eval_samples_per_second": 15.831,
1268
- "eval_steps_per_second": 3.958,
1269
- "step": 1400
1270
- },
1271
- {
1272
- "epoch": 1.12,
1273
- "learning_rate": 3e-05,
1274
- "loss": 0.0403,
1275
- "step": 1410
1276
- },
1277
- {
1278
- "epoch": 1.13,
1279
- "learning_rate": 3e-05,
1280
- "loss": 0.0355,
1281
- "step": 1420
1282
- },
1283
- {
1284
- "epoch": 1.14,
1285
- "learning_rate": 3e-05,
1286
- "loss": 0.0262,
1287
- "step": 1430
1288
- },
1289
- {
1290
- "epoch": 1.14,
1291
- "learning_rate": 3e-05,
1292
- "loss": 0.0314,
1293
- "step": 1440
1294
- },
1295
- {
1296
- "epoch": 1.15,
1297
- "learning_rate": 3e-05,
1298
- "loss": 0.0281,
1299
- "step": 1450
1300
- },
1301
- {
1302
- "epoch": 1.15,
1303
- "eval_accuracy": 0.8301886792452831,
1304
- "eval_f1_macro": 0.6936579743869716,
1305
- "eval_f1_micro": 0.8301886792452831,
1306
- "eval_loss": 0.03378523513674736,
1307
- "eval_precision_macro": 0.6880699408956382,
1308
- "eval_precision_micro": 0.8301886792452831,
1309
- "eval_recall_macro": 0.7081810763903263,
1310
- "eval_recall_micro": 0.8301886792452831,
1311
- "eval_runtime": 66.8615,
1312
- "eval_samples_per_second": 15.854,
1313
- "eval_steps_per_second": 3.963,
1314
- "step": 1450
1315
- },
1316
- {
1317
- "epoch": 1.16,
1318
- "learning_rate": 3e-05,
1319
- "loss": 0.0426,
1320
- "step": 1460
1321
- },
1322
- {
1323
- "epoch": 1.17,
1324
- "learning_rate": 3e-05,
1325
- "loss": 0.0331,
1326
- "step": 1470
1327
- },
1328
- {
1329
- "epoch": 1.18,
1330
- "learning_rate": 3e-05,
1331
- "loss": 0.0274,
1332
- "step": 1480
1333
- },
1334
- {
1335
- "epoch": 1.18,
1336
- "learning_rate": 3e-05,
1337
- "loss": 0.0303,
1338
- "step": 1490
1339
- },
1340
- {
1341
- "epoch": 1.19,
1342
- "learning_rate": 3e-05,
1343
- "loss": 0.0362,
1344
- "step": 1500
1345
- },
1346
- {
1347
- "epoch": 1.19,
1348
- "eval_accuracy": 0.8122641509433962,
1349
- "eval_f1_macro": 0.6607417290714642,
1350
- "eval_f1_micro": 0.8122641509433962,
1351
- "eval_loss": 0.0350893959403038,
1352
- "eval_precision_macro": 0.7043834982343933,
1353
- "eval_precision_micro": 0.8122641509433962,
1354
- "eval_recall_macro": 0.6559410812932247,
1355
- "eval_recall_micro": 0.8122641509433962,
1356
- "eval_runtime": 67.0277,
1357
- "eval_samples_per_second": 15.814,
1358
- "eval_steps_per_second": 3.954,
1359
- "step": 1500
1360
- },
1361
- {
1362
- "epoch": 1.2,
1363
- "learning_rate": 3e-05,
1364
- "loss": 0.0359,
1365
- "step": 1510
1366
- },
1367
- {
1368
- "epoch": 1.21,
1369
- "learning_rate": 3e-05,
1370
- "loss": 0.0223,
1371
- "step": 1520
1372
- },
1373
- {
1374
- "epoch": 1.22,
1375
- "learning_rate": 3e-05,
1376
- "loss": 0.0284,
1377
- "step": 1530
1378
- },
1379
- {
1380
- "epoch": 1.22,
1381
- "learning_rate": 3e-05,
1382
- "loss": 0.0445,
1383
- "step": 1540
1384
- },
1385
- {
1386
- "epoch": 1.23,
1387
- "learning_rate": 3e-05,
1388
- "loss": 0.0214,
1389
- "step": 1550
1390
- },
1391
- {
1392
- "epoch": 1.23,
1393
- "eval_accuracy": 0.810377358490566,
1394
- "eval_f1_macro": 0.6778793723330956,
1395
- "eval_f1_micro": 0.8103773584905661,
1396
- "eval_loss": 0.035039015114307404,
1397
- "eval_precision_macro": 0.7081161930503027,
1398
- "eval_precision_micro": 0.810377358490566,
1399
- "eval_recall_macro": 0.6748998812700701,
1400
- "eval_recall_micro": 0.810377358490566,
1401
- "eval_runtime": 67.0902,
1402
- "eval_samples_per_second": 15.8,
1403
- "eval_steps_per_second": 3.95,
1404
- "step": 1550
1405
- },
1406
- {
1407
- "epoch": 1.24,
1408
- "learning_rate": 3e-05,
1409
- "loss": 0.0396,
1410
- "step": 1560
1411
- },
1412
- {
1413
- "epoch": 1.25,
1414
- "learning_rate": 3e-05,
1415
- "loss": 0.0421,
1416
- "step": 1570
1417
- },
1418
- {
1419
- "epoch": 1.26,
1420
- "learning_rate": 3e-05,
1421
- "loss": 0.0367,
1422
- "step": 1580
1423
- },
1424
- {
1425
- "epoch": 1.26,
1426
- "learning_rate": 3e-05,
1427
- "loss": 0.029,
1428
- "step": 1590
1429
- },
1430
- {
1431
- "epoch": 1.27,
1432
- "learning_rate": 3e-05,
1433
- "loss": 0.0321,
1434
- "step": 1600
1435
- },
1436
- {
1437
- "epoch": 1.27,
1438
- "eval_accuracy": 0.809433962264151,
1439
- "eval_f1_macro": 0.7277842533202593,
1440
- "eval_f1_micro": 0.809433962264151,
1441
- "eval_loss": 0.036841992288827896,
1442
- "eval_precision_macro": 0.754059995164892,
1443
- "eval_precision_micro": 0.809433962264151,
1444
- "eval_recall_macro": 0.7253784421960152,
1445
- "eval_recall_micro": 0.809433962264151,
1446
- "eval_runtime": 67.1117,
1447
- "eval_samples_per_second": 15.795,
1448
- "eval_steps_per_second": 3.949,
1449
- "step": 1600
1450
- },
1451
- {
1452
- "epoch": 1.28,
1453
- "learning_rate": 3e-05,
1454
- "loss": 0.0338,
1455
- "step": 1610
1456
- },
1457
- {
1458
- "epoch": 1.29,
1459
- "learning_rate": 3e-05,
1460
- "loss": 0.0361,
1461
- "step": 1620
1462
- },
1463
- {
1464
- "epoch": 1.3,
1465
- "learning_rate": 3e-05,
1466
- "loss": 0.0415,
1467
- "step": 1630
1468
- },
1469
- {
1470
- "epoch": 1.3,
1471
- "learning_rate": 3e-05,
1472
- "loss": 0.0354,
1473
- "step": 1640
1474
- },
1475
- {
1476
- "epoch": 1.31,
1477
- "learning_rate": 3e-05,
1478
- "loss": 0.0332,
1479
- "step": 1650
1480
- },
1481
- {
1482
- "epoch": 1.31,
1483
- "eval_accuracy": 0.8254716981132075,
1484
- "eval_f1_macro": 0.7081292929169892,
1485
- "eval_f1_micro": 0.8254716981132075,
1486
- "eval_loss": 0.03387230262160301,
1487
- "eval_precision_macro": 0.7291239674093415,
1488
- "eval_precision_micro": 0.8254716981132075,
1489
- "eval_recall_macro": 0.7104202884103088,
1490
- "eval_recall_micro": 0.8254716981132075,
1491
- "eval_runtime": 67.285,
1492
- "eval_samples_per_second": 15.754,
1493
- "eval_steps_per_second": 3.938,
1494
- "step": 1650
1495
- },
1496
- {
1497
- "epoch": 1.32,
1498
- "learning_rate": 3e-05,
1499
- "loss": 0.0337,
1500
- "step": 1660
1501
- },
1502
- {
1503
- "epoch": 1.33,
1504
- "learning_rate": 3e-05,
1505
- "loss": 0.0281,
1506
- "step": 1670
1507
- },
1508
- {
1509
- "epoch": 1.34,
1510
- "learning_rate": 3e-05,
1511
- "loss": 0.0269,
1512
- "step": 1680
1513
- },
1514
- {
1515
- "epoch": 1.34,
1516
- "learning_rate": 3e-05,
1517
- "loss": 0.0339,
1518
- "step": 1690
1519
- },
1520
- {
1521
- "epoch": 1.35,
1522
- "learning_rate": 3e-05,
1523
- "loss": 0.0306,
1524
- "step": 1700
1525
- },
1526
- {
1527
- "epoch": 1.35,
1528
- "eval_accuracy": 0.8179245283018868,
1529
- "eval_f1_macro": 0.6769788054372391,
1530
- "eval_f1_micro": 0.8179245283018868,
1531
- "eval_loss": 0.03388019651174545,
1532
- "eval_precision_macro": 0.6816133549156956,
1533
- "eval_precision_micro": 0.8179245283018868,
1534
- "eval_recall_macro": 0.680429225227406,
1535
- "eval_recall_micro": 0.8179245283018868,
1536
- "eval_runtime": 67.4515,
1537
- "eval_samples_per_second": 15.715,
1538
- "eval_steps_per_second": 3.929,
1539
- "step": 1700
1540
- },
1541
- {
1542
- "epoch": 1.36,
1543
- "learning_rate": 3e-05,
1544
- "loss": 0.0376,
1545
- "step": 1710
1546
- },
1547
- {
1548
- "epoch": 1.37,
1549
- "learning_rate": 3e-05,
1550
- "loss": 0.0243,
1551
- "step": 1720
1552
- },
1553
- {
1554
- "epoch": 1.37,
1555
- "learning_rate": 3e-05,
1556
- "loss": 0.0302,
1557
- "step": 1730
1558
- },
1559
- {
1560
- "epoch": 1.38,
1561
- "learning_rate": 3e-05,
1562
- "loss": 0.0334,
1563
- "step": 1740
1564
- },
1565
- {
1566
- "epoch": 1.39,
1567
- "learning_rate": 3e-05,
1568
- "loss": 0.0231,
1569
- "step": 1750
1570
- },
1571
- {
1572
- "epoch": 1.39,
1573
- "eval_accuracy": 0.8179245283018868,
1574
- "eval_f1_macro": 0.6890240801949487,
1575
- "eval_f1_micro": 0.8179245283018868,
1576
- "eval_loss": 0.03725350275635719,
1577
- "eval_precision_macro": 0.6983358697605533,
1578
- "eval_precision_micro": 0.8179245283018868,
1579
- "eval_recall_macro": 0.6881012857058126,
1580
- "eval_recall_micro": 0.8179245283018868,
1581
- "eval_runtime": 67.1945,
1582
- "eval_samples_per_second": 15.775,
1583
- "eval_steps_per_second": 3.944,
1584
- "step": 1750
1585
- },
1586
- {
1587
- "epoch": 1.4,
1588
- "learning_rate": 3e-05,
1589
- "loss": 0.0351,
1590
- "step": 1760
1591
- },
1592
- {
1593
- "epoch": 1.41,
1594
- "learning_rate": 3e-05,
1595
- "loss": 0.0312,
1596
- "step": 1770
1597
- },
1598
- {
1599
- "epoch": 1.41,
1600
- "learning_rate": 3e-05,
1601
- "loss": 0.036,
1602
- "step": 1780
1603
- },
1604
- {
1605
- "epoch": 1.42,
1606
- "learning_rate": 3e-05,
1607
- "loss": 0.0336,
1608
- "step": 1790
1609
- },
1610
- {
1611
- "epoch": 1.43,
1612
- "learning_rate": 3e-05,
1613
- "loss": 0.0351,
1614
- "step": 1800
1615
- },
1616
- {
1617
- "epoch": 1.43,
1618
- "eval_accuracy": 0.8216981132075472,
1619
- "eval_f1_macro": 0.6893274603000062,
1620
- "eval_f1_micro": 0.821698113207547,
1621
- "eval_loss": 0.035641398280858994,
1622
- "eval_precision_macro": 0.6989494989333128,
1623
- "eval_precision_micro": 0.8216981132075472,
1624
- "eval_recall_macro": 0.6917495141935277,
1625
- "eval_recall_micro": 0.8216981132075472,
1626
- "eval_runtime": 72.4563,
1627
- "eval_samples_per_second": 14.629,
1628
- "eval_steps_per_second": 3.657,
1629
- "step": 1800
1630
- },
1631
- {
1632
- "epoch": 1.44,
1633
- "learning_rate": 3e-05,
1634
- "loss": 0.0315,
1635
- "step": 1810
1636
- },
1637
- {
1638
- "epoch": 1.45,
1639
- "learning_rate": 3e-05,
1640
- "loss": 0.0378,
1641
- "step": 1820
1642
- },
1643
- {
1644
- "epoch": 1.45,
1645
- "learning_rate": 3e-05,
1646
- "loss": 0.0297,
1647
- "step": 1830
1648
- },
1649
- {
1650
- "epoch": 1.46,
1651
- "learning_rate": 3e-05,
1652
- "loss": 0.0405,
1653
- "step": 1840
1654
- },
1655
- {
1656
- "epoch": 1.47,
1657
- "learning_rate": 3e-05,
1658
- "loss": 0.0259,
1659
- "step": 1850
1660
- },
1661
- {
1662
- "epoch": 1.47,
1663
- "eval_accuracy": 0.8207547169811321,
1664
- "eval_f1_macro": 0.6884764059910556,
1665
- "eval_f1_micro": 0.8207547169811321,
1666
- "eval_loss": 0.033535219728946686,
1667
- "eval_precision_macro": 0.6999273971863751,
1668
- "eval_precision_micro": 0.8207547169811321,
1669
- "eval_recall_macro": 0.6823064809142775,
1670
- "eval_recall_micro": 0.8207547169811321,
1671
- "eval_runtime": 75.3135,
1672
- "eval_samples_per_second": 14.074,
1673
- "eval_steps_per_second": 3.519,
1674
- "step": 1850
1675
- },
1676
- {
1677
- "epoch": 1.48,
1678
- "learning_rate": 3e-05,
1679
- "loss": 0.0313,
1680
- "step": 1860
1681
- },
1682
- {
1683
- "epoch": 1.49,
1684
- "learning_rate": 3e-05,
1685
- "loss": 0.0411,
1686
- "step": 1870
1687
- },
1688
- {
1689
- "epoch": 1.49,
1690
- "learning_rate": 3e-05,
1691
- "loss": 0.0294,
1692
- "step": 1880
1693
- },
1694
- {
1695
- "epoch": 1.5,
1696
- "learning_rate": 3e-05,
1697
- "loss": 0.0357,
1698
- "step": 1890
1699
- },
1700
- {
1701
- "epoch": 1.51,
1702
- "learning_rate": 3e-05,
1703
- "loss": 0.0371,
1704
- "step": 1900
1705
- },
1706
- {
1707
- "epoch": 1.51,
1708
- "eval_accuracy": 0.8122641509433962,
1709
- "eval_f1_macro": 0.6817031059683786,
1710
- "eval_f1_micro": 0.8122641509433962,
1711
- "eval_loss": 0.03668028488755226,
1712
- "eval_precision_macro": 0.7411726936728738,
1713
- "eval_precision_micro": 0.8122641509433962,
1714
- "eval_recall_macro": 0.6617258448443556,
1715
- "eval_recall_micro": 0.8122641509433962,
1716
- "eval_runtime": 68.4416,
1717
- "eval_samples_per_second": 15.488,
1718
- "eval_steps_per_second": 3.872,
1719
- "step": 1900
1720
- },
1721
- {
1722
- "epoch": 1.52,
1723
- "learning_rate": 3e-05,
1724
- "loss": 0.0414,
1725
- "step": 1910
1726
- },
1727
- {
1728
- "epoch": 1.53,
1729
- "learning_rate": 3e-05,
1730
- "loss": 0.0359,
1731
- "step": 1920
1732
- },
1733
- {
1734
- "epoch": 1.53,
1735
- "learning_rate": 3e-05,
1736
- "loss": 0.0364,
1737
- "step": 1930
1738
- },
1739
- {
1740
- "epoch": 1.54,
1741
- "learning_rate": 3e-05,
1742
- "loss": 0.0328,
1743
- "step": 1940
1744
- },
1745
- {
1746
- "epoch": 1.55,
1747
- "learning_rate": 3e-05,
1748
- "loss": 0.0288,
1749
- "step": 1950
1750
- },
1751
- {
1752
- "epoch": 1.55,
1753
- "eval_accuracy": 0.8179245283018868,
1754
- "eval_f1_macro": 0.6808261949093507,
1755
- "eval_f1_micro": 0.8179245283018868,
1756
- "eval_loss": 0.03465178981423378,
1757
- "eval_precision_macro": 0.6758330579285278,
1758
- "eval_precision_micro": 0.8179245283018868,
1759
- "eval_recall_macro": 0.6916444013212283,
1760
- "eval_recall_micro": 0.8179245283018868,
1761
- "eval_runtime": 67.4794,
1762
- "eval_samples_per_second": 15.708,
1763
- "eval_steps_per_second": 3.927,
1764
- "step": 1950
1765
- },
1766
- {
1767
- "epoch": 1.56,
1768
- "learning_rate": 3e-05,
1769
- "loss": 0.0292,
1770
- "step": 1960
1771
- },
1772
- {
1773
- "epoch": 1.57,
1774
- "learning_rate": 3e-05,
1775
- "loss": 0.0372,
1776
- "step": 1970
1777
- },
1778
- {
1779
- "epoch": 1.57,
1780
- "learning_rate": 3e-05,
1781
- "loss": 0.0371,
1782
- "step": 1980
1783
- },
1784
- {
1785
- "epoch": 1.58,
1786
- "learning_rate": 3e-05,
1787
- "loss": 0.0292,
1788
- "step": 1990
1789
- },
1790
- {
1791
- "epoch": 1.59,
1792
- "learning_rate": 3e-05,
1793
- "loss": 0.0252,
1794
- "step": 2000
1795
- },
1796
- {
1797
- "epoch": 1.59,
1798
- "eval_accuracy": 0.8113207547169812,
1799
- "eval_f1_macro": 0.6786603793711855,
1800
- "eval_f1_micro": 0.8113207547169812,
1801
- "eval_loss": 0.03572586923837662,
1802
- "eval_precision_macro": 0.7003439014078875,
1803
- "eval_precision_micro": 0.8113207547169812,
1804
- "eval_recall_macro": 0.6714151723581485,
1805
- "eval_recall_micro": 0.8113207547169812,
1806
- "eval_runtime": 67.3677,
1807
- "eval_samples_per_second": 15.735,
1808
- "eval_steps_per_second": 3.934,
1809
- "step": 2000
1810
- },
1811
- {
1812
- "epoch": 1.6,
1813
- "learning_rate": 3e-05,
1814
- "loss": 0.0418,
1815
- "step": 2010
1816
- },
1817
- {
1818
- "epoch": 1.61,
1819
- "learning_rate": 3e-05,
1820
- "loss": 0.0306,
1821
- "step": 2020
1822
- },
1823
- {
1824
- "epoch": 1.61,
1825
- "learning_rate": 3e-05,
1826
- "loss": 0.0264,
1827
- "step": 2030
1828
- },
1829
- {
1830
- "epoch": 1.62,
1831
- "learning_rate": 3e-05,
1832
- "loss": 0.0352,
1833
- "step": 2040
1834
- },
1835
- {
1836
- "epoch": 1.63,
1837
- "learning_rate": 3e-05,
1838
- "loss": 0.0374,
1839
- "step": 2050
1840
- },
1841
- {
1842
- "epoch": 1.63,
1843
- "eval_accuracy": 0.8207547169811321,
1844
- "eval_f1_macro": 0.7378651093912652,
1845
- "eval_f1_micro": 0.8207547169811321,
1846
- "eval_loss": 0.03318563476204872,
1847
- "eval_precision_macro": 0.7746611477051377,
1848
- "eval_precision_micro": 0.8207547169811321,
1849
- "eval_recall_macro": 0.7232885741364632,
1850
- "eval_recall_micro": 0.8207547169811321,
1851
- "eval_runtime": 67.1675,
1852
- "eval_samples_per_second": 15.781,
1853
- "eval_steps_per_second": 3.945,
1854
- "step": 2050
1855
- },
1856
- {
1857
- "epoch": 1.64,
1858
- "learning_rate": 3e-05,
1859
- "loss": 0.0334,
1860
- "step": 2060
1861
- },
1862
- {
1863
- "epoch": 1.65,
1864
- "learning_rate": 3e-05,
1865
- "loss": 0.0275,
1866
- "step": 2070
1867
- },
1868
- {
1869
- "epoch": 1.65,
1870
- "learning_rate": 3e-05,
1871
- "loss": 0.0367,
1872
- "step": 2080
1873
- },
1874
- {
1875
- "epoch": 1.66,
1876
- "learning_rate": 3e-05,
1877
- "loss": 0.0347,
1878
- "step": 2090
1879
- },
1880
- {
1881
- "epoch": 1.67,
1882
- "learning_rate": 3e-05,
1883
- "loss": 0.0356,
1884
- "step": 2100
1885
- },
1886
- {
1887
- "epoch": 1.67,
1888
- "eval_accuracy": 0.8283018867924529,
1889
- "eval_f1_macro": 0.7162407407283602,
1890
- "eval_f1_micro": 0.8283018867924529,
1891
- "eval_loss": 0.032257240265607834,
1892
- "eval_precision_macro": 0.7425264980116305,
1893
- "eval_precision_micro": 0.8283018867924529,
1894
- "eval_recall_macro": 0.7045621292629789,
1895
- "eval_recall_micro": 0.8283018867924529,
1896
- "eval_runtime": 67.05,
1897
- "eval_samples_per_second": 15.809,
1898
- "eval_steps_per_second": 3.952,
1899
- "step": 2100
1900
- },
1901
- {
1902
- "epoch": 1.68,
1903
- "learning_rate": 3e-05,
1904
- "loss": 0.0345,
1905
- "step": 2110
1906
- },
1907
- {
1908
- "epoch": 1.68,
1909
- "learning_rate": 3e-05,
1910
- "loss": 0.0324,
1911
- "step": 2120
1912
- },
1913
- {
1914
- "epoch": 1.69,
1915
- "learning_rate": 3e-05,
1916
- "loss": 0.0317,
1917
- "step": 2130
1918
- },
1919
- {
1920
- "epoch": 1.7,
1921
- "learning_rate": 3e-05,
1922
- "loss": 0.0372,
1923
- "step": 2140
1924
- },
1925
- {
1926
- "epoch": 1.71,
1927
- "learning_rate": 3e-05,
1928
- "loss": 0.0294,
1929
- "step": 2150
1930
- },
1931
- {
1932
- "epoch": 1.71,
1933
- "eval_accuracy": 0.8113207547169812,
1934
- "eval_f1_macro": 0.7100925220691637,
1935
- "eval_f1_micro": 0.8113207547169812,
1936
- "eval_loss": 0.03457261621952057,
1937
- "eval_precision_macro": 0.7173368388422002,
1938
- "eval_precision_micro": 0.8113207547169812,
1939
- "eval_recall_macro": 0.722749933086757,
1940
- "eval_recall_micro": 0.8113207547169812,
1941
- "eval_runtime": 66.9989,
1942
- "eval_samples_per_second": 15.821,
1943
- "eval_steps_per_second": 3.955,
1944
- "step": 2150
1945
- },
1946
- {
1947
- "epoch": 1.72,
1948
- "learning_rate": 3e-05,
1949
- "loss": 0.0322,
1950
- "step": 2160
1951
- },
1952
- {
1953
- "epoch": 1.72,
1954
- "learning_rate": 3e-05,
1955
- "loss": 0.038,
1956
- "step": 2170
1957
- },
1958
- {
1959
- "epoch": 1.73,
1960
- "learning_rate": 3e-05,
1961
- "loss": 0.0283,
1962
- "step": 2180
1963
- },
1964
- {
1965
- "epoch": 1.74,
1966
- "learning_rate": 3e-05,
1967
- "loss": 0.0346,
1968
- "step": 2190
1969
- },
1970
- {
1971
- "epoch": 1.75,
1972
- "learning_rate": 3e-05,
1973
- "loss": 0.035,
1974
- "step": 2200
1975
- },
1976
- {
1977
- "epoch": 1.75,
1978
- "eval_accuracy": 0.8235849056603773,
1979
- "eval_f1_macro": 0.7390128397138074,
1980
- "eval_f1_micro": 0.8235849056603773,
1981
- "eval_loss": 0.033848535269498825,
1982
- "eval_precision_macro": 0.7590826949473053,
1983
- "eval_precision_micro": 0.8235849056603773,
1984
- "eval_recall_macro": 0.730688195944597,
1985
- "eval_recall_micro": 0.8235849056603773,
1986
- "eval_runtime": 66.9638,
1987
- "eval_samples_per_second": 15.829,
1988
- "eval_steps_per_second": 3.957,
1989
- "step": 2200
1990
- },
1991
- {
1992
- "epoch": 1.76,
1993
- "learning_rate": 3e-05,
1994
- "loss": 0.0347,
1995
- "step": 2210
1996
- },
1997
- {
1998
- "epoch": 1.76,
1999
- "learning_rate": 3e-05,
2000
- "loss": 0.0252,
2001
- "step": 2220
2002
- },
2003
- {
2004
- "epoch": 1.77,
2005
- "learning_rate": 3e-05,
2006
- "loss": 0.037,
2007
- "step": 2230
2008
- },
2009
- {
2010
- "epoch": 1.78,
2011
- "learning_rate": 3e-05,
2012
- "loss": 0.0352,
2013
- "step": 2240
2014
- },
2015
- {
2016
- "epoch": 1.79,
2017
- "learning_rate": 3e-05,
2018
- "loss": 0.0432,
2019
- "step": 2250
2020
- },
2021
- {
2022
- "epoch": 1.79,
2023
- "eval_accuracy": 0.8216981132075472,
2024
- "eval_f1_macro": 0.7295141356598547,
2025
- "eval_f1_micro": 0.821698113207547,
2026
- "eval_loss": 0.03482788801193237,
2027
- "eval_precision_macro": 0.7693704211435056,
2028
- "eval_precision_micro": 0.8216981132075472,
2029
- "eval_recall_macro": 0.7204303826474574,
2030
- "eval_recall_micro": 0.8216981132075472,
2031
- "eval_runtime": 67.0744,
2032
- "eval_samples_per_second": 15.803,
2033
- "eval_steps_per_second": 3.951,
2034
- "step": 2250
2035
- },
2036
- {
2037
- "epoch": 1.8,
2038
- "learning_rate": 3e-05,
2039
- "loss": 0.0313,
2040
- "step": 2260
2041
- },
2042
- {
2043
- "epoch": 1.8,
2044
- "learning_rate": 3e-05,
2045
- "loss": 0.0367,
2046
- "step": 2270
2047
- },
2048
- {
2049
- "epoch": 1.81,
2050
- "learning_rate": 3e-05,
2051
- "loss": 0.0294,
2052
- "step": 2280
2053
- },
2054
- {
2055
- "epoch": 1.82,
2056
- "learning_rate": 3e-05,
2057
- "loss": 0.0265,
2058
- "step": 2290
2059
- },
2060
- {
2061
- "epoch": 1.83,
2062
- "learning_rate": 3e-05,
2063
- "loss": 0.0325,
2064
- "step": 2300
2065
- },
2066
- {
2067
- "epoch": 1.83,
2068
- "eval_accuracy": 0.8330188679245283,
2069
- "eval_f1_macro": 0.7260646503551377,
2070
- "eval_f1_micro": 0.8330188679245283,
2071
- "eval_loss": 0.032365720719099045,
2072
- "eval_precision_macro": 0.7440576765333733,
2073
- "eval_precision_micro": 0.8330188679245283,
2074
- "eval_recall_macro": 0.7231434220015308,
2075
- "eval_recall_micro": 0.8330188679245283,
2076
- "eval_runtime": 67.0867,
2077
- "eval_samples_per_second": 15.8,
2078
- "eval_steps_per_second": 3.95,
2079
- "step": 2300
2080
- },
2081
- {
2082
- "epoch": 1.84,
2083
- "learning_rate": 3e-05,
2084
- "loss": 0.0361,
2085
- "step": 2310
2086
- },
2087
- {
2088
- "epoch": 1.84,
2089
- "learning_rate": 3e-05,
2090
- "loss": 0.029,
2091
- "step": 2320
2092
- },
2093
- {
2094
- "epoch": 1.85,
2095
- "learning_rate": 3e-05,
2096
- "loss": 0.0325,
2097
- "step": 2330
2098
- },
2099
- {
2100
- "epoch": 1.86,
2101
- "learning_rate": 3e-05,
2102
- "loss": 0.0266,
2103
- "step": 2340
2104
- },
2105
- {
2106
- "epoch": 1.87,
2107
- "learning_rate": 3e-05,
2108
- "loss": 0.0318,
2109
- "step": 2350
2110
- },
2111
- {
2112
- "epoch": 1.87,
2113
- "eval_accuracy": 0.8311320754716981,
2114
- "eval_f1_macro": 0.7248036031015876,
2115
- "eval_f1_micro": 0.8311320754716981,
2116
- "eval_loss": 0.03213372081518173,
2117
- "eval_precision_macro": 0.7397395837984007,
2118
- "eval_precision_micro": 0.8311320754716981,
2119
- "eval_recall_macro": 0.7241410864722072,
2120
- "eval_recall_micro": 0.8311320754716981,
2121
- "eval_runtime": 67.1828,
2122
- "eval_samples_per_second": 15.778,
2123
- "eval_steps_per_second": 3.944,
2124
- "step": 2350
2125
- },
2126
- {
2127
- "epoch": 1.88,
2128
- "learning_rate": 3e-05,
2129
- "loss": 0.0339,
2130
- "step": 2360
2131
- },
2132
- {
2133
- "epoch": 1.88,
2134
- "learning_rate": 3e-05,
2135
- "loss": 0.0359,
2136
- "step": 2370
2137
- },
2138
- {
2139
- "epoch": 1.89,
2140
- "learning_rate": 3e-05,
2141
- "loss": 0.0296,
2142
- "step": 2380
2143
- },
2144
- {
2145
- "epoch": 1.9,
2146
- "learning_rate": 3e-05,
2147
- "loss": 0.0249,
2148
- "step": 2390
2149
- },
2150
- {
2151
- "epoch": 1.91,
2152
- "learning_rate": 3e-05,
2153
- "loss": 0.0315,
2154
- "step": 2400
2155
- },
2156
- {
2157
- "epoch": 1.91,
2158
- "eval_accuracy": 0.8179245283018868,
2159
- "eval_f1_macro": 0.6858375088253653,
2160
- "eval_f1_micro": 0.8179245283018868,
2161
- "eval_loss": 0.033517900854349136,
2162
- "eval_precision_macro": 0.6792945547363913,
2163
- "eval_precision_micro": 0.8179245283018868,
2164
- "eval_recall_macro": 0.7034801209007658,
2165
- "eval_recall_micro": 0.8179245283018868,
2166
- "eval_runtime": 67.2538,
2167
- "eval_samples_per_second": 15.761,
2168
- "eval_steps_per_second": 3.94,
2169
- "step": 2400
2170
- },
2171
- {
2172
- "epoch": 1.92,
2173
- "learning_rate": 3e-05,
2174
- "loss": 0.037,
2175
- "step": 2410
2176
- },
2177
- {
2178
- "epoch": 1.92,
2179
- "learning_rate": 3e-05,
2180
- "loss": 0.032,
2181
- "step": 2420
2182
- },
2183
- {
2184
- "epoch": 1.93,
2185
- "learning_rate": 3e-05,
2186
- "loss": 0.0333,
2187
- "step": 2430
2188
- },
2189
- {
2190
- "epoch": 1.94,
2191
- "learning_rate": 3e-05,
2192
- "loss": 0.0369,
2193
- "step": 2440
2194
- },
2195
- {
2196
- "epoch": 1.95,
2197
- "learning_rate": 3e-05,
2198
- "loss": 0.0331,
2199
- "step": 2450
2200
- },
2201
- {
2202
- "epoch": 1.95,
2203
- "eval_accuracy": 0.8179245283018868,
2204
- "eval_f1_macro": 0.6955540611871792,
2205
- "eval_f1_micro": 0.8179245283018868,
2206
- "eval_loss": 0.033520761877298355,
2207
- "eval_precision_macro": 0.7294988206190055,
2208
- "eval_precision_micro": 0.8179245283018868,
2209
- "eval_recall_macro": 0.6879491415746545,
2210
- "eval_recall_micro": 0.8179245283018868,
2211
- "eval_runtime": 67.0948,
2212
- "eval_samples_per_second": 15.799,
2213
- "eval_steps_per_second": 3.95,
2214
- "step": 2450
2215
- },
2216
- {
2217
- "epoch": 1.96,
2218
- "learning_rate": 3e-05,
2219
- "loss": 0.035,
2220
- "step": 2460
2221
- },
2222
- {
2223
- "epoch": 1.96,
2224
- "learning_rate": 3e-05,
2225
- "loss": 0.0323,
2226
- "step": 2470
2227
- },
2228
- {
2229
- "epoch": 1.97,
2230
- "learning_rate": 3e-05,
2231
- "loss": 0.0346,
2232
- "step": 2480
2233
- },
2234
- {
2235
- "epoch": 1.98,
2236
- "learning_rate": 3e-05,
2237
- "loss": 0.0287,
2238
- "step": 2490
2239
- },
2240
- {
2241
- "epoch": 1.99,
2242
- "learning_rate": 3e-05,
2243
- "loss": 0.0293,
2244
- "step": 2500
2245
- },
2246
- {
2247
- "epoch": 1.99,
2248
- "eval_accuracy": 0.8056603773584906,
2249
- "eval_f1_macro": 0.6866675132658472,
2250
- "eval_f1_micro": 0.8056603773584906,
2251
- "eval_loss": 0.03530614450573921,
2252
- "eval_precision_macro": 0.7026639102515493,
2253
- "eval_precision_micro": 0.8056603773584906,
2254
- "eval_recall_macro": 0.6939733521884667,
2255
- "eval_recall_micro": 0.8056603773584906,
2256
- "eval_runtime": 67.0079,
2257
- "eval_samples_per_second": 15.819,
2258
- "eval_steps_per_second": 3.955,
2259
- "step": 2500
2260
- },
2261
- {
2262
- "epoch": 1.99,
2263
- "learning_rate": 3e-05,
2264
- "loss": 0.0347,
2265
- "step": 2510
2266
- },
2267
- {
2268
- "epoch": 2.0,
2269
- "step": 2516,
2270
- "total_flos": 6.250904333773187e+17,
2271
- "train_loss": 0.05790289470982191,
2272
- "train_runtime": 10296.0519,
2273
- "train_samples_per_second": 3.91,
2274
- "train_steps_per_second": 0.244
2275
  }
2276
  ],
2277
  "logging_steps": 10,
2278
- "max_steps": 2516,
2279
  "num_input_tokens_seen": 0,
2280
  "num_train_epochs": 2,
2281
  "save_steps": 250,
2282
- "total_flos": 6.250904333773187e+17,
2283
- "train_batch_size": 4,
2284
  "trial_name": null,
2285
  "trial_params": null
2286
  }
 
1
  {
2
  "best_metric": null,
3
  "best_model_checkpoint": null,
4
+ "epoch": 1.152165276122368,
5
  "eval_steps": 50,
6
+ "global_step": 725,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
10
  "log_history": [
11
  {
12
+ "epoch": 0.02,
13
  "learning_rate": 3e-05,
14
+ "loss": 0.9927,
15
  "step": 10
16
  },
17
  {
18
+ "epoch": 0.03,
19
  "learning_rate": 3e-05,
20
+ "loss": 0.1794,
21
  "step": 20
22
  },
23
  {
24
+ "epoch": 0.05,
25
  "learning_rate": 3e-05,
26
+ "loss": 0.1202,
27
  "step": 30
28
  },
29
  {
30
+ "epoch": 0.06,
31
  "learning_rate": 3e-05,
32
+ "loss": 0.0955,
33
  "step": 40
34
  },
35
  {
36
+ "epoch": 0.08,
37
  "learning_rate": 3e-05,
38
+ "loss": 0.0886,
39
  "step": 50
40
  },
41
  {
42
+ "epoch": 0.08,
43
+ "eval_accuracy": 0.5773584905660377,
44
+ "eval_f1_macro": 0.32220178110380887,
45
+ "eval_f1_micro": 0.5773584905660377,
46
+ "eval_loss": 0.10824274271726608,
47
+ "eval_precision_macro": 0.3987863781703509,
48
+ "eval_precision_micro": 0.5773584905660377,
49
+ "eval_recall_macro": 0.31235484191508905,
50
+ "eval_recall_micro": 0.5773584905660377,
51
+ "eval_runtime": 68.6809,
52
+ "eval_samples_per_second": 15.434,
53
+ "eval_steps_per_second": 3.858,
54
  "step": 50
55
  },
56
  {
57
+ "epoch": 0.1,
58
  "learning_rate": 3e-05,
59
+ "loss": 0.2261,
60
  "step": 60
61
  },
62
  {
63
+ "epoch": 0.11,
64
  "learning_rate": 3e-05,
65
+ "loss": 0.0728,
66
  "step": 70
67
  },
68
  {
69
+ "epoch": 0.13,
70
  "learning_rate": 3e-05,
71
+ "loss": 0.0659,
72
  "step": 80
73
  },
74
  {
75
+ "epoch": 0.14,
76
  "learning_rate": 3e-05,
77
+ "loss": 0.051,
78
  "step": 90
79
  },
80
  {
81
+ "epoch": 0.16,
82
  "learning_rate": 3e-05,
83
+ "loss": 0.0572,
84
  "step": 100
85
  },
86
  {
87
+ "epoch": 0.16,
88
+ "eval_accuracy": 0.5877358490566038,
89
+ "eval_f1_macro": 0.379683051721915,
90
+ "eval_f1_micro": 0.5877358490566038,
91
+ "eval_loss": 0.08316469192504883,
92
+ "eval_precision_macro": 0.4716361677310224,
93
+ "eval_precision_micro": 0.5877358490566038,
94
+ "eval_recall_macro": 0.3681154625887901,
95
+ "eval_recall_micro": 0.5877358490566038,
96
+ "eval_runtime": 68.5728,
97
+ "eval_samples_per_second": 15.458,
98
+ "eval_steps_per_second": 3.865,
99
  "step": 100
100
  },
101
  {
102
+ "epoch": 0.17,
103
  "learning_rate": 3e-05,
104
+ "loss": 0.1229,
105
  "step": 110
106
  },
107
  {
108
+ "epoch": 0.19,
109
  "learning_rate": 3e-05,
110
+ "loss": 0.0561,
111
  "step": 120
112
  },
113
  {
114
+ "epoch": 0.21,
115
  "learning_rate": 3e-05,
116
+ "loss": 0.0549,
117
  "step": 130
118
  },
119
  {
120
+ "epoch": 0.22,
121
  "learning_rate": 3e-05,
122
+ "loss": 0.0562,
123
  "step": 140
124
  },
125
  {
126
+ "epoch": 0.24,
127
  "learning_rate": 3e-05,
128
+ "loss": 0.0496,
129
  "step": 150
130
  },
131
  {
132
+ "epoch": 0.24,
133
+ "eval_accuracy": 0.7311320754716981,
134
+ "eval_f1_macro": 0.5702888507555568,
135
+ "eval_f1_micro": 0.7311320754716981,
136
+ "eval_loss": 0.05246850848197937,
137
+ "eval_precision_macro": 0.5911132598584014,
138
+ "eval_precision_micro": 0.7311320754716981,
139
+ "eval_recall_macro": 0.5746634992387409,
140
+ "eval_recall_micro": 0.7311320754716981,
141
+ "eval_runtime": 68.5467,
142
+ "eval_samples_per_second": 15.464,
143
+ "eval_steps_per_second": 3.866,
144
  "step": 150
145
  },
146
  {
147
+ "epoch": 0.25,
148
  "learning_rate": 3e-05,
149
+ "loss": 0.067,
150
  "step": 160
151
  },
152
  {
153
+ "epoch": 0.27,
154
  "learning_rate": 3e-05,
155
+ "loss": 0.0551,
156
  "step": 170
157
  },
158
  {
159
+ "epoch": 0.29,
160
  "learning_rate": 3e-05,
161
+ "loss": 0.0426,
162
  "step": 180
163
  },
164
  {
165
+ "epoch": 0.3,
166
  "learning_rate": 3e-05,
167
+ "loss": 0.0475,
168
  "step": 190
169
  },
170
  {
171
+ "epoch": 0.32,
172
  "learning_rate": 3e-05,
173
+ "loss": 0.0541,
174
  "step": 200
175
  },
176
  {
177
+ "epoch": 0.32,
178
+ "eval_accuracy": 0.7566037735849057,
179
+ "eval_f1_macro": 0.5584352420606399,
180
+ "eval_f1_micro": 0.7566037735849057,
181
+ "eval_loss": 0.04639677330851555,
182
+ "eval_precision_macro": 0.6151030653662273,
183
+ "eval_precision_micro": 0.7566037735849057,
184
+ "eval_recall_macro": 0.5606200967693734,
185
+ "eval_recall_micro": 0.7566037735849057,
186
+ "eval_runtime": 68.5714,
187
+ "eval_samples_per_second": 15.458,
188
+ "eval_steps_per_second": 3.865,
189
  "step": 200
190
  },
191
  {
192
+ "epoch": 0.33,
193
  "learning_rate": 3e-05,
194
+ "loss": 0.052,
195
  "step": 210
196
  },
197
  {
198
+ "epoch": 0.35,
199
  "learning_rate": 3e-05,
200
+ "loss": 0.0435,
201
  "step": 220
202
  },
203
  {
204
+ "epoch": 0.37,
205
  "learning_rate": 3e-05,
206
+ "loss": 0.0407,
207
  "step": 230
208
  },
209
  {
210
+ "epoch": 0.38,
211
  "learning_rate": 3e-05,
212
+ "loss": 0.0475,
213
  "step": 240
214
  },
215
  {
216
+ "epoch": 0.4,
217
  "learning_rate": 3e-05,
218
+ "loss": 0.0481,
219
  "step": 250
220
  },
221
  {
222
+ "epoch": 0.4,
223
+ "eval_accuracy": 0.7811320754716982,
224
+ "eval_f1_macro": 0.6368753336945477,
225
+ "eval_f1_micro": 0.7811320754716982,
226
+ "eval_loss": 0.04328591376543045,
227
+ "eval_precision_macro": 0.6636054486047108,
228
+ "eval_precision_micro": 0.7811320754716982,
229
+ "eval_recall_macro": 0.6513962288983521,
230
+ "eval_recall_micro": 0.7811320754716982,
231
+ "eval_runtime": 68.5331,
232
+ "eval_samples_per_second": 15.467,
233
+ "eval_steps_per_second": 3.867,
234
  "step": 250
235
  },
236
  {
237
+ "epoch": 0.41,
238
  "learning_rate": 3e-05,
239
+ "loss": 0.0533,
240
  "step": 260
241
  },
242
  {
243
+ "epoch": 0.43,
244
  "learning_rate": 3e-05,
245
+ "loss": 0.0433,
246
  "step": 270
247
  },
248
  {
249
+ "epoch": 0.44,
250
  "learning_rate": 3e-05,
251
+ "loss": 0.0432,
252
  "step": 280
253
  },
254
  {
255
+ "epoch": 0.46,
256
  "learning_rate": 3e-05,
257
+ "loss": 0.0466,
258
  "step": 290
259
  },
260
  {
261
+ "epoch": 0.48,
262
  "learning_rate": 3e-05,
263
+ "loss": 0.053,
264
  "step": 300
265
  },
266
  {
267
+ "epoch": 0.48,
268
+ "eval_accuracy": 0.7632075471698113,
269
+ "eval_f1_macro": 0.6337788769448006,
270
+ "eval_f1_micro": 0.7632075471698113,
271
+ "eval_loss": 0.0451766662299633,
272
+ "eval_precision_macro": 0.6935549047637881,
273
+ "eval_precision_micro": 0.7632075471698113,
274
+ "eval_recall_macro": 0.6461210681607904,
275
+ "eval_recall_micro": 0.7632075471698113,
276
+ "eval_runtime": 68.6668,
277
+ "eval_samples_per_second": 15.437,
278
+ "eval_steps_per_second": 3.859,
279
  "step": 300
280
  },
281
  {
282
+ "epoch": 0.49,
283
  "learning_rate": 3e-05,
284
+ "loss": 0.0455,
285
  "step": 310
286
  },
287
  {
288
+ "epoch": 0.51,
289
  "learning_rate": 3e-05,
290
+ "loss": 0.049,
291
  "step": 320
292
  },
293
  {
294
+ "epoch": 0.52,
295
  "learning_rate": 3e-05,
296
+ "loss": 0.0426,
297
  "step": 330
298
  },
299
  {
300
+ "epoch": 0.54,
301
  "learning_rate": 3e-05,
302
+ "loss": 0.0396,
303
  "step": 340
304
  },
305
  {
306
+ "epoch": 0.56,
307
  "learning_rate": 3e-05,
308
+ "loss": 0.0401,
309
  "step": 350
310
  },
311
  {
312
+ "epoch": 0.56,
313
+ "eval_accuracy": 0.7943396226415095,
314
+ "eval_f1_macro": 0.6696815276160976,
315
+ "eval_f1_micro": 0.7943396226415095,
316
+ "eval_loss": 0.039866555482149124,
317
+ "eval_precision_macro": 0.7380690805686413,
318
+ "eval_precision_micro": 0.7943396226415095,
319
+ "eval_recall_macro": 0.660366139065656,
320
+ "eval_recall_micro": 0.7943396226415095,
321
+ "eval_runtime": 68.673,
322
+ "eval_samples_per_second": 15.435,
323
+ "eval_steps_per_second": 3.859,
324
  "step": 350
325
  },
326
  {
327
+ "epoch": 0.57,
328
  "learning_rate": 3e-05,
329
+ "loss": 0.0482,
330
  "step": 360
331
  },
332
  {
333
+ "epoch": 0.59,
334
  "learning_rate": 3e-05,
335
+ "loss": 0.0373,
336
  "step": 370
337
  },
338
  {
339
+ "epoch": 0.6,
340
  "learning_rate": 3e-05,
341
+ "loss": 0.036,
342
  "step": 380
343
  },
344
  {
345
+ "epoch": 0.62,
346
  "learning_rate": 3e-05,
347
+ "loss": 0.0428,
348
  "step": 390
349
  },
350
  {
351
+ "epoch": 0.64,
352
  "learning_rate": 3e-05,
353
+ "loss": 0.0509,
354
  "step": 400
355
  },
356
  {
357
+ "epoch": 0.64,
358
+ "eval_accuracy": 0.8009433962264151,
359
+ "eval_f1_macro": 0.6501313860427956,
360
+ "eval_f1_micro": 0.8009433962264151,
361
+ "eval_loss": 0.03930637985467911,
362
+ "eval_precision_macro": 0.6546476325081232,
363
+ "eval_precision_micro": 0.8009433962264151,
364
+ "eval_recall_macro": 0.6611935636788673,
365
+ "eval_recall_micro": 0.8009433962264151,
366
+ "eval_runtime": 68.557,
367
+ "eval_samples_per_second": 15.462,
368
+ "eval_steps_per_second": 3.865,
369
  "step": 400
370
  },
371
  {
372
+ "epoch": 0.65,
373
  "learning_rate": 3e-05,
374
+ "loss": 0.0375,
375
  "step": 410
376
  },
377
  {
378
+ "epoch": 0.67,
379
  "learning_rate": 3e-05,
380
+ "loss": 0.041,
381
  "step": 420
382
  },
383
  {
384
+ "epoch": 0.68,
385
  "learning_rate": 3e-05,
386
+ "loss": 0.0416,
387
  "step": 430
388
  },
389
  {
390
+ "epoch": 0.7,
391
  "learning_rate": 3e-05,
392
+ "loss": 0.0396,
393
  "step": 440
394
  },
395
  {
396
+ "epoch": 0.72,
397
  "learning_rate": 3e-05,
398
+ "loss": 0.0474,
399
  "step": 450
400
  },
401
  {
402
+ "epoch": 0.72,
403
+ "eval_accuracy": 0.8018867924528302,
404
+ "eval_f1_macro": 0.6864569711826704,
405
+ "eval_f1_micro": 0.8018867924528302,
406
+ "eval_loss": 0.04012966528534889,
407
+ "eval_precision_macro": 0.7255429634365795,
408
+ "eval_precision_micro": 0.8018867924528302,
409
+ "eval_recall_macro": 0.6926779368041328,
410
+ "eval_recall_micro": 0.8018867924528302,
411
+ "eval_runtime": 68.6193,
412
+ "eval_samples_per_second": 15.448,
413
+ "eval_steps_per_second": 3.862,
414
  "step": 450
415
  },
416
  {
417
+ "epoch": 0.73,
418
  "learning_rate": 3e-05,
419
+ "loss": 0.0434,
420
  "step": 460
421
  },
422
  {
423
+ "epoch": 0.75,
424
  "learning_rate": 3e-05,
425
+ "loss": 0.0358,
426
  "step": 470
427
  },
428
  {
429
+ "epoch": 0.76,
430
  "learning_rate": 3e-05,
431
+ "loss": 0.0416,
432
  "step": 480
433
  },
434
  {
435
+ "epoch": 0.78,
436
  "learning_rate": 3e-05,
437
+ "loss": 0.0334,
438
  "step": 490
439
  },
440
  {
441
+ "epoch": 0.79,
442
  "learning_rate": 3e-05,
443
+ "loss": 0.045,
444
  "step": 500
445
  },
446
  {
447
+ "epoch": 0.79,
448
+ "eval_accuracy": 0.8009433962264151,
449
+ "eval_f1_macro": 0.6977412107603574,
450
+ "eval_f1_micro": 0.8009433962264151,
451
+ "eval_loss": 0.0379195362329483,
452
+ "eval_precision_macro": 0.7146704097806501,
453
+ "eval_precision_micro": 0.8009433962264151,
454
+ "eval_recall_macro": 0.710805016133364,
455
+ "eval_recall_micro": 0.8009433962264151,
456
+ "eval_runtime": 68.6344,
457
+ "eval_samples_per_second": 15.444,
458
+ "eval_steps_per_second": 3.861,
459
  "step": 500
460
  },
461
  {
462
+ "epoch": 0.81,
463
  "learning_rate": 3e-05,
464
+ "loss": 0.0425,
465
  "step": 510
466
  },
467
  {
468
+ "epoch": 0.83,
469
  "learning_rate": 3e-05,
470
+ "loss": 0.036,
471
  "step": 520
472
  },
473
  {
474
+ "epoch": 0.84,
475
  "learning_rate": 3e-05,
476
+ "loss": 0.0444,
477
  "step": 530
478
  },
479
  {
480
+ "epoch": 0.86,
481
  "learning_rate": 3e-05,
482
+ "loss": 0.0394,
483
  "step": 540
484
  },
485
  {
486
+ "epoch": 0.87,
487
  "learning_rate": 3e-05,
488
+ "loss": 0.0335,
489
  "step": 550
490
  },
491
  {
492
+ "epoch": 0.87,
493
+ "eval_accuracy": 0.8150943396226416,
494
+ "eval_f1_macro": 0.7134710025440245,
495
+ "eval_f1_micro": 0.8150943396226416,
496
+ "eval_loss": 0.03691105917096138,
497
+ "eval_precision_macro": 0.7046165923538829,
498
+ "eval_precision_micro": 0.8150943396226416,
499
+ "eval_recall_macro": 0.7335435173781253,
500
+ "eval_recall_micro": 0.8150943396226416,
501
+ "eval_runtime": 68.5596,
502
+ "eval_samples_per_second": 15.461,
503
+ "eval_steps_per_second": 3.865,
504
  "step": 550
505
  },
506
  {
507
+ "epoch": 0.89,
508
  "learning_rate": 3e-05,
509
+ "loss": 0.0421,
510
  "step": 560
511
  },
512
  {
513
+ "epoch": 0.91,
514
  "learning_rate": 3e-05,
515
+ "loss": 0.0407,
516
  "step": 570
517
  },
518
  {
519
+ "epoch": 0.92,
520
  "learning_rate": 3e-05,
521
+ "loss": 0.0429,
522
  "step": 580
523
  },
524
  {
525
+ "epoch": 0.94,
526
  "learning_rate": 3e-05,
527
+ "loss": 0.0378,
528
  "step": 590
529
  },
530
  {
531
+ "epoch": 0.95,
532
  "learning_rate": 3e-05,
533
+ "loss": 0.0429,
534
  "step": 600
535
  },
536
  {
537
+ "epoch": 0.95,
538
+ "eval_accuracy": 0.7962264150943397,
539
+ "eval_f1_macro": 0.687832173620461,
540
+ "eval_f1_micro": 0.7962264150943396,
541
+ "eval_loss": 0.03668661788105965,
542
+ "eval_precision_macro": 0.7081030422724828,
543
+ "eval_precision_micro": 0.7962264150943397,
544
+ "eval_recall_macro": 0.6958905634881637,
545
+ "eval_recall_micro": 0.7962264150943397,
546
+ "eval_runtime": 68.6225,
547
+ "eval_samples_per_second": 15.447,
548
+ "eval_steps_per_second": 3.862,
549
  "step": 600
550
  },
551
  {
552
+ "epoch": 0.97,
553
  "learning_rate": 3e-05,
554
+ "loss": 0.0394,
555
  "step": 610
556
  },
557
  {
558
+ "epoch": 0.99,
559
  "learning_rate": 3e-05,
560
+ "loss": 0.0363,
561
  "step": 620
562
  },
563
  {
564
+ "epoch": 1.0,
565
  "learning_rate": 3e-05,
566
+ "loss": 0.0428,
567
  "step": 630
568
  },
569
  {
570
+ "epoch": 1.02,
571
  "learning_rate": 3e-05,
572
+ "loss": 0.0265,
573
  "step": 640
574
  },
575
  {
576
+ "epoch": 1.03,
577
  "learning_rate": 3e-05,
578
+ "loss": 0.0253,
579
  "step": 650
580
  },
581
  {
582
+ "epoch": 1.03,
583
+ "eval_accuracy": 0.8254716981132075,
584
+ "eval_f1_macro": 0.7098363543260356,
585
+ "eval_f1_micro": 0.8254716981132075,
586
+ "eval_loss": 0.03421418368816376,
587
+ "eval_precision_macro": 0.7370277877974788,
588
+ "eval_precision_micro": 0.8254716981132075,
589
+ "eval_recall_macro": 0.6974501533785277,
590
+ "eval_recall_micro": 0.8254716981132075,
591
+ "eval_runtime": 68.5452,
592
+ "eval_samples_per_second": 15.464,
593
+ "eval_steps_per_second": 3.866,
594
  "step": 650
595
  },
596
  {
597
+ "epoch": 1.05,
598
  "learning_rate": 3e-05,
599
+ "loss": 0.0243,
600
  "step": 660
601
  },
602
  {
603
+ "epoch": 1.06,
604
  "learning_rate": 3e-05,
605
+ "loss": 0.0313,
606
  "step": 670
607
  },
608
  {
609
+ "epoch": 1.08,
610
  "learning_rate": 3e-05,
611
+ "loss": 0.0285,
612
  "step": 680
613
  },
614
  {
615
+ "epoch": 1.1,
616
  "learning_rate": 3e-05,
617
+ "loss": 0.0262,
618
  "step": 690
619
  },
620
  {
621
+ "epoch": 1.11,
622
  "learning_rate": 3e-05,
623
+ "loss": 0.0311,
624
  "step": 700
625
  },
626
  {
627
+ "epoch": 1.11,
628
+ "eval_accuracy": 0.8047169811320755,
629
+ "eval_f1_macro": 0.6661175684975367,
630
+ "eval_f1_micro": 0.8047169811320755,
631
+ "eval_loss": 0.0357016883790493,
632
+ "eval_precision_macro": 0.6994736595477448,
633
+ "eval_precision_micro": 0.8047169811320755,
634
+ "eval_recall_macro": 0.660920267471505,
635
+ "eval_recall_micro": 0.8047169811320755,
636
+ "eval_runtime": 68.6332,
637
+ "eval_samples_per_second": 15.444,
638
+ "eval_steps_per_second": 3.861,
639
  "step": 700
640
  },
641
  {
642
+ "epoch": 1.13,
643
  "learning_rate": 3e-05,
644
+ "loss": 0.0311,
645
  "step": 710
646
  },
647
  {
648
+ "epoch": 1.14,
649
  "learning_rate": 3e-05,
650
+ "loss": 0.0298,
651
  "step": 720
652
  },
653
  {
654
+ "epoch": 1.15,
655
+ "step": 725,
656
+ "total_flos": 3.949643257934285e+17,
657
+ "train_loss": 0.06415341473858932,
658
+ "train_runtime": 4786.626,
659
+ "train_samples_per_second": 4.847,
660
+ "train_steps_per_second": 0.151
661
  }
662
  ],
663
  "logging_steps": 10,
664
+ "max_steps": 725,
665
  "num_input_tokens_seen": 0,
666
  "num_train_epochs": 2,
667
  "save_steps": 250,
668
+ "total_flos": 3.949643257934285e+17,
669
+ "train_batch_size": 8,
670
  "trial_name": null,
671
  "trial_params": null
672
  }
training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:9aa409b842e6508386b92f28b3d9a90969b3355d546c84d641c78491d8d4d0e8
3
  size 6712
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9bf04f3f4781ddaca55355307209daf77a530710545740be26ab36316891d09c
3
  size 6712