lombardata committed on
Commit
2e3f7df
1 Parent(s): 185dce1

Evaluation on the test set completed on 2024_11_15.

README.md ADDED
@@ -0,0 +1,141 @@
+ ---
+ license: apache-2.0
+ base_model: facebook/dinov2-large
+ tags:
+ - generated_from_trainer
+ model-index:
+ - name: drone-DinoVdeau-from-probs-large-2024_11_15-batch-size64_freeze_probs
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # drone-DinoVdeau-from-probs-large-2024_11_15-batch-size64_freeze_probs
+
+ This model is a fine-tuned version of [facebook/dinov2-large](https://huggingface.co/facebook/dinov2-large) on an unspecified dataset (the Trainer recorded the dataset name as `None`).
+ It achieves the following results on the evaluation set:
+ - Loss: 0.4672
+ - Rmse: 0.1553
+ - Mae: 0.1147
+ - Kl Divergence: 0.3577
+ - Explained Variance: 0.4654
+ - Learning Rate: 1e-06
+
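The metric names above are not defined anywhere in this commit. As a rough guide to reading them, here is a minimal pure-Python sketch using the textbook definitions. The function names, the flat-list inputs, the direction of the KL divergence (`p` as target, `q` as prediction) and the `eps` smoothing are all assumptions made for illustration; the training script may compute these differently.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error over paired values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error over paired values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def _variance(xs):
    """Population variance of a list of floats."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def explained_variance(y_true, y_pred):
    """1 - Var(residuals) / Var(targets); 1.0 means a perfect fit."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - _variance(residuals) / _variance(y_true)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; eps avoids log(0) (assumed smoothing)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


# Toy values, not taken from the model's predictions.
targets = [0.0, 0.5, 1.0]
predictions = [0.1, 0.5, 0.8]
toy_rmse = rmse(targets, predictions)
toy_mae = mae(targets, predictions)
toy_ev = explained_variance(targets, predictions)
```

Lower is better for RMSE, MAE and KL divergence, while explained variance approaches 1.0 as predictions improve.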
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.001
+ - train_batch_size: 64
+ - eval_batch_size: 64
+ - seed: 42
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - num_epochs: 150
+ - mixed_precision_training: Native AMP
+
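To make the optimizer line concrete, here is a minimal sketch of one standard Adam update for a single scalar parameter, using the listed `learning_rate`, `betas` and `epsilon`. The function name `adam_step` and the scalar setting are illustrative assumptions; the actual run used the optimizer implementation bundled with the training framework, which is not shown in this commit.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter and return (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction for step t
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# First step from zero-initialised moments: the update size is roughly lr.
p, m, v = adam_step(param=1.0, grad=0.5, m=0.0, v=0.0, t=1)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by about `lr` regardless of the gradient's magnitude.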
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Rmse | Mae | Kl Divergence | Explained Variance | Learning Rate |
+ |:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:-------------:|:------------------:|:-------------:|
+ | No log | 1.0 | 110 | 0.5006 | 0.1904 | 0.1552 | 0.1025 | 0.3284 | 0.001 |
+ | No log | 2.0 | 220 | 0.4755 | 0.1681 | 0.1245 | 0.5180 | 0.3932 | 0.001 |
+ | No log | 3.0 | 330 | 0.4745 | 0.1675 | 0.1227 | 0.6862 | 0.3975 | 0.001 |
+ | No log | 4.0 | 440 | 0.4742 | 0.1672 | 0.1255 | 0.3212 | 0.4024 | 0.001 |
+ | 0.5081 | 5.0 | 550 | 0.4725 | 0.1653 | 0.1224 | 0.5072 | 0.4118 | 0.001 |
+ | 0.5081 | 6.0 | 660 | 0.4726 | 0.1657 | 0.1216 | 0.6710 | 0.4101 | 0.001 |
+ | 0.5081 | 7.0 | 770 | 0.4732 | 0.1655 | 0.1255 | 0.3162 | 0.4183 | 0.001 |
+ | 0.5081 | 8.0 | 880 | 0.4728 | 0.1651 | 0.1260 | 0.2719 | 0.4234 | 0.001 |
+ | 0.5081 | 9.0 | 990 | 0.4708 | 0.1639 | 0.1206 | 0.6393 | 0.4237 | 0.001 |
+ | 0.4668 | 10.0 | 1100 | 0.4733 | 0.1654 | 0.1230 | 0.5359 | 0.4151 | 0.001 |
+ | 0.4668 | 11.0 | 1210 | 0.4716 | 0.1647 | 0.1253 | 0.2479 | 0.4305 | 0.001 |
+ | 0.4668 | 12.0 | 1320 | 0.4708 | 0.1631 | 0.1244 | 0.3119 | 0.4358 | 0.001 |
+ | 0.4668 | 13.0 | 1430 | 0.4715 | 0.1635 | 0.1230 | 0.3694 | 0.4274 | 0.001 |
+ | 0.4641 | 14.0 | 1540 | 0.4721 | 0.1653 | 0.1216 | 0.5592 | 0.4134 | 0.001 |
+ | 0.4641 | 15.0 | 1650 | 0.4701 | 0.1628 | 0.1213 | 0.4936 | 0.4314 | 0.001 |
+ | 0.4641 | 16.0 | 1760 | 0.4719 | 0.1646 | 0.1229 | 0.2820 | 0.4328 | 0.001 |
+ | 0.4641 | 17.0 | 1870 | 0.4693 | 0.1621 | 0.1200 | 0.5294 | 0.4332 | 0.001 |
+ | 0.4641 | 18.0 | 1980 | 0.4710 | 0.1635 | 0.1216 | 0.4093 | 0.4294 | 0.001 |
+ | 0.4618 | 19.0 | 2090 | 0.4698 | 0.1622 | 0.1219 | 0.2918 | 0.4388 | 0.001 |
+ | 0.4618 | 20.0 | 2200 | 0.4692 | 0.1617 | 0.1190 | 0.4772 | 0.4355 | 0.001 |
+ | 0.4618 | 21.0 | 2310 | 0.4683 | 0.1606 | 0.1204 | 0.4336 | 0.4424 | 0.001 |
+ | 0.4618 | 22.0 | 2420 | 0.4724 | 0.1650 | 0.1183 | 0.7962 | 0.4233 | 0.001 |
+ | 0.4613 | 23.0 | 2530 | 0.4714 | 0.1641 | 0.1223 | 0.2854 | 0.4354 | 0.001 |
+ | 0.4613 | 24.0 | 2640 | 0.4707 | 0.1633 | 0.1207 | 0.4206 | 0.4280 | 0.001 |
+ | 0.4613 | 25.0 | 2750 | 0.4679 | 0.1606 | 0.1185 | 0.5436 | 0.4416 | 0.001 |
+ | 0.4613 | 26.0 | 2860 | 0.4708 | 0.1634 | 0.1192 | 0.4964 | 0.4268 | 0.001 |
+ | 0.4613 | 27.0 | 2970 | 0.4695 | 0.1625 | 0.1185 | 0.6399 | 0.4301 | 0.001 |
+ | 0.4607 | 28.0 | 3080 | 0.4701 | 0.1624 | 0.1184 | 0.5737 | 0.4324 | 0.001 |
+ | 0.4607 | 29.0 | 3190 | 0.4699 | 0.1624 | 0.1200 | 0.4459 | 0.4324 | 0.001 |
+ | 0.4607 | 30.0 | 3300 | 0.4723 | 0.1643 | 0.1254 | 0.2726 | 0.4308 | 0.001 |
+ | 0.4607 | 31.0 | 3410 | 0.4696 | 0.1622 | 0.1184 | 0.5308 | 0.4313 | 0.001 |
+ | 0.4604 | 32.0 | 3520 | 0.4668 | 0.1593 | 0.1175 | 0.4200 | 0.4508 | 0.0001 |
+ | 0.4604 | 33.0 | 3630 | 0.4663 | 0.1587 | 0.1177 | 0.3529 | 0.4565 | 0.0001 |
+ | 0.4604 | 34.0 | 3740 | 0.4667 | 0.1592 | 0.1181 | 0.3588 | 0.4542 | 0.0001 |
+ | 0.4604 | 35.0 | 3850 | 0.4659 | 0.1584 | 0.1160 | 0.4813 | 0.4545 | 0.0001 |
+ | 0.4604 | 36.0 | 3960 | 0.4658 | 0.1581 | 0.1173 | 0.3504 | 0.4594 | 0.0001 |
+ | 0.4565 | 37.0 | 4070 | 0.4654 | 0.1578 | 0.1158 | 0.3919 | 0.4608 | 0.0001 |
+ | 0.4565 | 38.0 | 4180 | 0.4655 | 0.1580 | 0.1166 | 0.4058 | 0.4583 | 0.0001 |
+ | 0.4565 | 39.0 | 4290 | 0.4658 | 0.1585 | 0.1174 | 0.4118 | 0.4567 | 0.0001 |
+ | 0.4565 | 40.0 | 4400 | 0.4656 | 0.1579 | 0.1170 | 0.3564 | 0.4607 | 0.0001 |
+ | 0.4552 | 41.0 | 4510 | 0.4657 | 0.1582 | 0.1171 | 0.3573 | 0.4598 | 0.0001 |
+ | 0.4552 | 42.0 | 4620 | 0.4652 | 0.1579 | 0.1155 | 0.5042 | 0.4587 | 0.0001 |
+ | 0.4552 | 43.0 | 4730 | 0.4651 | 0.1575 | 0.1157 | 0.4462 | 0.4613 | 0.0001 |
+ | 0.4552 | 44.0 | 4840 | 0.4654 | 0.1579 | 0.1166 | 0.4236 | 0.4604 | 0.0001 |
+ | 0.4552 | 45.0 | 4950 | 0.4649 | 0.1574 | 0.1151 | 0.4510 | 0.4625 | 0.0001 |
+ | 0.4538 | 46.0 | 5060 | 0.4648 | 0.1575 | 0.1157 | 0.4490 | 0.4619 | 0.0001 |
+ | 0.4538 | 47.0 | 5170 | 0.4649 | 0.1574 | 0.1152 | 0.4751 | 0.4615 | 0.0001 |
+ | 0.4538 | 48.0 | 5280 | 0.4648 | 0.1575 | 0.1151 | 0.5305 | 0.4631 | 0.0001 |
+ | 0.4538 | 49.0 | 5390 | 0.4648 | 0.1574 | 0.1154 | 0.4799 | 0.4630 | 0.0001 |
+ | 0.4532 | 50.0 | 5500 | 0.4650 | 0.1572 | 0.1172 | 0.2825 | 0.4694 | 0.0001 |
+ | 0.4532 | 51.0 | 5610 | 0.4656 | 0.1582 | 0.1151 | 0.4879 | 0.4573 | 0.0001 |
+ | 0.4532 | 52.0 | 5720 | 0.4643 | 0.1566 | 0.1155 | 0.4199 | 0.4674 | 0.0001 |
+ | 0.4532 | 53.0 | 5830 | 0.4644 | 0.1569 | 0.1156 | 0.3880 | 0.4673 | 0.0001 |
+ | 0.4532 | 54.0 | 5940 | 0.4646 | 0.1569 | 0.1148 | 0.4229 | 0.4654 | 0.0001 |
+ | 0.4526 | 55.0 | 6050 | 0.4644 | 0.1569 | 0.1159 | 0.4009 | 0.4659 | 0.0001 |
+ | 0.4526 | 56.0 | 6160 | 0.4647 | 0.1572 | 0.1164 | 0.3405 | 0.4660 | 0.0001 |
+ | 0.4526 | 57.0 | 6270 | 0.4645 | 0.1569 | 0.1152 | 0.4188 | 0.4661 | 0.0001 |
+ | 0.4526 | 58.0 | 6380 | 0.4651 | 0.1576 | 0.1164 | 0.3079 | 0.4659 | 0.0001 |
+ | 0.4526 | 59.0 | 6490 | 0.4645 | 0.1570 | 0.1150 | 0.4339 | 0.4654 | 1e-05 |
+ | 0.4514 | 60.0 | 6600 | 0.4642 | 0.1566 | 0.1150 | 0.3894 | 0.4679 | 1e-05 |
+ | 0.4514 | 61.0 | 6710 | 0.4639 | 0.1563 | 0.1146 | 0.4145 | 0.4693 | 1e-05 |
+ | 0.4514 | 62.0 | 6820 | 0.4641 | 0.1565 | 0.1148 | 0.4064 | 0.4686 | 1e-05 |
+ | 0.4514 | 63.0 | 6930 | 0.4643 | 0.1565 | 0.1149 | 0.3542 | 0.4698 | 1e-05 |
+ | 0.4511 | 64.0 | 7040 | 0.4640 | 0.1564 | 0.1150 | 0.3718 | 0.4702 | 1e-05 |
+ | 0.4511 | 65.0 | 7150 | 0.4641 | 0.1565 | 0.1152 | 0.4128 | 0.4680 | 1e-05 |
+ | 0.4511 | 66.0 | 7260 | 0.4644 | 0.1570 | 0.1145 | 0.4988 | 0.4658 | 1e-05 |
+ | 0.4511 | 67.0 | 7370 | 0.4638 | 0.1562 | 0.1151 | 0.4122 | 0.4697 | 1e-05 |
+ | 0.4511 | 68.0 | 7480 | 0.4640 | 0.1565 | 0.1144 | 0.4579 | 0.4674 | 1e-05 |
+ | 0.4508 | 69.0 | 7590 | 0.4638 | 0.1561 | 0.1143 | 0.4197 | 0.4702 | 1e-05 |
+ | 0.4508 | 70.0 | 7700 | 0.4639 | 0.1563 | 0.1145 | 0.4286 | 0.4695 | 1e-05 |
+ | 0.4508 | 71.0 | 7810 | 0.4641 | 0.1563 | 0.1153 | 0.3542 | 0.4708 | 1e-05 |
+ | 0.4508 | 72.0 | 7920 | 0.4642 | 0.1566 | 0.1147 | 0.4250 | 0.4681 | 1e-05 |
+ | 0.4505 | 73.0 | 8030 | 0.4638 | 0.1561 | 0.1140 | 0.4397 | 0.4700 | 1e-05 |
+ | 0.4505 | 74.0 | 8140 | 0.4638 | 0.1563 | 0.1145 | 0.4437 | 0.4689 | 1e-05 |
+ | 0.4505 | 75.0 | 8250 | 0.4638 | 0.1561 | 0.1145 | 0.4049 | 0.4705 | 1e-05 |
+ | 0.4505 | 76.0 | 8360 | 0.4640 | 0.1565 | 0.1141 | 0.4926 | 0.4675 | 1e-06 |
+ | 0.4505 | 77.0 | 8470 | 0.4639 | 0.1562 | 0.1142 | 0.4427 | 0.4695 | 1e-06 |
+ | 0.4505 | 78.0 | 8580 | 0.4639 | 0.1563 | 0.1145 | 0.4293 | 0.4692 | 1e-06 |
+ | 0.4505 | 79.0 | 8690 | 0.4641 | 0.1564 | 0.1147 | 0.3765 | 0.4700 | 1e-06 |
+
+
+ ### Framework versions
+
+ - Transformers 4.41.0
+ - Pytorch 2.5.0+cu124
+ - Datasets 3.0.2
+ - Tokenizers 0.19.1
all_results.json ADDED
@@ -0,0 +1,17 @@
+ {
+   "epoch": 79.0,
+   "eval_explained_variance": 0.46537885069847107,
+   "eval_kl_divergence": 0.35774946212768555,
+   "eval_loss": 0.46723154187202454,
+   "eval_mae": 0.11465383321046829,
+   "eval_rmse": 0.15526758134365082,
+   "eval_runtime": 55.2715,
+   "eval_samples_per_second": 42.644,
+   "eval_steps_per_second": 0.669,
+   "learning_rate": 1.0000000000000002e-06,
+   "total_flos": 8.188406191467658e+19,
+   "train_loss": 0.4591466036709872,
+   "train_runtime": 19731.8487,
+   "train_samples_per_second": 53.236,
+   "train_steps_per_second": 0.836
+ }
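The throughput fields give a rough idea of dataset sizes, which the model card leaves unspecified. The sketch below is back-of-the-envelope arithmetic only: the helper names are hypothetical, and the training-set bounds assume a single device, no gradient accumulation, and a final partial batch that is kept. The `global_step` value of 8690 comes from `trainer_state.json` in this commit.

```python
import math


def eval_set_size(samples_per_second, runtime_seconds):
    """Approximate number of evaluated samples from throughput and runtime."""
    return round(samples_per_second * runtime_seconds)


def steps_per_epoch(global_step, epochs):
    """Optimizer steps per epoch, from the final step count and epoch count."""
    return global_step / epochs


def train_size_bounds(steps, batch_size):
    """Smallest and largest sample counts that need exactly `steps` batches."""
    return ((steps - 1) * batch_size + 1, steps * batch_size)


n_eval = eval_set_size(42.644, 55.2715)      # eval_samples_per_second * eval_runtime
eval_batches = math.ceil(n_eval / 64)        # eval_batch_size = 64
steps = steps_per_epoch(8690, 79.0)          # global_step / epoch
low, high = train_size_bounds(int(steps), 64)
```

Under those assumptions the evaluation pass covers about 2357 samples in 37 batches, and each epoch runs 110 steps, which puts the training set between 6977 and 7040 samples.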
logs/events.out.tfevents.1731665663.datavisu2 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:d04dac874f724d44b6102737e4ab899eef475853039d695bdbd6868edb062988
- size 51599
+ oid sha256:2b9787c9faf824d22538efe608b24007520f5dca2eb1efccafa39476d057b2df
+ size 53097
logs/events.out.tfevents.1731685574.datavisu2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dbfda6d0ad1fc5f8b4bdf95291947b1a81413fd40d2aee987af1023541160de6
+ size 40
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:7684f69a18955a6e33ae1bc68f65a382fed0230bd887c3a6bb0c54d558fbfd6c
+ oid sha256:197459fbab5389ddd81caf8961bb524a564af1913bf5aec15a3fcccb6e30208d
  size 1222956704
test_results.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "epoch": 79.0,
+   "eval_explained_variance": 0.46537885069847107,
+   "eval_kl_divergence": 0.35774946212768555,
+   "eval_loss": 0.46723154187202454,
+   "eval_mae": 0.11465383321046829,
+   "eval_rmse": 0.15526758134365082,
+   "eval_runtime": 55.2715,
+   "eval_samples_per_second": 42.644,
+   "eval_steps_per_second": 0.669,
+   "learning_rate": 1.0000000000000002e-06
+ }
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "epoch": 79.0,
+   "learning_rate": 1.0000000000000002e-06,
+   "total_flos": 8.188406191467658e+19,
+   "train_loss": 0.4591466036709872,
+   "train_runtime": 19731.8487,
+   "train_samples_per_second": 53.236,
+   "train_steps_per_second": 0.836
+ }
trainer_state.json ADDED
@@ -0,0 +1,1198 @@
+ {
+   "best_metric": 0.4637599587440491,
+   "best_model_checkpoint": "/home/datawork-iot-nos/Seatizen/models/multilabel/drone/drone-DinoVdeau-from-probs-large-2024_11_15-batch-size64_freeze_probs/checkpoint-7590",
+   "epoch": 79.0,
+   "eval_steps": 500,
+   "global_step": 8690,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 1.0,
+       "eval_explained_variance": 0.32841917872428894,
+       "eval_kl_divergence": 0.10252656042575836,
+       "eval_loss": 0.5005590319633484,
+       "eval_mae": 0.15520869195461273,
+       "eval_rmse": 0.19042611122131348,
+       "eval_runtime": 60.5528,
+       "eval_samples_per_second": 38.875,
+       "eval_steps_per_second": 0.611,
+       "learning_rate": 0.001,
+       "step": 110
+     },
+     {
+       "epoch": 2.0,
+       "eval_explained_variance": 0.3932196795940399,
+       "eval_kl_divergence": 0.5180067420005798,
+       "eval_loss": 0.47547808289527893,
+       "eval_mae": 0.12452296167612076,
+       "eval_rmse": 0.16812847554683685,
+       "eval_runtime": 57.4976,
+       "eval_samples_per_second": 40.941,
+       "eval_steps_per_second": 0.644,
+       "learning_rate": 0.001,
+       "step": 220
+     },
+     {
+       "epoch": 3.0,
+       "eval_explained_variance": 0.3974684476852417,
+       "eval_kl_divergence": 0.6862403154373169,
+       "eval_loss": 0.47452571988105774,
+       "eval_mae": 0.1226513460278511,
+       "eval_rmse": 0.16751675307750702,
+       "eval_runtime": 57.6506,
+       "eval_samples_per_second": 40.832,
+       "eval_steps_per_second": 0.642,
+       "learning_rate": 0.001,
+       "step": 330
+     },
+     {
+       "epoch": 4.0,
+       "eval_explained_variance": 0.40236756205558777,
+       "eval_kl_divergence": 0.3211989104747772,
+       "eval_loss": 0.47420722246170044,
+       "eval_mae": 0.1255439817905426,
+       "eval_rmse": 0.16721709072589874,
+       "eval_runtime": 58.0216,
+       "eval_samples_per_second": 40.571,
+       "eval_steps_per_second": 0.638,
+       "learning_rate": 0.001,
+       "step": 440
+     },
+     {
+       "epoch": 4.545454545454545,
+       "grad_norm": 0.20816726982593536,
+       "learning_rate": 0.001,
+       "loss": 0.5081,
+       "step": 500
+     },
+     {
+       "epoch": 5.0,
+       "eval_explained_variance": 0.4118477404117584,
+       "eval_kl_divergence": 0.5071600079536438,
+       "eval_loss": 0.47245556116104126,
+       "eval_mae": 0.12236347794532776,
+       "eval_rmse": 0.16526558995246887,
+       "eval_runtime": 60.9082,
+       "eval_samples_per_second": 38.648,
+       "eval_steps_per_second": 0.607,
+       "learning_rate": 0.001,
+       "step": 550
+     },
+     {
+       "epoch": 6.0,
+       "eval_explained_variance": 0.4100535213947296,
+       "eval_kl_divergence": 0.6710320115089417,
+       "eval_loss": 0.4725925624370575,
+       "eval_mae": 0.12164705991744995,
+       "eval_rmse": 0.16568797826766968,
+       "eval_runtime": 60.266,
+       "eval_samples_per_second": 39.06,
+       "eval_steps_per_second": 0.614,
+       "learning_rate": 0.001,
+       "step": 660
+     },
+     {
+       "epoch": 7.0,
+       "eval_explained_variance": 0.4183339774608612,
+       "eval_kl_divergence": 0.3161657452583313,
+       "eval_loss": 0.4731809198856354,
+       "eval_mae": 0.12548527121543884,
+       "eval_rmse": 0.16550247371196747,
+       "eval_runtime": 59.012,
+       "eval_samples_per_second": 39.89,
+       "eval_steps_per_second": 0.627,
+       "learning_rate": 0.001,
+       "step": 770
+     },
+     {
+       "epoch": 8.0,
+       "eval_explained_variance": 0.4233661890029907,
+       "eval_kl_divergence": 0.27189013361930847,
+       "eval_loss": 0.47284314036369324,
+       "eval_mae": 0.12600405514240265,
+       "eval_rmse": 0.16514724493026733,
+       "eval_runtime": 60.8246,
+       "eval_samples_per_second": 38.701,
+       "eval_steps_per_second": 0.608,
+       "learning_rate": 0.001,
+       "step": 880
+     },
+     {
+       "epoch": 9.0,
+       "eval_explained_variance": 0.42370346188545227,
+       "eval_kl_divergence": 0.6392844319343567,
+       "eval_loss": 0.4707973003387451,
+       "eval_mae": 0.12056442350149155,
+       "eval_rmse": 0.16385647654533386,
+       "eval_runtime": 57.5192,
+       "eval_samples_per_second": 40.925,
+       "eval_steps_per_second": 0.643,
+       "learning_rate": 0.001,
+       "step": 990
+     },
+     {
+       "epoch": 9.090909090909092,
+       "grad_norm": 0.15108729898929596,
+       "learning_rate": 0.001,
+       "loss": 0.4668,
+       "step": 1000
+     },
+     {
+       "epoch": 10.0,
+       "eval_explained_variance": 0.41512250900268555,
+       "eval_kl_divergence": 0.5359246134757996,
+       "eval_loss": 0.4732784628868103,
+       "eval_mae": 0.12296172976493835,
+       "eval_rmse": 0.16544467210769653,
+       "eval_runtime": 60.8049,
+       "eval_samples_per_second": 38.714,
+       "eval_steps_per_second": 0.609,
+       "learning_rate": 0.001,
+       "step": 1100
+     },
+     {
+       "epoch": 11.0,
+       "eval_explained_variance": 0.43050625920295715,
+       "eval_kl_divergence": 0.24788798391819,
+       "eval_loss": 0.47162503004074097,
+       "eval_mae": 0.12532271444797516,
+       "eval_rmse": 0.1646868884563446,
+       "eval_runtime": 59.6353,
+       "eval_samples_per_second": 39.473,
+       "eval_steps_per_second": 0.62,
+       "learning_rate": 0.001,
+       "step": 1210
+     },
+     {
+       "epoch": 12.0,
+       "eval_explained_variance": 0.43575945496559143,
+       "eval_kl_divergence": 0.3118789792060852,
+       "eval_loss": 0.47083696722984314,
+       "eval_mae": 0.12438095360994339,
+       "eval_rmse": 0.16306261718273163,
+       "eval_runtime": 59.6011,
+       "eval_samples_per_second": 39.496,
+       "eval_steps_per_second": 0.621,
+       "learning_rate": 0.001,
+       "step": 1320
+     },
+     {
+       "epoch": 13.0,
+       "eval_explained_variance": 0.42740270495414734,
+       "eval_kl_divergence": 0.36944085359573364,
+       "eval_loss": 0.47152063250541687,
+       "eval_mae": 0.1230199933052063,
+       "eval_rmse": 0.16350014507770538,
+       "eval_runtime": 60.519,
+       "eval_samples_per_second": 38.897,
+       "eval_steps_per_second": 0.611,
+       "learning_rate": 0.001,
+       "step": 1430
+     },
+     {
+       "epoch": 13.636363636363637,
+       "grad_norm": 0.1494702696800232,
+       "learning_rate": 0.001,
+       "loss": 0.4641,
+       "step": 1500
+     },
+     {
+       "epoch": 14.0,
+       "eval_explained_variance": 0.41340962052345276,
+       "eval_kl_divergence": 0.5592221617698669,
+       "eval_loss": 0.47212228178977966,
+       "eval_mae": 0.12158066779375076,
+       "eval_rmse": 0.16525773704051971,
+       "eval_runtime": 59.2288,
+       "eval_samples_per_second": 39.744,
+       "eval_steps_per_second": 0.625,
+       "learning_rate": 0.001,
+       "step": 1540
+     },
+     {
+       "epoch": 15.0,
+       "eval_explained_variance": 0.43138065934181213,
+       "eval_kl_divergence": 0.49361512064933777,
+       "eval_loss": 0.47012239694595337,
+       "eval_mae": 0.12126541882753372,
+       "eval_rmse": 0.16284985840320587,
+       "eval_runtime": 61.7909,
+       "eval_samples_per_second": 38.096,
+       "eval_steps_per_second": 0.599,
+       "learning_rate": 0.001,
+       "step": 1650
+     },
+     {
+       "epoch": 16.0,
+       "eval_explained_variance": 0.43279382586479187,
+       "eval_kl_divergence": 0.2819983661174774,
+       "eval_loss": 0.4718552827835083,
+       "eval_mae": 0.12293669581413269,
+       "eval_rmse": 0.16459016501903534,
+       "eval_runtime": 59.5152,
+       "eval_samples_per_second": 39.553,
+       "eval_steps_per_second": 0.622,
+       "learning_rate": 0.001,
+       "step": 1760
+     },
+     {
+       "epoch": 17.0,
+       "eval_explained_variance": 0.43319597840309143,
+       "eval_kl_divergence": 0.5294199585914612,
+       "eval_loss": 0.46933484077453613,
+       "eval_mae": 0.12004240602254868,
+       "eval_rmse": 0.16205951571464539,
+       "eval_runtime": 59.464,
+       "eval_samples_per_second": 39.587,
+       "eval_steps_per_second": 0.622,
+       "learning_rate": 0.001,
+       "step": 1870
+     },
+     {
+       "epoch": 18.0,
+       "eval_explained_variance": 0.42939844727516174,
+       "eval_kl_divergence": 0.4093473255634308,
+       "eval_loss": 0.4710436165332794,
+       "eval_mae": 0.12161851674318314,
+       "eval_rmse": 0.16353638470172882,
+       "eval_runtime": 60.82,
+       "eval_samples_per_second": 38.704,
+       "eval_steps_per_second": 0.608,
+       "learning_rate": 0.001,
+       "step": 1980
+     },
+     {
+       "epoch": 18.181818181818183,
+       "grad_norm": 0.11152761429548264,
+       "learning_rate": 0.001,
+       "loss": 0.4618,
+       "step": 2000
+     },
+     {
+       "epoch": 19.0,
+       "eval_explained_variance": 0.4387861490249634,
+       "eval_kl_divergence": 0.29183313250541687,
+       "eval_loss": 0.4698491394519806,
+       "eval_mae": 0.12186750769615173,
+       "eval_rmse": 0.16223199665546417,
+       "eval_runtime": 62.7122,
+       "eval_samples_per_second": 37.537,
+       "eval_steps_per_second": 0.59,
+       "learning_rate": 0.001,
+       "step": 2090
+     },
+     {
+       "epoch": 20.0,
+       "eval_explained_variance": 0.4355180561542511,
+       "eval_kl_divergence": 0.47719886898994446,
+       "eval_loss": 0.4691685736179352,
+       "eval_mae": 0.11899092048406601,
+       "eval_rmse": 0.16173695027828217,
+       "eval_runtime": 60.1867,
+       "eval_samples_per_second": 39.112,
+       "eval_steps_per_second": 0.615,
+       "learning_rate": 0.001,
+       "step": 2200
+     },
+     {
+       "epoch": 21.0,
+       "eval_explained_variance": 0.44244399666786194,
+       "eval_kl_divergence": 0.4335584044456482,
+       "eval_loss": 0.46830564737319946,
+       "eval_mae": 0.12040043622255325,
+       "eval_rmse": 0.16058459877967834,
+       "eval_runtime": 59.7866,
+       "eval_samples_per_second": 39.373,
+       "eval_steps_per_second": 0.619,
+       "learning_rate": 0.001,
+       "step": 2310
+     },
+     {
+       "epoch": 22.0,
+       "eval_explained_variance": 0.4233216345310211,
+       "eval_kl_divergence": 0.7962150573730469,
+       "eval_loss": 0.47239789366722107,
+       "eval_mae": 0.11830935627222061,
+       "eval_rmse": 0.16501490771770477,
+       "eval_runtime": 62.1424,
+       "eval_samples_per_second": 37.881,
+       "eval_steps_per_second": 0.595,
+       "learning_rate": 0.001,
+       "step": 2420
+     },
+     {
+       "epoch": 22.727272727272727,
+       "grad_norm": 0.10114327073097229,
+       "learning_rate": 0.001,
+       "loss": 0.4613,
+       "step": 2500
+     },
+     {
+       "epoch": 23.0,
+       "eval_explained_variance": 0.43542811274528503,
+       "eval_kl_divergence": 0.2854216396808624,
+       "eval_loss": 0.47136834263801575,
+       "eval_mae": 0.12230511754751205,
+       "eval_rmse": 0.16408009827136993,
+       "eval_runtime": 61.631,
+       "eval_samples_per_second": 38.195,
+       "eval_steps_per_second": 0.6,
+       "learning_rate": 0.001,
+       "step": 2530
+     },
+     {
+       "epoch": 24.0,
+       "eval_explained_variance": 0.42795756459236145,
+       "eval_kl_divergence": 0.42056405544281006,
+       "eval_loss": 0.4706868529319763,
+       "eval_mae": 0.12066368013620377,
+       "eval_rmse": 0.16326285898685455,
+       "eval_runtime": 61.2844,
+       "eval_samples_per_second": 38.411,
+       "eval_steps_per_second": 0.604,
+       "learning_rate": 0.001,
+       "step": 2640
+     },
+     {
+       "epoch": 25.0,
+       "eval_explained_variance": 0.44159284234046936,
+       "eval_kl_divergence": 0.5435640811920166,
+       "eval_loss": 0.46786901354789734,
+       "eval_mae": 0.11850475519895554,
+       "eval_rmse": 0.16058622300624847,
+       "eval_runtime": 61.9284,
+       "eval_samples_per_second": 38.012,
+       "eval_steps_per_second": 0.597,
+       "learning_rate": 0.001,
+       "step": 2750
+     },
+     {
+       "epoch": 26.0,
+       "eval_explained_variance": 0.4267805814743042,
+       "eval_kl_divergence": 0.4964081943035126,
+       "eval_loss": 0.47084224224090576,
+       "eval_mae": 0.11923620104789734,
+       "eval_rmse": 0.16337566077709198,
+       "eval_runtime": 66.0163,
+       "eval_samples_per_second": 35.658,
+       "eval_steps_per_second": 0.56,
+       "learning_rate": 0.001,
+       "step": 2860
+     },
+     {
+       "epoch": 27.0,
+       "eval_explained_variance": 0.43011048436164856,
+       "eval_kl_divergence": 0.6398861408233643,
+       "eval_loss": 0.4695045053958893,
+       "eval_mae": 0.11852020025253296,
+       "eval_rmse": 0.16250041127204895,
+       "eval_runtime": 60.9743,
+       "eval_samples_per_second": 38.606,
+       "eval_steps_per_second": 0.607,
+       "learning_rate": 0.001,
+       "step": 2970
+     },
+     {
+       "epoch": 27.272727272727273,
+       "grad_norm": 0.12341216951608658,
+       "learning_rate": 0.001,
+       "loss": 0.4607,
+       "step": 3000
+     },
+     {
+       "epoch": 28.0,
+       "eval_explained_variance": 0.43241068720817566,
+       "eval_kl_divergence": 0.5736985206604004,
+       "eval_loss": 0.4700873792171478,
+       "eval_mae": 0.11835578829050064,
+       "eval_rmse": 0.16237075626850128,
+       "eval_runtime": 60.2395,
+       "eval_samples_per_second": 39.077,
+       "eval_steps_per_second": 0.614,
+       "learning_rate": 0.001,
+       "step": 3080
+     },
+     {
+       "epoch": 29.0,
+       "eval_explained_variance": 0.43240413069725037,
+       "eval_kl_divergence": 0.4459187090396881,
+       "eval_loss": 0.4698559045791626,
+       "eval_mae": 0.1200462281703949,
+       "eval_rmse": 0.16241396963596344,
+       "eval_runtime": 59.6241,
+       "eval_samples_per_second": 39.481,
+       "eval_steps_per_second": 0.621,
+       "learning_rate": 0.001,
+       "step": 3190
+     },
+     {
+       "epoch": 30.0,
+       "eval_explained_variance": 0.4308302402496338,
+       "eval_kl_divergence": 0.27262812852859497,
+       "eval_loss": 0.4722815454006195,
+       "eval_mae": 0.12538868188858032,
+       "eval_rmse": 0.1643446981906891,
+       "eval_runtime": 60.4817,
+       "eval_samples_per_second": 38.921,
+       "eval_steps_per_second": 0.612,
+       "learning_rate": 0.001,
+       "step": 3300
+     },
+     {
+       "epoch": 31.0,
+       "eval_explained_variance": 0.431255966424942,
+       "eval_kl_divergence": 0.5307573080062866,
+       "eval_loss": 0.46958214044570923,
+       "eval_mae": 0.11837340146303177,
+       "eval_rmse": 0.16221857070922852,
+       "eval_runtime": 59.6158,
+       "eval_samples_per_second": 39.486,
+       "eval_steps_per_second": 0.621,
+       "learning_rate": 0.001,
+       "step": 3410
+     },
+     {
+       "epoch": 31.818181818181817,
+       "grad_norm": 0.09215673804283142,
+       "learning_rate": 0.0001,
+       "loss": 0.4604,
+       "step": 3500
+     },
+     {
+       "epoch": 32.0,
+       "eval_explained_variance": 0.4507780075073242,
+       "eval_kl_divergence": 0.4200185239315033,
+       "eval_loss": 0.46677276492118835,
+       "eval_mae": 0.11745267361402512,
+       "eval_rmse": 0.1592676192522049,
+       "eval_runtime": 59.8038,
+       "eval_samples_per_second": 39.362,
+       "eval_steps_per_second": 0.619,
+       "learning_rate": 0.0001,
+       "step": 3520
+     },
+     {
+       "epoch": 33.0,
+       "eval_explained_variance": 0.4565463066101074,
+       "eval_kl_divergence": 0.35289108753204346,
+       "eval_loss": 0.46626824140548706,
+       "eval_mae": 0.11769836395978928,
+       "eval_rmse": 0.1586667150259018,
+       "eval_runtime": 63.0473,
+       "eval_samples_per_second": 37.337,
+       "eval_steps_per_second": 0.587,
+       "learning_rate": 0.0001,
+       "step": 3630
+     },
+     {
+       "epoch": 34.0,
+       "eval_explained_variance": 0.4541673958301544,
+       "eval_kl_divergence": 0.3587631583213806,
+       "eval_loss": 0.46665358543395996,
+       "eval_mae": 0.1181267499923706,
+       "eval_rmse": 0.15922589600086212,
+       "eval_runtime": 58.0806,
+       "eval_samples_per_second": 40.53,
+       "eval_steps_per_second": 0.637,
+       "learning_rate": 0.0001,
+       "step": 3740
+     },
+     {
+       "epoch": 35.0,
+       "eval_explained_variance": 0.4545403718948364,
+       "eval_kl_divergence": 0.4813242256641388,
+       "eval_loss": 0.46587392687797546,
+       "eval_mae": 0.11597732454538345,
+       "eval_rmse": 0.15844957530498505,
+       "eval_runtime": 59.5027,
+       "eval_samples_per_second": 39.561,
+       "eval_steps_per_second": 0.622,
+       "learning_rate": 0.0001,
+       "step": 3850
+     },
+     {
+       "epoch": 36.0,
+       "eval_explained_variance": 0.45941615104675293,
+       "eval_kl_divergence": 0.3503873348236084,
+       "eval_loss": 0.46578526496887207,
+       "eval_mae": 0.11725542694330215,
+       "eval_rmse": 0.15814347565174103,
+       "eval_runtime": 60.095,
+       "eval_samples_per_second": 39.171,
+       "eval_steps_per_second": 0.616,
+       "learning_rate": 0.0001,
+       "step": 3960
+     },
+     {
+       "epoch": 36.36363636363637,
+       "grad_norm": 0.08345460891723633,
+       "learning_rate": 0.0001,
+       "loss": 0.4565,
+       "step": 4000
+     },
+     {
+       "epoch": 37.0,
+       "eval_explained_variance": 0.4607694149017334,
+       "eval_kl_divergence": 0.39189669489860535,
+       "eval_loss": 0.4654408395290375,
+       "eval_mae": 0.11584330350160599,
+       "eval_rmse": 0.1577824205160141,
+       "eval_runtime": 58.734,
+       "eval_samples_per_second": 40.079,
+       "eval_steps_per_second": 0.63,
+       "learning_rate": 0.0001,
+       "step": 4070
+     },
+     {
+       "epoch": 38.0,
+       "eval_explained_variance": 0.45832768082618713,
+       "eval_kl_divergence": 0.40583303570747375,
+       "eval_loss": 0.46546319127082825,
+       "eval_mae": 0.1166045293211937,
+       "eval_rmse": 0.15796954929828644,
+       "eval_runtime": 58.3156,
+       "eval_samples_per_second": 40.367,
+       "eval_steps_per_second": 0.634,
+       "learning_rate": 0.0001,
+       "step": 4180
+     },
+     {
+       "epoch": 39.0,
+       "eval_explained_variance": 0.45672306418418884,
+       "eval_kl_divergence": 0.4117860198020935,
+       "eval_loss": 0.465843141078949,
+       "eval_mae": 0.11737682670354843,
+       "eval_rmse": 0.15845851600170135,
+       "eval_runtime": 59.8584,
+       "eval_samples_per_second": 39.326,
+       "eval_steps_per_second": 0.618,
+       "learning_rate": 0.0001,
+       "step": 4290
+     },
+     {
+       "epoch": 40.0,
+       "eval_explained_variance": 0.4607222080230713,
+       "eval_kl_divergence": 0.3563988506793976,
+       "eval_loss": 0.46561121940612793,
+       "eval_mae": 0.11697889119386673,
+       "eval_rmse": 0.15787295997142792,
+       "eval_runtime": 61.3479,
+       "eval_samples_per_second": 38.371,
+       "eval_steps_per_second": 0.603,
+       "learning_rate": 0.0001,
+       "step": 4400
+     },
+     {
+       "epoch": 40.90909090909091,
+       "grad_norm": 0.08773978799581528,
+       "learning_rate": 0.0001,
+       "loss": 0.4552,
+       "step": 4500
+     },
+     {
+       "epoch": 41.0,
+       "eval_explained_variance": 0.45979323983192444,
+       "eval_kl_divergence": 0.3572520911693573,
+       "eval_loss": 0.4657152593135834,
+       "eval_mae": 0.11711093783378601,
+       "eval_rmse": 0.15820421278476715,
+       "eval_runtime": 57.6839,
+       "eval_samples_per_second": 40.809,
+       "eval_steps_per_second": 0.641,
+       "learning_rate": 0.0001,
+       "step": 4510
+     },
+     {
+       "epoch": 42.0,
+       "eval_explained_variance": 0.45867350697517395,
+       "eval_kl_divergence": 0.5041557550430298,
+       "eval_loss": 0.4651602804660797,
+       "eval_mae": 0.11550069600343704,
+       "eval_rmse": 0.15786336362361908,
+       "eval_runtime": 56.8293,
+       "eval_samples_per_second": 41.422,
+       "eval_steps_per_second": 0.651,
+       "learning_rate": 0.0001,
+       "step": 4620
+     },
+     {
+       "epoch": 43.0,
+       "eval_explained_variance": 0.4612714946269989,
+       "eval_kl_divergence": 0.44621211290359497,
+       "eval_loss": 0.4651065468788147,
+       "eval_mae": 0.11574172228574753,
+       "eval_rmse": 0.15747833251953125,
+       "eval_runtime": 57.1474,
+       "eval_samples_per_second": 41.192,
+       "eval_steps_per_second": 0.647,
+       "learning_rate": 0.0001,
+       "step": 4730
+     },
+     {
+       "epoch": 44.0,
+       "eval_explained_variance": 0.4603614807128906,
+       "eval_kl_divergence": 0.4236082434654236,
+       "eval_loss": 0.46537330746650696,
+       "eval_mae": 0.11658215522766113,
+       "eval_rmse": 0.15792043507099152,
+       "eval_runtime": 55.8584,
+       "eval_samples_per_second": 42.142,
+       "eval_steps_per_second": 0.662,
+       "learning_rate": 0.0001,
+       "step": 4840
+     },
+     {
+       "epoch": 45.0,
+       "eval_explained_variance": 0.46250852942466736,
+       "eval_kl_divergence": 0.45096999406814575,
+       "eval_loss": 0.46489208936691284,
651
+ "eval_mae": 0.11505404114723206,
652
+ "eval_rmse": 0.15738531947135925,
653
+ "eval_runtime": 55.5313,
654
+ "eval_samples_per_second": 42.391,
655
+ "eval_steps_per_second": 0.666,
656
+ "learning_rate": 0.0001,
657
+ "step": 4950
658
+ },
659
+ {
660
+ "epoch": 45.45454545454545,
661
+ "grad_norm": 0.08461819589138031,
662
+ "learning_rate": 0.0001,
663
+ "loss": 0.4538,
664
+ "step": 5000
665
+ },
666
+ {
667
+ "epoch": 46.0,
668
+ "eval_explained_variance": 0.46191954612731934,
669
+ "eval_kl_divergence": 0.44900697469711304,
670
+ "eval_loss": 0.46484702825546265,
671
+ "eval_mae": 0.11566606909036636,
672
+ "eval_rmse": 0.15745492279529572,
673
+ "eval_runtime": 56.8805,
674
+ "eval_samples_per_second": 41.385,
675
+ "eval_steps_per_second": 0.65,
676
+ "learning_rate": 0.0001,
677
+ "step": 5060
678
+ },
679
+ {
680
+ "epoch": 47.0,
681
+ "eval_explained_variance": 0.46148741245269775,
682
+ "eval_kl_divergence": 0.47508490085601807,
683
+ "eval_loss": 0.4648602306842804,
684
+ "eval_mae": 0.11517279595136642,
685
+ "eval_rmse": 0.1574285626411438,
686
+ "eval_runtime": 56.4955,
687
+ "eval_samples_per_second": 41.667,
688
+ "eval_steps_per_second": 0.655,
689
+ "learning_rate": 0.0001,
690
+ "step": 5170
691
+ },
692
+ {
693
+ "epoch": 48.0,
694
+ "eval_explained_variance": 0.4631068706512451,
695
+ "eval_kl_divergence": 0.5305130481719971,
696
+ "eval_loss": 0.4647873342037201,
697
+ "eval_mae": 0.11513545364141464,
698
+ "eval_rmse": 0.15746952593326569,
699
+ "eval_runtime": 59.054,
700
+ "eval_samples_per_second": 39.862,
701
+ "eval_steps_per_second": 0.627,
702
+ "learning_rate": 0.0001,
703
+ "step": 5280
704
+ },
705
+ {
706
+ "epoch": 49.0,
707
+ "eval_explained_variance": 0.46304425597190857,
708
+ "eval_kl_divergence": 0.4798574149608612,
709
+ "eval_loss": 0.4647849500179291,
710
+ "eval_mae": 0.11539488285779953,
711
+ "eval_rmse": 0.1573745161294937,
712
+ "eval_runtime": 54.2646,
713
+ "eval_samples_per_second": 43.38,
714
+ "eval_steps_per_second": 0.682,
715
+ "learning_rate": 0.0001,
716
+ "step": 5390
717
+ },
718
+ {
719
+ "epoch": 50.0,
720
+ "grad_norm": 0.16299596428871155,
721
+ "learning_rate": 0.0001,
722
+ "loss": 0.4532,
723
+ "step": 5500
724
+ },
725
+ {
726
+ "epoch": 50.0,
727
+ "eval_explained_variance": 0.4693569839000702,
728
+ "eval_kl_divergence": 0.2825404107570648,
729
+ "eval_loss": 0.46499085426330566,
730
+ "eval_mae": 0.1172276958823204,
731
+ "eval_rmse": 0.15717318654060364,
732
+ "eval_runtime": 56.0282,
733
+ "eval_samples_per_second": 42.015,
734
+ "eval_steps_per_second": 0.66,
735
+ "learning_rate": 0.0001,
736
+ "step": 5500
737
+ },
738
+ {
739
+ "epoch": 51.0,
740
+ "eval_explained_variance": 0.4573368728160858,
741
+ "eval_kl_divergence": 0.48794299364089966,
742
+ "eval_loss": 0.465638667345047,
743
+ "eval_mae": 0.11509021371603012,
744
+ "eval_rmse": 0.15819959342479706,
745
+ "eval_runtime": 52.7895,
746
+ "eval_samples_per_second": 44.592,
747
+ "eval_steps_per_second": 0.701,
748
+ "learning_rate": 0.0001,
749
+ "step": 5610
750
+ },
751
+ {
752
+ "epoch": 52.0,
753
+ "eval_explained_variance": 0.4673852026462555,
754
+ "eval_kl_divergence": 0.41987907886505127,
755
+ "eval_loss": 0.46429532766342163,
756
+ "eval_mae": 0.11551753431558609,
757
+ "eval_rmse": 0.15662376582622528,
758
+ "eval_runtime": 54.5816,
759
+ "eval_samples_per_second": 43.128,
760
+ "eval_steps_per_second": 0.678,
761
+ "learning_rate": 0.0001,
762
+ "step": 5720
763
+ },
764
+ {
765
+ "epoch": 53.0,
766
+ "eval_explained_variance": 0.4672771692276001,
767
+ "eval_kl_divergence": 0.3879646956920624,
768
+ "eval_loss": 0.46441230177879333,
769
+ "eval_mae": 0.1155916228890419,
770
+ "eval_rmse": 0.1568875014781952,
771
+ "eval_runtime": 53.5146,
772
+ "eval_samples_per_second": 43.988,
773
+ "eval_steps_per_second": 0.691,
774
+ "learning_rate": 0.0001,
775
+ "step": 5830
776
+ },
777
+ {
778
+ "epoch": 54.0,
779
+ "eval_explained_variance": 0.4654136002063751,
780
+ "eval_kl_divergence": 0.42290592193603516,
781
+ "eval_loss": 0.4646008610725403,
782
+ "eval_mae": 0.11479470133781433,
783
+ "eval_rmse": 0.1569375991821289,
784
+ "eval_runtime": 53.8924,
785
+ "eval_samples_per_second": 43.68,
786
+ "eval_steps_per_second": 0.687,
787
+ "learning_rate": 0.0001,
788
+ "step": 5940
789
+ },
790
+ {
791
+ "epoch": 54.54545454545455,
792
+ "grad_norm": 0.08747697621583939,
793
+ "learning_rate": 0.0001,
794
+ "loss": 0.4526,
795
+ "step": 6000
796
+ },
797
+ {
798
+ "epoch": 55.0,
799
+ "eval_explained_variance": 0.4658801555633545,
800
+ "eval_kl_divergence": 0.40089842677116394,
801
+ "eval_loss": 0.4644174873828888,
802
+ "eval_mae": 0.11586496233940125,
803
+ "eval_rmse": 0.156887486577034,
804
+ "eval_runtime": 54.8967,
805
+ "eval_samples_per_second": 42.881,
806
+ "eval_steps_per_second": 0.674,
807
+ "learning_rate": 0.0001,
808
+ "step": 6050
809
+ },
810
+ {
811
+ "epoch": 56.0,
812
+ "eval_explained_variance": 0.46597158908843994,
813
+ "eval_kl_divergence": 0.34050217270851135,
814
+ "eval_loss": 0.464743047952652,
815
+ "eval_mae": 0.11636239290237427,
816
+ "eval_rmse": 0.15719135105609894,
817
+ "eval_runtime": 53.8695,
818
+ "eval_samples_per_second": 43.698,
819
+ "eval_steps_per_second": 0.687,
820
+ "learning_rate": 0.0001,
821
+ "step": 6160
822
+ },
823
+ {
824
+ "epoch": 57.0,
825
+ "eval_explained_variance": 0.4660731554031372,
826
+ "eval_kl_divergence": 0.4187561571598053,
827
+ "eval_loss": 0.4645179808139801,
828
+ "eval_mae": 0.11523237824440002,
829
+ "eval_rmse": 0.1568503975868225,
830
+ "eval_runtime": 52.6832,
831
+ "eval_samples_per_second": 44.682,
832
+ "eval_steps_per_second": 0.702,
833
+ "learning_rate": 0.0001,
834
+ "step": 6270
835
+ },
836
+ {
837
+ "epoch": 58.0,
838
+ "eval_explained_variance": 0.4659406840801239,
839
+ "eval_kl_divergence": 0.3079023063182831,
840
+ "eval_loss": 0.465102881193161,
841
+ "eval_mae": 0.11637380719184875,
842
+ "eval_rmse": 0.15757356584072113,
843
+ "eval_runtime": 53.7708,
844
+ "eval_samples_per_second": 43.778,
845
+ "eval_steps_per_second": 0.688,
846
+ "learning_rate": 0.0001,
847
+ "step": 6380
848
+ },
849
+ {
850
+ "epoch": 59.0,
851
+ "eval_explained_variance": 0.46542713046073914,
852
+ "eval_kl_divergence": 0.43387478590011597,
853
+ "eval_loss": 0.4644688367843628,
854
+ "eval_mae": 0.11504218727350235,
855
+ "eval_rmse": 0.15699030458927155,
856
+ "eval_runtime": 54.251,
857
+ "eval_samples_per_second": 43.391,
858
+ "eval_steps_per_second": 0.682,
859
+ "learning_rate": 1e-05,
860
+ "step": 6490
861
+ },
862
+ {
863
+ "epoch": 59.09090909090909,
864
+ "grad_norm": 0.09869211912155151,
865
+ "learning_rate": 1e-05,
866
+ "loss": 0.4514,
867
+ "step": 6500
868
+ },
869
+ {
870
+ "epoch": 60.0,
871
+ "eval_explained_variance": 0.4679425060749054,
872
+ "eval_kl_divergence": 0.38936442136764526,
873
+ "eval_loss": 0.46417686343193054,
874
+ "eval_mae": 0.11504556983709335,
875
+ "eval_rmse": 0.1565857082605362,
876
+ "eval_runtime": 53.3994,
877
+ "eval_samples_per_second": 44.083,
878
+ "eval_steps_per_second": 0.693,
879
+ "learning_rate": 1e-05,
880
+ "step": 6600
881
+ },
882
+ {
883
+ "epoch": 61.0,
884
+ "eval_explained_variance": 0.4692780673503876,
885
+ "eval_kl_divergence": 0.4144607186317444,
886
+ "eval_loss": 0.4639436900615692,
887
+ "eval_mae": 0.11456633359193802,
888
+ "eval_rmse": 0.15632741153240204,
889
+ "eval_runtime": 53.948,
890
+ "eval_samples_per_second": 43.635,
891
+ "eval_steps_per_second": 0.686,
892
+ "learning_rate": 1e-05,
893
+ "step": 6710
894
+ },
895
+ {
896
+ "epoch": 62.0,
897
+ "eval_explained_variance": 0.46859118342399597,
898
+ "eval_kl_divergence": 0.4063835144042969,
899
+ "eval_loss": 0.4641311764717102,
900
+ "eval_mae": 0.11482342332601547,
901
+ "eval_rmse": 0.15648160874843597,
902
+ "eval_runtime": 53.1646,
903
+ "eval_samples_per_second": 44.278,
904
+ "eval_steps_per_second": 0.696,
905
+ "learning_rate": 1e-05,
906
+ "step": 6820
907
+ },
908
+ {
909
+ "epoch": 63.0,
910
+ "eval_explained_variance": 0.4698045253753662,
911
+ "eval_kl_divergence": 0.35424694418907166,
912
+ "eval_loss": 0.4643491506576538,
913
+ "eval_mae": 0.11492928117513657,
914
+ "eval_rmse": 0.15652996301651,
915
+ "eval_runtime": 61.9895,
916
+ "eval_samples_per_second": 37.974,
917
+ "eval_steps_per_second": 0.597,
918
+ "learning_rate": 1e-05,
919
+ "step": 6930
920
+ },
921
+ {
922
+ "epoch": 63.63636363636363,
923
+ "grad_norm": 0.12132851779460907,
924
+ "learning_rate": 1e-05,
925
+ "loss": 0.4511,
926
+ "step": 7000
927
+ },
928
+ {
929
+ "epoch": 64.0,
930
+ "eval_explained_variance": 0.4702436923980713,
931
+ "eval_kl_divergence": 0.37175947427749634,
932
+ "eval_loss": 0.46402981877326965,
933
+ "eval_mae": 0.11502394080162048,
934
+ "eval_rmse": 0.1563546359539032,
935
+ "eval_runtime": 55.6273,
936
+ "eval_samples_per_second": 42.317,
937
+ "eval_steps_per_second": 0.665,
938
+ "learning_rate": 1e-05,
939
+ "step": 7040
940
+ },
941
+ {
942
+ "epoch": 65.0,
943
+ "eval_explained_variance": 0.46799585223197937,
944
+ "eval_kl_divergence": 0.41278746724128723,
945
+ "eval_loss": 0.4640822410583496,
946
+ "eval_mae": 0.11517596989870071,
947
+ "eval_rmse": 0.1565382480621338,
948
+ "eval_runtime": 60.037,
949
+ "eval_samples_per_second": 39.209,
950
+ "eval_steps_per_second": 0.616,
951
+ "learning_rate": 1e-05,
952
+ "step": 7150
953
+ },
954
+ {
955
+ "epoch": 66.0,
956
+ "eval_explained_variance": 0.46580052375793457,
957
+ "eval_kl_divergence": 0.4987623989582062,
958
+ "eval_loss": 0.46441909670829773,
959
+ "eval_mae": 0.11446693539619446,
960
+ "eval_rmse": 0.15703582763671875,
961
+ "eval_runtime": 58.422,
962
+ "eval_samples_per_second": 40.293,
963
+ "eval_steps_per_second": 0.633,
964
+ "learning_rate": 1e-05,
965
+ "step": 7260
966
+ },
967
+ {
968
+ "epoch": 67.0,
969
+ "eval_explained_variance": 0.4696963131427765,
970
+ "eval_kl_divergence": 0.41221925616264343,
971
+ "eval_loss": 0.46383005380630493,
972
+ "eval_mae": 0.11511614173650742,
973
+ "eval_rmse": 0.15620578825473785,
974
+ "eval_runtime": 57.3857,
975
+ "eval_samples_per_second": 41.021,
976
+ "eval_steps_per_second": 0.645,
977
+ "learning_rate": 1e-05,
978
+ "step": 7370
979
+ },
980
+ {
981
+ "epoch": 68.0,
982
+ "eval_explained_variance": 0.4673812687397003,
983
+ "eval_kl_divergence": 0.4579189419746399,
984
+ "eval_loss": 0.4639807641506195,
985
+ "eval_mae": 0.11436697095632553,
986
+ "eval_rmse": 0.15645776689052582,
987
+ "eval_runtime": 58.7335,
988
+ "eval_samples_per_second": 40.079,
989
+ "eval_steps_per_second": 0.63,
990
+ "learning_rate": 1e-05,
991
+ "step": 7480
992
+ },
993
+ {
994
+ "epoch": 68.18181818181819,
995
+ "grad_norm": 0.15623362362384796,
996
+ "learning_rate": 1e-05,
997
+ "loss": 0.4508,
998
+ "step": 7500
999
+ },
1000
+ {
1001
+ "epoch": 69.0,
1002
+ "eval_explained_variance": 0.4701990783214569,
1003
+ "eval_kl_divergence": 0.4197009801864624,
1004
+ "eval_loss": 0.4637599587440491,
1005
+ "eval_mae": 0.11433341354131699,
1006
+ "eval_rmse": 0.15607893466949463,
1007
+ "eval_runtime": 56.4381,
1008
+ "eval_samples_per_second": 41.709,
1009
+ "eval_steps_per_second": 0.656,
1010
+ "learning_rate": 1e-05,
1011
+ "step": 7590
1012
+ },
1013
+ {
1014
+ "epoch": 70.0,
1015
+ "eval_explained_variance": 0.46952661871910095,
1016
+ "eval_kl_divergence": 0.4285525679588318,
1017
+ "eval_loss": 0.46392253041267395,
1018
+ "eval_mae": 0.11449825018644333,
1019
+ "eval_rmse": 0.15625734627246857,
1020
+ "eval_runtime": 59.9257,
1021
+ "eval_samples_per_second": 39.282,
1022
+ "eval_steps_per_second": 0.617,
1023
+ "learning_rate": 1e-05,
1024
+ "step": 7700
1025
+ },
1026
+ {
1027
+ "epoch": 71.0,
1028
+ "eval_explained_variance": 0.4707754850387573,
1029
+ "eval_kl_divergence": 0.3542197048664093,
1030
+ "eval_loss": 0.46406444907188416,
1031
+ "eval_mae": 0.11525753885507584,
1032
+ "eval_rmse": 0.1563321352005005,
1033
+ "eval_runtime": 56.6326,
1034
+ "eval_samples_per_second": 41.566,
1035
+ "eval_steps_per_second": 0.653,
1036
+ "learning_rate": 1e-05,
1037
+ "step": 7810
1038
+ },
1039
+ {
1040
+ "epoch": 72.0,
1041
+ "eval_explained_variance": 0.4681284427642822,
1042
+ "eval_kl_divergence": 0.42497748136520386,
1043
+ "eval_loss": 0.46417826414108276,
1044
+ "eval_mae": 0.11474020034074783,
1045
+ "eval_rmse": 0.15662290155887604,
1046
+ "eval_runtime": 56.0497,
1047
+ "eval_samples_per_second": 41.998,
1048
+ "eval_steps_per_second": 0.66,
1049
+ "learning_rate": 1e-05,
1050
+ "step": 7920
1051
+ },
1052
+ {
1053
+ "epoch": 72.72727272727273,
1054
+ "grad_norm": 0.12685681879520416,
1055
+ "learning_rate": 1e-05,
1056
+ "loss": 0.4505,
1057
+ "step": 8000
1058
+ },
1059
+ {
1060
+ "epoch": 73.0,
1061
+ "eval_explained_variance": 0.47002461552619934,
1062
+ "eval_kl_divergence": 0.43972158432006836,
1063
+ "eval_loss": 0.4637835919857025,
1064
+ "eval_mae": 0.11403892189264297,
1065
+ "eval_rmse": 0.15611138939857483,
1066
+ "eval_runtime": 55.8354,
1067
+ "eval_samples_per_second": 42.16,
1068
+ "eval_steps_per_second": 0.663,
1069
+ "learning_rate": 1e-05,
1070
+ "step": 8030
1071
+ },
1072
+ {
1073
+ "epoch": 74.0,
1074
+ "eval_explained_variance": 0.4689449369907379,
1075
+ "eval_kl_divergence": 0.443666011095047,
1076
+ "eval_loss": 0.463798850774765,
1077
+ "eval_mae": 0.1145407184958458,
1078
+ "eval_rmse": 0.15625973045825958,
1079
+ "eval_runtime": 56.7357,
1080
+ "eval_samples_per_second": 41.491,
1081
+ "eval_steps_per_second": 0.652,
1082
+ "learning_rate": 1e-05,
1083
+ "step": 8140
1084
+ },
1085
+ {
1086
+ "epoch": 75.0,
1087
+ "eval_explained_variance": 0.4704826772212982,
1088
+ "eval_kl_divergence": 0.4049000144004822,
1089
+ "eval_loss": 0.46379053592681885,
1090
+ "eval_mae": 0.11447467654943466,
1091
+ "eval_rmse": 0.15613143146038055,
1092
+ "eval_runtime": 56.7932,
1093
+ "eval_samples_per_second": 41.449,
1094
+ "eval_steps_per_second": 0.651,
1095
+ "learning_rate": 1e-05,
1096
+ "step": 8250
1097
+ },
1098
+ {
1099
+ "epoch": 76.0,
1100
+ "eval_explained_variance": 0.4674541652202606,
1101
+ "eval_kl_divergence": 0.49260592460632324,
1102
+ "eval_loss": 0.4639701247215271,
1103
+ "eval_mae": 0.11414843797683716,
1104
+ "eval_rmse": 0.15647520124912262,
1105
+ "eval_runtime": 57.4638,
1106
+ "eval_samples_per_second": 40.965,
1107
+ "eval_steps_per_second": 0.644,
1108
+ "learning_rate": 1.0000000000000002e-06,
1109
+ "step": 8360
1110
+ },
1111
+ {
1112
+ "epoch": 77.0,
1113
+ "eval_explained_variance": 0.469455748796463,
1114
+ "eval_kl_divergence": 0.44272491335868835,
1115
+ "eval_loss": 0.463869571685791,
1116
+ "eval_mae": 0.11419638991355896,
1117
+ "eval_rmse": 0.15622590482234955,
1118
+ "eval_runtime": 57.5968,
1119
+ "eval_samples_per_second": 40.87,
1120
+ "eval_steps_per_second": 0.642,
1121
+ "learning_rate": 1.0000000000000002e-06,
1122
+ "step": 8470
1123
+ },
1124
+ {
1125
+ "epoch": 77.27272727272727,
1126
+ "grad_norm": 0.11736844480037689,
1127
+ "learning_rate": 1.0000000000000002e-06,
1128
+ "loss": 0.4505,
1129
+ "step": 8500
1130
+ },
1131
+ {
1132
+ "epoch": 78.0,
1133
+ "eval_explained_variance": 0.4691663682460785,
1134
+ "eval_kl_divergence": 0.42925453186035156,
1135
+ "eval_loss": 0.46388140320777893,
1136
+ "eval_mae": 0.1144518032670021,
1137
+ "eval_rmse": 0.1562517285346985,
1138
+ "eval_runtime": 55.8876,
1139
+ "eval_samples_per_second": 42.12,
1140
+ "eval_steps_per_second": 0.662,
1141
+ "learning_rate": 1.0000000000000002e-06,
1142
+ "step": 8580
1143
+ },
1144
+ {
1145
+ "epoch": 79.0,
1146
+ "eval_explained_variance": 0.4699589014053345,
1147
+ "eval_kl_divergence": 0.376490980386734,
1148
+ "eval_loss": 0.46412238478660583,
1149
+ "eval_mae": 0.11472050100564957,
1150
+ "eval_rmse": 0.15639875829219818,
1151
+ "eval_runtime": 55.4743,
1152
+ "eval_samples_per_second": 42.434,
1153
+ "eval_steps_per_second": 0.667,
1154
+ "learning_rate": 1.0000000000000002e-06,
1155
+ "step": 8690
1156
+ },
1157
+ {
1158
+ "epoch": 79.0,
1159
+ "learning_rate": 1.0000000000000002e-06,
1160
+ "step": 8690,
1161
+ "total_flos": 8.188406191467658e+19,
1162
+ "train_loss": 0.4591466036709872,
1163
+ "train_runtime": 19731.8487,
1164
+ "train_samples_per_second": 53.236,
1165
+ "train_steps_per_second": 0.836
1166
+ }
1167
+ ],
1168
+ "logging_steps": 500,
1169
+ "max_steps": 16500,
1170
+ "num_input_tokens_seen": 0,
1171
+ "num_train_epochs": 150,
1172
+ "save_steps": 500,
1173
+ "stateful_callbacks": {
1174
+ "EarlyStoppingCallback": {
1175
+ "args": {
1176
+ "early_stopping_patience": 10,
1177
+ "early_stopping_threshold": 0.0
1178
+ },
1179
+ "attributes": {
1180
+ "early_stopping_patience_counter": 0
1181
+ }
1182
+ },
1183
+ "TrainerControl": {
1184
+ "args": {
1185
+ "should_epoch_stop": false,
1186
+ "should_evaluate": false,
1187
+ "should_log": false,
1188
+ "should_save": true,
1189
+ "should_training_stop": true
1190
+ },
1191
+ "attributes": {}
1192
+ }
1193
+ },
1194
+ "total_flos": 8.188406191467658e+19,
1195
+ "train_batch_size": 64,
1196
+ "trial_name": null,
1197
+ "trial_params": null
1198
+ }
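The logged values above can be cross-checked against the `EarlyStoppingCallback` settings (`early_stopping_patience: 10`, `early_stopping_threshold: 0.0`). The following is a minimal sketch, not part of the commit. It assumes that early stopping monitored `eval_loss` (the monitored metric is not shown in this file), and it hard-codes only the epochs 67 to 79 from the log rather than reading the JSON. The helper names `best_epoch` and `evaluations_since_best` are hypothetical.

```python
# Validation loss per epoch, copied from the "eval_loss" entries logged above.
# Only epochs 67 to 79 are included; earlier epochs are omitted in this sketch.
eval_loss_by_epoch = {
    67: 0.46383005380630493,
    68: 0.4639807641506195,
    69: 0.4637599587440491,
    70: 0.46392253041267395,
    71: 0.46406444907188416,
    72: 0.46417826414108276,
    73: 0.4637835919857025,
    74: 0.463798850774765,
    75: 0.46379053592681885,
    76: 0.4639701247215271,
    77: 0.463869571685791,
    78: 0.46388140320777893,
    79: 0.46412238478660583,
}

PATIENCE = 10          # "early_stopping_patience" from the callback args
STEPS_PER_EPOCH = 110  # e.g. step 4070 at epoch 37 and step 4180 at epoch 38
NUM_TRAIN_EPOCHS = 150
MAX_STEPS = 16500
FINAL_STEP = 8690


def best_epoch(losses):
    """Return the epoch with the lowest validation loss."""
    return min(losses, key=losses.get)


def evaluations_since_best(losses):
    """Count the evaluations logged after the best epoch."""
    best = best_epoch(losses)
    return sum(1 for epoch in losses if epoch > best)


best = best_epoch(eval_loss_by_epoch)
stalled = evaluations_since_best(eval_loss_by_epoch)

print(f"best epoch in this window: {best} (eval_loss={eval_loss_by_epoch[best]:.6f})")
print(f"evaluations since best: {stalled}, patience reached: {stalled >= PATIENCE}")

# The step counters are consistent with 110 optimizer steps per epoch.
print(f"planned steps: {NUM_TRAIN_EPOCHS * STEPS_PER_EPOCH} (max_steps={MAX_STEPS})")
print(f"epochs completed at stop: {FINAL_STEP / STEPS_PER_EPOCH}")
```

Under that assumption, the lowest `eval_loss` in this window is at epoch 69, and the run ends at epoch 79 after ten further evaluations without improvement, which matches the configured patience and `should_training_stop: true`.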