Commit 5621b1c (verified) by jkazdan · 1 parent: 19f9326

End of training

README.md CHANGED
@@ -17,8 +17,8 @@ should probably proofread and complete it, then remove this comment. -->
 
  This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
  It achieves the following results on the evaluation set:
- - Loss: 1.4806
- - Num Input Tokens Seen: 17323856
+ - Loss: 1.4374
+ - Num Input Tokens Seen: 5219176
 
  ## Model description
 
@@ -53,68 +53,24 @@ The following hyperparameters were used during training:
  | Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
  |:-------------:|:------:|:----:|:---------------:|:-----------------:|
  | No log | 0 | 0 | 1.3956 | 0 |
- | 1.7423 | 0.0160 | 5 | 1.3655 | 267632 |
- | 1.6351 | 0.0319 | 10 | 1.2583 | 552216 |
- | 1.4911 | 0.0479 | 15 | 1.1928 | 827968 |
- | 1.2409 | 0.0639 | 20 | 1.1586 | 1100280 |
- | 1.2604 | 0.0799 | 25 | 1.1378 | 1379288 |
- | 1.134 | 0.0958 | 30 | 1.1549 | 1649984 |
- | 1.0086 | 0.1118 | 35 | 1.1735 | 1926208 |
- | 0.9581 | 0.1278 | 40 | 1.2146 | 2202560 |
- | 0.8928 | 0.1438 | 45 | 1.2737 | 2479136 |
- | 0.8534 | 0.1597 | 50 | 1.3064 | 2760256 |
- | 0.7188 | 0.1757 | 55 | 1.3590 | 3042584 |
- | 0.7528 | 0.1917 | 60 | 1.3853 | 3312256 |
- | 0.677 | 0.2077 | 65 | 1.4389 | 3585504 |
- | 0.5664 | 0.2236 | 70 | 1.4586 | 3854016 |
- | 0.4825 | 0.2396 | 75 | 1.4975 | 4133168 |
- | 0.4866 | 0.2556 | 80 | 1.4721 | 4407288 |
- | 0.4285 | 0.2716 | 85 | 1.5586 | 4678520 |
- | 0.3265 | 0.2875 | 90 | 1.5246 | 4949728 |
- | 0.3516 | 0.3035 | 95 | 1.5590 | 5225616 |
- | 0.3268 | 0.3195 | 100 | 1.5276 | 5501384 |
- | 0.2797 | 0.3355 | 105 | 1.5182 | 5780224 |
- | 0.2672 | 0.3514 | 110 | 1.5094 | 6059920 |
- | 0.2777 | 0.3674 | 115 | 1.5643 | 6340400 |
- | 0.2453 | 0.3834 | 120 | 1.5118 | 6613416 |
- | 0.2118 | 0.3994 | 125 | 1.5538 | 6880920 |
- | 0.2029 | 0.4153 | 130 | 1.4862 | 7158296 |
- | 0.2346 | 0.4313 | 135 | 1.5006 | 7438416 |
- | 0.2331 | 0.4473 | 140 | 1.5101 | 7717112 |
- | 0.2371 | 0.4633 | 145 | 1.4671 | 7999472 |
- | 0.1374 | 0.4792 | 150 | 1.5032 | 8276904 |
- | 0.2166 | 0.4952 | 155 | 1.5048 | 8559152 |
- | 0.17 | 0.5112 | 160 | 1.4610 | 8835280 |
- | 0.119 | 0.5272 | 165 | 1.4604 | 9116904 |
- | 0.1284 | 0.5431 | 170 | 1.4638 | 9391744 |
- | 0.1899 | 0.5591 | 175 | 1.4611 | 9663568 |
- | 0.1606 | 0.5751 | 180 | 1.4262 | 9931184 |
- | 0.1786 | 0.5911 | 185 | 1.4451 | 10209616 |
- | 0.1505 | 0.6070 | 190 | 1.4156 | 10486896 |
- | 0.1507 | 0.6230 | 195 | 1.4259 | 10767808 |
- | 0.1546 | 0.6390 | 200 | 1.4544 | 11046344 |
- | 0.1364 | 0.6550 | 205 | 1.4106 | 11327440 |
- | 0.1248 | 0.6709 | 210 | 1.4525 | 11606960 |
- | 0.1672 | 0.6869 | 215 | 1.4773 | 11880848 |
- | 0.1577 | 0.7029 | 220 | 1.4349 | 12161672 |
- | 0.1245 | 0.7188 | 225 | 1.4439 | 12442664 |
- | 0.1302 | 0.7348 | 230 | 1.4415 | 12713624 |
- | 0.0654 | 0.7508 | 235 | 1.4390 | 12990952 |
- | 0.1354 | 0.7668 | 240 | 1.4475 | 13266200 |
- | 0.2104 | 0.7827 | 245 | 1.4559 | 13541912 |
- | 0.1072 | 0.7987 | 250 | 1.4351 | 13819320 |
- | 0.1398 | 0.8147 | 255 | 1.4095 | 14098680 |
- | 0.1013 | 0.8307 | 260 | 1.4447 | 14379424 |
- | 0.1324 | 0.8466 | 265 | 1.4160 | 14657904 |
- | 0.1668 | 0.8626 | 270 | 1.4443 | 14933688 |
- | 0.1894 | 0.8786 | 275 | 1.4766 | 15212440 |
- | 0.1551 | 0.8946 | 280 | 1.4633 | 15493152 |
- | 0.1221 | 0.9105 | 285 | 1.4492 | 15772512 |
- | 0.1201 | 0.9265 | 290 | 1.4642 | 16049992 |
- | 0.1003 | 0.9425 | 295 | 1.4544 | 16322976 |
- | 0.1042 | 0.9585 | 300 | 1.4601 | 16601528 |
- | 0.1254 | 0.9744 | 305 | 1.4701 | 16877264 |
- | 0.1313 | 0.9904 | 310 | 1.4481 | 17156976 |
+ | 1.7668 | 0.0533 | 5 | 1.3663 | 278592 |
+ | 1.5612 | 0.1065 | 10 | 1.2589 | 555472 |
+ | 1.3399 | 0.1598 | 15 | 1.1962 | 837344 |
+ | 1.1995 | 0.2130 | 20 | 1.2034 | 1118136 |
+ | 0.902 | 0.2663 | 25 | 1.2406 | 1401192 |
+ | 0.7229 | 0.3196 | 30 | 1.3380 | 1684336 |
+ | 0.4769 | 0.3728 | 35 | 1.4243 | 1969096 |
+ | 0.4301 | 0.4261 | 40 | 1.4694 | 2250056 |
+ | 0.3838 | 0.4794 | 45 | 1.4889 | 2523440 |
+ | 0.3293 | 0.5326 | 50 | 1.4766 | 2802544 |
+ | 0.2194 | 0.5859 | 55 | 1.4711 | 3086432 |
+ | 0.2283 | 0.6391 | 60 | 1.4290 | 3359744 |
+ | 0.1524 | 0.6924 | 65 | 1.4481 | 3638328 |
+ | 0.2296 | 0.7457 | 70 | 1.4577 | 3917992 |
+ | 0.1773 | 0.7989 | 75 | 1.4764 | 4199640 |
+ | 0.1202 | 0.8522 | 80 | 1.4139 | 4486008 |
+ | 0.178 | 0.9055 | 85 | 1.4641 | 4764120 |
+ | 0.1187 | 0.9587 | 90 | 1.4421 | 5045664 |
 
 
  ### Framework versions
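
The validation-loss column of the training table in the README diff can be scanned programmatically, for instance to find the evaluation step with the lowest loss. The following is a minimal sketch, assuming the table rows are available as a string; `parse_rows`, `best_step`, and `ROWS` are illustrative names, not part of the model card, and only the first few added rows are reproduced here.

```python
def parse_rows(table: str):
    """Yield (step, validation_loss) pairs from markdown table rows."""
    for line in table.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Assumed column order: training loss, epoch, step,
        # validation loss, input tokens seen.
        if len(cells) != 5 or not cells[2].isdigit():
            continue  # skip the header and separator rows
        yield int(cells[2]), float(cells[3])


def best_step(table: str):
    """Return the (step, validation_loss) pair with the lowest loss."""
    return min(parse_rows(table), key=lambda row: row[1])


# Excerpt of the rows added in this commit (hypothetical variable name).
ROWS = """
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3956 | 0 |
| 1.7668 | 0.0533 | 5 | 1.3663 | 278592 |
| 1.5612 | 0.1065 | 10 | 1.2589 | 555472 |
| 1.3399 | 0.1598 | 15 | 1.1962 | 837344 |
| 1.1995 | 0.2130 | 20 | 1.2034 | 1118136 |
| 0.902 | 0.2663 | 25 | 1.2406 | 1401192 |
"""

print(best_step(ROWS))
```

Applied to this excerpt, the lowest validation loss is the one logged at step 15, while the training loss keeps falling in the rows after it.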
model-00001-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:6fc657316c43c083c3c1e74c46bb0e8fc0f3a2e8e8e9b182cb14e68bfb26fa75
+ oid sha256:bd6a415e8ad7afdec23196036e9e4d520a081e7cb33aab93e52b647068a3e32d
  size 4988025760
model-00002-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:5b0b73bcc6e4e48805ea210d171d59522926402bb60784a43edfde803d2f370c
+ oid sha256:d9d4b2ce53e6d37c9d9c438a6d19c511ff58a653bed78fabb8a8cbb7b35d3c6f
  size 240691728
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:72531063e3c6ca46c84d3322850e30560017bffd1d37d1cc64a163e152145d81
+ oid sha256:3db2d0637a474da96b24158284fa06930ec866026fbff2b760b39c8d6be77d95
  size 5560
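
The three binary files above are stored as Git LFS pointers: a `version` line, an `oid` line carrying a SHA-256 digest, and a `size` line in bytes. As a rough sketch of how a downloaded file could be checked against such a pointer, the helpers below parse the pointer text and compare size and digest. `parse_lfs_pointer` and `matches_pointer` are hypothetical names introduced for illustration, and the example hashes an in-memory byte string; a multi-gigabyte shard would need to be hashed in chunks instead.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Check that data has the size and SHA-256 digest the pointer records."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['algo']}")
    same_size = len(data) == pointer["size"]
    same_hash = hashlib.sha256(data).hexdigest() == pointer["digest"]
    return same_size and same_hash


# New pointer for training_args.bin, copied from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:3db2d0637a474da96b24158284fa06930ec866026fbff2b760b39c8d6be77d95
size 5560
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

In all three files the `size` is unchanged and only the `oid` differs, which is what one would expect when weights of the same shape are overwritten with new values.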