End of training

Files changed:
- README.md (+20 -64)
- model-00001-of-00002.safetensors (+1 -1)
- model-00002-of-00002.safetensors (+1 -1)
- training_args.bin (+1 -1)
README.md
CHANGED
@@ -17,8 +17,8 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.
-- Num Input Tokens Seen:
+- Loss: 1.4374
+- Num Input Tokens Seen: 5219176
 
 ## Model description
 
@@ -53,68 +53,24 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
 |:-------------:|:------:|:----:|:---------------:|:-----------------:|
 | No log        | 0      | 0    | 1.3956          | 0                 |
-| 1.
-| 1.
-| 1.
-| 1.
-
-
-
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.3516 | 0.3035 | 95 | 1.5590 | 5225616 |
-| 0.3268 | 0.3195 | 100 | 1.5276 | 5501384 |
-| 0.2797 | 0.3355 | 105 | 1.5182 | 5780224 |
-| 0.2672 | 0.3514 | 110 | 1.5094 | 6059920 |
-| 0.2777 | 0.3674 | 115 | 1.5643 | 6340400 |
-| 0.2453 | 0.3834 | 120 | 1.5118 | 6613416 |
-| 0.2118 | 0.3994 | 125 | 1.5538 | 6880920 |
-| 0.2029 | 0.4153 | 130 | 1.4862 | 7158296 |
-| 0.2346 | 0.4313 | 135 | 1.5006 | 7438416 |
-| 0.2331 | 0.4473 | 140 | 1.5101 | 7717112 |
-| 0.2371 | 0.4633 | 145 | 1.4671 | 7999472 |
-| 0.1374 | 0.4792 | 150 | 1.5032 | 8276904 |
-| 0.2166 | 0.4952 | 155 | 1.5048 | 8559152 |
-| 0.17 | 0.5112 | 160 | 1.4610 | 8835280 |
-| 0.119 | 0.5272 | 165 | 1.4604 | 9116904 |
-| 0.1284 | 0.5431 | 170 | 1.4638 | 9391744 |
-| 0.1899 | 0.5591 | 175 | 1.4611 | 9663568 |
-| 0.1606 | 0.5751 | 180 | 1.4262 | 9931184 |
-| 0.1786 | 0.5911 | 185 | 1.4451 | 10209616 |
-| 0.1505 | 0.6070 | 190 | 1.4156 | 10486896 |
-| 0.1507 | 0.6230 | 195 | 1.4259 | 10767808 |
-| 0.1546 | 0.6390 | 200 | 1.4544 | 11046344 |
-| 0.1364 | 0.6550 | 205 | 1.4106 | 11327440 |
-| 0.1248 | 0.6709 | 210 | 1.4525 | 11606960 |
-| 0.1672 | 0.6869 | 215 | 1.4773 | 11880848 |
-| 0.1577 | 0.7029 | 220 | 1.4349 | 12161672 |
-| 0.1245 | 0.7188 | 225 | 1.4439 | 12442664 |
-| 0.1302 | 0.7348 | 230 | 1.4415 | 12713624 |
-| 0.0654 | 0.7508 | 235 | 1.4390 | 12990952 |
-| 0.1354 | 0.7668 | 240 | 1.4475 | 13266200 |
-| 0.2104 | 0.7827 | 245 | 1.4559 | 13541912 |
-| 0.1072 | 0.7987 | 250 | 1.4351 | 13819320 |
-| 0.1398 | 0.8147 | 255 | 1.4095 | 14098680 |
-| 0.1013 | 0.8307 | 260 | 1.4447 | 14379424 |
-| 0.1324 | 0.8466 | 265 | 1.4160 | 14657904 |
-| 0.1668 | 0.8626 | 270 | 1.4443 | 14933688 |
-| 0.1894 | 0.8786 | 275 | 1.4766 | 15212440 |
-| 0.1551 | 0.8946 | 280 | 1.4633 | 15493152 |
-| 0.1221 | 0.9105 | 285 | 1.4492 | 15772512 |
-| 0.1201 | 0.9265 | 290 | 1.4642 | 16049992 |
-| 0.1003 | 0.9425 | 295 | 1.4544 | 16322976 |
-| 0.1042 | 0.9585 | 300 | 1.4601 | 16601528 |
-| 0.1254 | 0.9744 | 305 | 1.4701 | 16877264 |
-| 0.1313 | 0.9904 | 310 | 1.4481 | 17156976 |
+| 1.7668 | 0.0533 | 5 | 1.3663 | 278592 |
+| 1.5612 | 0.1065 | 10 | 1.2589 | 555472 |
+| 1.3399 | 0.1598 | 15 | 1.1962 | 837344 |
+| 1.1995 | 0.2130 | 20 | 1.2034 | 1118136 |
+| 0.902 | 0.2663 | 25 | 1.2406 | 1401192 |
+| 0.7229 | 0.3196 | 30 | 1.3380 | 1684336 |
+| 0.4769 | 0.3728 | 35 | 1.4243 | 1969096 |
+| 0.4301 | 0.4261 | 40 | 1.4694 | 2250056 |
+| 0.3838 | 0.4794 | 45 | 1.4889 | 2523440 |
+| 0.3293 | 0.5326 | 50 | 1.4766 | 2802544 |
+| 0.2194 | 0.5859 | 55 | 1.4711 | 3086432 |
+| 0.2283 | 0.6391 | 60 | 1.4290 | 3359744 |
+| 0.1524 | 0.6924 | 65 | 1.4481 | 3638328 |
+| 0.2296 | 0.7457 | 70 | 1.4577 | 3917992 |
+| 0.1773 | 0.7989 | 75 | 1.4764 | 4199640 |
+| 0.1202 | 0.8522 | 80 | 1.4139 | 4486008 |
+| 0.178 | 0.9055 | 85 | 1.4641 | 4764120 |
+| 0.1187 | 0.9587 | 90 | 1.4421 | 5045664 |
 
 
 ### Framework versions
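The training log above shows training loss falling steadily while validation loss bottoms out early (1.1962 at step 15) and then drifts back up, a common overfitting signature. As an illustrative sketch (not part of this repo's training script), the logged metrics can be scanned to pick the checkpoint with the lowest validation loss rather than the lowest training loss:

```python
# A few (step, training_loss, validation_loss, input_tokens_seen) rows
# copied from the model card table above.
log = [
    (5, 1.7668, 1.3663, 278592),
    (10, 1.5612, 1.2589, 555472),
    (15, 1.3399, 1.1962, 837344),
    (20, 1.1995, 1.2034, 1118136),
    (25, 0.902, 1.2406, 1401192),
    (30, 0.7229, 1.3380, 1684336),
    (35, 0.4769, 1.4243, 1969096),
    (40, 0.4301, 1.4694, 2250056),
]

# Select the best checkpoint by validation loss (index 2),
# since training loss keeps improving even while the model overfits.
best = min(log, key=lambda row: row[2])
print(f"best checkpoint: step {best[0]}, validation loss {best[2]}")
# → best checkpoint: step 15, validation loss 1.1962
```

This is what `transformers.Trainer`'s `load_best_model_at_end` / `metric_for_best_model` options automate when checkpoints are saved at each evaluation step; the run here appears to have kept the final weights instead.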
model-00001-of-00002.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:bd6a415e8ad7afdec23196036e9e4d520a081e7cb33aab93e52b647068a3e32d
 size 4988025760
model-00002-of-00002.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:d9d4b2ce53e6d37c9d9c438a6d19c511ff58a653bed78fabb8a8cbb7b35d3c6f
 size 240691728
training_args.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:3db2d0637a474da96b24158284fa06930ec866026fbff2b760b39c8d6be77d95
 size 5560
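The three binary files above are stored as Git LFS pointers: a three-line text stub (spec version, SHA-256 object id, byte size) that Git tracks in place of the actual weights. A minimal sketch of how such a pointer is derived from a blob, using only the standard library (`lfs_pointer` is an illustrative helper, not part of any repo tooling):

```python
import hashlib


def lfs_pointer(data: bytes) -> str:
    """Build a Git LFS pointer file for a blob: the same three-line
    layout (version, oid sha256, size) seen in the diffs above."""
    oid = hashlib.sha256(data).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(data)}\n"
    )


print(lfs_pointer(b"hello"))
```

Because the oid is a content hash, a changed pointer line in a diff (as in each `-oid`/`+oid` pair above) means the underlying file's bytes changed, even when its size stayed identical.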