End of training

- README.md +6 -132
- model-00001-of-00002.safetensors +1 -1
- model-00002-of-00002.safetensors +1 -1
- training_args.bin +1 -1
README.md CHANGED
@@ -17,8 +17,8 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.
-- Num Input Tokens Seen:
+- Loss: 1.3947
+- Num Input Tokens Seen: 117400
 
 ## Model description
 
@@ -45,140 +45,14 @@ The following hyperparameters were used during training:
 - total_train_batch_size: 128
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: constant_with_warmup
--
+- lr_scheduler_warmup_steps: 16
 - num_epochs: 1
 
 ### Training results
 
-| Training Loss | Epoch
-
-| No log | 0
-| 1.7904 | 0.0079 | 5 | 1.3880 | 285032 |
-| 1.7254 | 0.0158 | 10 | 1.3331 | 559496 |
-| 1.5887 | 0.0237 | 15 | 1.2682 | 830576 |
-| 1.5171 | 0.0316 | 20 | 1.2107 | 1100776 |
-| 1.4397 | 0.0395 | 25 | 1.1687 | 1380544 |
-| 1.3495 | 0.0474 | 30 | 1.1452 | 1661992 |
-| 1.2532 | 0.0553 | 35 | 1.1217 | 1938152 |
-| 1.3056 | 0.0632 | 40 | 1.1229 | 2211672 |
-| 1.2413 | 0.0711 | 45 | 1.1292 | 2489160 |
-| 1.1462 | 0.0790 | 50 | 1.1370 | 2767752 |
-| 1.1032 | 0.0869 | 55 | 1.1466 | 3043856 |
-| 1.1338 | 0.0948 | 60 | 1.1719 | 3322408 |
-| 1.0408 | 0.1027 | 65 | 1.1805 | 3601512 |
-| 0.9726 | 0.1106 | 70 | 1.2005 | 3875416 |
-| 0.9481 | 0.1185 | 75 | 1.2367 | 4145560 |
-| 0.9089 | 0.1264 | 80 | 1.2490 | 4418312 |
-| 0.9448 | 0.1343 | 85 | 1.2644 | 4698048 |
-| 0.9274 | 0.1422 | 90 | 1.2821 | 4968096 |
-| 0.7939 | 0.1501 | 95 | 1.3076 | 5241336 |
-| 0.8667 | 0.1580 | 100 | 1.3219 | 5519352 |
-| 0.7058 | 0.1659 | 105 | 1.3789 | 5800192 |
-| 0.6803 | 0.1738 | 110 | 1.3425 | 6075960 |
-| 0.7292 | 0.1817 | 115 | 1.3845 | 6349352 |
-| 0.7159 | 0.1896 | 120 | 1.3920 | 6622320 |
-| 0.6093 | 0.1975 | 125 | 1.4011 | 6906224 |
-| 0.6016 | 0.2054 | 130 | 1.4260 | 7183032 |
-| 0.5889 | 0.2133 | 135 | 1.4458 | 7457504 |
-| 0.4766 | 0.2212 | 140 | 1.4677 | 7738480 |
-| 0.4992 | 0.2291 | 145 | 1.4464 | 8014256 |
-| 0.614 | 0.2370 | 150 | 1.4413 | 8298848 |
-| 0.5618 | 0.2449 | 155 | 1.4551 | 8577880 |
-| 0.5101 | 0.2528 | 160 | 1.4444 | 8857920 |
-| 0.5034 | 0.2607 | 165 | 1.4707 | 9135240 |
-| 0.3872 | 0.2686 | 170 | 1.4531 | 9412544 |
-| 0.4371 | 0.2765 | 175 | 1.4310 | 9684616 |
-| 0.4358 | 0.2844 | 180 | 1.4460 | 9954880 |
-| 0.354 | 0.2923 | 185 | 1.4609 | 10226832 |
-| 0.3826 | 0.3002 | 190 | 1.4565 | 10505864 |
-| 0.3332 | 0.3081 | 195 | 1.4473 | 10774440 |
-| 0.3795 | 0.3160 | 200 | 1.4540 | 11050152 |
-| 0.3429 | 0.3240 | 205 | 1.4528 | 11331232 |
-| 0.3449 | 0.3319 | 210 | 1.4446 | 11614096 |
-| 0.2587 | 0.3398 | 215 | 1.4534 | 11891544 |
-| 0.285 | 0.3477 | 220 | 1.4727 | 12166264 |
-| 0.285 | 0.3556 | 225 | 1.4525 | 12443968 |
-| 0.3122 | 0.3635 | 230 | 1.4497 | 12727256 |
-| 0.3031 | 0.3714 | 235 | 1.4532 | 13004184 |
-| 0.2859 | 0.3793 | 240 | 1.4405 | 13286216 |
-| 0.3157 | 0.3872 | 245 | 1.4284 | 13562312 |
-| 0.2258 | 0.3951 | 250 | 1.4208 | 13843808 |
-| 0.2473 | 0.4030 | 255 | 1.4353 | 14121232 |
-| 0.218 | 0.4109 | 260 | 1.4189 | 14397624 |
-| 0.2184 | 0.4188 | 265 | 1.4515 | 14677624 |
-| 0.3158 | 0.4267 | 270 | 1.4183 | 14951984 |
-| 0.2161 | 0.4346 | 275 | 1.4277 | 15231136 |
-| 0.1902 | 0.4425 | 280 | 1.4147 | 15507976 |
-| 0.3053 | 0.4504 | 285 | 1.4013 | 15786280 |
-| 0.2394 | 0.4583 | 290 | 1.4105 | 16061096 |
-| 0.2086 | 0.4662 | 295 | 1.4073 | 16339576 |
-| 0.2131 | 0.4741 | 300 | 1.4082 | 16616088 |
-| 0.199 | 0.4820 | 305 | 1.3966 | 16894192 |
-| 0.1509 | 0.4899 | 310 | 1.3858 | 17161696 |
-| 0.1534 | 0.4978 | 315 | 1.3921 | 17435360 |
-| 0.2176 | 0.5057 | 320 | 1.3864 | 17715984 |
-| 0.2142 | 0.5136 | 325 | 1.4093 | 17997752 |
-| 0.2719 | 0.5215 | 330 | 1.3818 | 18273176 |
-| 0.1597 | 0.5294 | 335 | 1.3853 | 18537568 |
-| 0.1688 | 0.5373 | 340 | 1.4110 | 18811472 |
-| 0.1704 | 0.5452 | 345 | 1.3880 | 19091896 |
-| 0.1742 | 0.5531 | 350 | 1.3737 | 19371896 |
-| 0.1587 | 0.5610 | 355 | 1.3708 | 19647704 |
-| 0.1461 | 0.5689 | 360 | 1.3731 | 19927320 |
-| 0.1655 | 0.5768 | 365 | 1.3816 | 20204600 |
-| 0.1107 | 0.5847 | 370 | 1.3753 | 20479952 |
-| 0.1816 | 0.5926 | 375 | 1.3716 | 20761040 |
-| 0.2303 | 0.6005 | 380 | 1.3647 | 21036552 |
-| 0.1685 | 0.6084 | 385 | 1.3587 | 21320232 |
-| 0.1872 | 0.6163 | 390 | 1.3567 | 21598736 |
-| 0.1274 | 0.6242 | 395 | 1.3622 | 21870952 |
-| 0.156 | 0.6321 | 400 | 1.3694 | 22143560 |
-| 0.1011 | 0.64 | 405 | 1.3666 | 22423664 |
-| 0.1744 | 0.6479 | 410 | 1.3754 | 22704304 |
-| 0.1507 | 0.6558 | 415 | 1.3611 | 22976520 |
-| 0.1875 | 0.6637 | 420 | 1.3744 | 23256704 |
-| 0.1195 | 0.6716 | 425 | 1.3761 | 23530768 |
-| 0.1536 | 0.6795 | 430 | 1.3659 | 23804864 |
-| 0.1272 | 0.6874 | 435 | 1.3441 | 24088360 |
-| 0.1642 | 0.6953 | 440 | 1.3794 | 24370768 |
-| 0.1752 | 0.7032 | 445 | 1.3800 | 24652800 |
-| 0.1598 | 0.7111 | 450 | 1.3493 | 24926808 |
-| 0.1788 | 0.7190 | 455 | 1.3737 | 25209264 |
-| 0.1285 | 0.7269 | 460 | 1.3470 | 25490464 |
-| 0.1269 | 0.7348 | 465 | 1.3585 | 25764752 |
-| 0.1534 | 0.7427 | 470 | 1.3792 | 26041912 |
-| 0.1254 | 0.7506 | 475 | 1.3503 | 26319400 |
-| 0.1847 | 0.7585 | 480 | 1.3469 | 26595432 |
-| 0.1077 | 0.7664 | 485 | 1.3555 | 26870224 |
-| 0.1492 | 0.7743 | 490 | 1.3447 | 27146584 |
-| 0.2371 | 0.7822 | 495 | 1.3424 | 27425344 |
-| 0.1434 | 0.7901 | 500 | 1.3612 | 27704256 |
-| 0.1857 | 0.7980 | 505 | 1.3450 | 27981680 |
-| 0.1632 | 0.8059 | 510 | 1.3603 | 28269920 |
-| 0.0885 | 0.8138 | 515 | 1.3589 | 28555408 |
-| 0.1117 | 0.8217 | 520 | 1.3556 | 28835424 |
-| 0.1943 | 0.8296 | 525 | 1.3550 | 29110760 |
-| 0.1237 | 0.8375 | 530 | 1.3518 | 29391016 |
-| 0.095 | 0.8454 | 535 | 1.3502 | 29669632 |
-| 0.1394 | 0.8533 | 540 | 1.3603 | 29949184 |
-| 0.1402 | 0.8612 | 545 | 1.3626 | 30233672 |
-| 0.2061 | 0.8691 | 550 | 1.3563 | 30519392 |
-| 0.0759 | 0.8770 | 555 | 1.3748 | 30799704 |
-| 0.1689 | 0.8849 | 560 | 1.3524 | 31078832 |
-| 0.1271 | 0.8928 | 565 | 1.3689 | 31363592 |
-| 0.1791 | 0.9007 | 570 | 1.3782 | 31645328 |
-| 0.1511 | 0.9086 | 575 | 1.3446 | 31923488 |
-| 0.1548 | 0.9165 | 580 | 1.3577 | 32203760 |
-| 0.1561 | 0.9244 | 585 | 1.3715 | 32486176 |
-| 0.1739 | 0.9323 | 590 | 1.3475 | 32763936 |
-| 0.1737 | 0.9402 | 595 | 1.3401 | 33035936 |
-| 0.1992 | 0.9481 | 600 | 1.3531 | 33312320 |
-| 0.1156 | 0.9560 | 605 | 1.3492 | 33589072 |
-| 0.1206 | 0.9640 | 610 | 1.3612 | 33869392 |
-| 0.14 | 0.9719 | 615 | 1.3649 | 34146632 |
-| 0.0796 | 0.9798 | 620 | 1.3636 | 34427680 |
-| 0.121 | 0.9877 | 625 | 1.3652 | 34707504 |
-| 0.2233 | 0.9956 | 630 | 1.3526 | 34990704 |
+| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
+|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
+| No log | 0 | 0 | 1.3956 | 0 |
 
 
 ### Framework versions
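The card's hyperparameter list maps directly onto `transformers.TrainingArguments`, which is what training_args.bin serializes. Below is a minimal sketch consistent with the values shown in the diff; the output directory, per-device batch size, and gradient-accumulation split are assumptions, since the hunk shows only part of the list and the total_train_batch_size of 128 is a derived product of per-device batch size, accumulation steps, and device count.

```python
# Sketch only -- not the exact contents of training_args.bin.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="gemma-2-2b-sft",       # hypothetical output path
    per_device_train_batch_size=8,     # assumed: 8 x 16 accumulation = 128 total
    gradient_accumulation_steps=16,    # assumed split of the listed 128
    num_train_epochs=1,
    lr_scheduler_type="constant_with_warmup",
    warmup_steps=16,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
```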
model-00001-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:36ef9bd8504552eadf90597b37b3c26a924e312ba30f6bd56a660136122f377d
 size 4988025760
model-00002-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:db0e71463528ad6db61332241bda8d23f91d4dcb0655ce3277c5c69c820c1642
 size 240691728
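The commit fills in the LFS hashes for a two-shard safetensors checkpoint of roughly 5.2 GB total, which is consistent with bfloat16 weights for a ~2.6B-parameter Gemma 2 model. A minimal loading sketch, with a placeholder repo id and an assumed dtype:

```python
# Sketch: from_pretrained reads the shard index and loads both safetensors
# files automatically.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "your-username/your-finetuned-gemma-2-2b",  # hypothetical repo id
    torch_dtype=torch.bfloat16,                 # assumed; matches ~5.2 GB of weights
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b")
```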
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:b815f9332cede5c869a1bf58d009a1e3b3b76022b57c4e2c15a9d211d1506912
 size 5560
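Each binary file above is stored as a Git LFS pointer: three lines recording the spec version, the sha256 of the real content, and its byte size. A short sketch of verifying a downloaded file against its pointer's oid, using the training_args.bin values from this commit:

```python
# Sketch: recompute a file's sha256 and compare it to the oid in its LFS pointer.
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

expected = "b815f9332cede5c869a1bf58d009a1e3b3b76022b57c4e2c15a9d211d1506912"
assert sha256_of("training_args.bin") == expected
```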