collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1004
  • Num Input Tokens Seen: 20726616
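
For convenience, here is a minimal sketch of loading this checkpoint for generation with the transformers library. The repository ID is the one this card is published under; the dtype, device placement, and prompt are illustrative choices, not part of the card.

```python
# Minimal loading sketch for this checkpoint (illustrative, not part of
# the original card). device_map="auto" requires the accelerate package.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the checkpoint is stored in BF16
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```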

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128 (train_batch_size 8 × gradient_accumulation_steps 16)
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
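
For reference, the hyperparameters above map onto a transformers TrainingArguments configuration roughly as sketched below. The output directory is a placeholder, and the exact Trainer setup used for this run is not documented in the card.

```python
# A sketch of the hyperparameters above expressed as transformers
# TrainingArguments. The output directory is a placeholder, not taken
# from the model card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd2",
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # effective batch size: 8 * 16 = 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # the published checkpoint is stored in BF16
)
```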

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3909 | 0 |
| 1.5618 | 0.0133 | 5 | 1.3747 | 274336 |
| 1.4834 | 0.0266 | 10 | 1.2818 | 548560 |
| 1.2778 | 0.0399 | 15 | 1.2113 | 826768 |
| 1.2063 | 0.0532 | 20 | 1.1648 | 1100984 |
| 1.0763 | 0.0666 | 25 | 1.1554 | 1381272 |
| 1.0008 | 0.0799 | 30 | 1.1420 | 1655904 |
| 1.0066 | 0.0932 | 35 | 1.1522 | 1934384 |
| 1.0122 | 0.1065 | 40 | 1.1650 | 2209128 |
| 0.8869 | 0.1198 | 45 | 1.1676 | 2482008 |
| 0.8353 | 0.1331 | 50 | 1.1729 | 2757616 |
| 0.7535 | 0.1464 | 55 | 1.1702 | 3028816 |
| 0.677 | 0.1597 | 60 | 1.1699 | 3306688 |
| 0.6353 | 0.1730 | 65 | 1.1718 | 3583176 |
| 0.7474 | 0.1864 | 70 | 1.1582 | 3862120 |
| 0.6487 | 0.1997 | 75 | 1.1621 | 4134624 |
| 0.5399 | 0.2130 | 80 | 1.1678 | 4413112 |
| 0.4752 | 0.2263 | 85 | 1.1588 | 4680680 |
| 0.6822 | 0.2396 | 90 | 1.1598 | 4959520 |
| 0.5627 | 0.2529 | 95 | 1.1590 | 5237032 |
| 0.5604 | 0.2662 | 100 | 1.1571 | 5520816 |
| 0.4439 | 0.2795 | 105 | 1.1547 | 5791784 |
| 0.5118 | 0.2928 | 110 | 1.1562 | 6070648 |
| 0.5673 | 0.3062 | 115 | 1.1532 | 6350816 |
| 0.5077 | 0.3195 | 120 | 1.1491 | 6624856 |
| 0.4819 | 0.3328 | 125 | 1.1451 | 6903024 |
| 0.4622 | 0.3461 | 130 | 1.1461 | 7179008 |
| 0.5332 | 0.3594 | 135 | 1.1403 | 7459288 |
| 0.4536 | 0.3727 | 140 | 1.1447 | 7736168 |
| 0.4125 | 0.3860 | 145 | 1.1386 | 8007400 |
| 0.4507 | 0.3993 | 150 | 1.1381 | 8280296 |
| 0.4411 | 0.4126 | 155 | 1.1353 | 8563096 |
| 0.4867 | 0.4260 | 160 | 1.1342 | 8835744 |
| 0.4239 | 0.4393 | 165 | 1.1335 | 9116184 |
| 0.5198 | 0.4526 | 170 | 1.1308 | 9394976 |
| 0.502 | 0.4659 | 175 | 1.1320 | 9676488 |
| 0.5138 | 0.4792 | 180 | 1.1265 | 9952384 |
| 0.4501 | 0.4925 | 185 | 1.1288 | 10223640 |
| 0.4448 | 0.5058 | 190 | 1.1268 | 10503360 |
| 0.4864 | 0.5191 | 195 | 1.1272 | 10783504 |
| 0.5137 | 0.5324 | 200 | 1.1228 | 11061016 |
| 0.4463 | 0.5458 | 205 | 1.1251 | 11334176 |
| 0.5183 | 0.5591 | 210 | 1.1237 | 11611680 |
| 0.4873 | 0.5724 | 215 | 1.1226 | 11889528 |
| 0.4598 | 0.5857 | 220 | 1.1200 | 12165672 |
| 0.4974 | 0.5990 | 225 | 1.1180 | 12447680 |
| 0.307 | 0.6123 | 230 | 1.1191 | 12719352 |
| 0.4302 | 0.6256 | 235 | 1.1154 | 12992608 |
| 0.3704 | 0.6389 | 240 | 1.1187 | 13269640 |
| 0.43 | 0.6522 | 245 | 1.1155 | 13545056 |
| 0.3751 | 0.6656 | 250 | 1.1142 | 13821752 |
| 0.349 | 0.6789 | 255 | 1.1122 | 14096592 |
| 0.4908 | 0.6922 | 260 | 1.1105 | 14370976 |
| 0.4156 | 0.7055 | 265 | 1.1105 | 14647576 |
| 0.3021 | 0.7188 | 270 | 1.1102 | 14927104 |
| 0.4337 | 0.7321 | 275 | 1.1104 | 15202424 |
| 0.4187 | 0.7454 | 280 | 1.1080 | 15479160 |
| 0.3928 | 0.7587 | 285 | 1.1124 | 15758584 |
| 0.4093 | 0.7720 | 290 | 1.1058 | 16040872 |
| 0.474 | 0.7854 | 295 | 1.1074 | 16312664 |
| 0.4337 | 0.7987 | 300 | 1.1079 | 16592008 |
| 0.2634 | 0.8120 | 305 | 1.1057 | 16866912 |
| 0.3113 | 0.8253 | 310 | 1.1055 | 17146272 |
| 0.4897 | 0.8386 | 315 | 1.1059 | 17425624 |
| 0.4663 | 0.8519 | 320 | 1.1031 | 17698920 |
| 0.4878 | 0.8652 | 325 | 1.1059 | 17972416 |
| 0.3575 | 0.8785 | 330 | 1.1049 | 18246352 |
| 0.406 | 0.8918 | 335 | 1.1022 | 18522448 |
| 0.4651 | 0.9052 | 340 | 1.1042 | 18798208 |
| 0.4508 | 0.9185 | 345 | 1.1032 | 19069304 |
| 0.442 | 0.9318 | 350 | 1.1019 | 19352272 |
| 0.3781 | 0.9451 | 355 | 1.1029 | 19630952 |
| 0.4462 | 0.9584 | 360 | 1.0998 | 19903896 |
| 0.3345 | 0.9717 | 365 | 1.1027 | 20176392 |
| 0.4672 | 0.9850 | 370 | 1.1001 | 20451160 |
| 0.3621 | 0.9983 | 375 | 1.1004 | 20726616 |

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
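
To match this environment, the listed versions can be checked programmatically. The sketch below assumes the standard PyPI package names and is not part of the original card.

```python
# Quick environment check against the framework versions listed above
# (a convenience sketch, not part of the original card).
import importlib.metadata as md

expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}

for pkg, want in expected.items():
    have = md.version(pkg)
    status = "OK" if have == want else f"MISMATCH (have {have})"
    print(f"{pkg}: expected {want} -> {status}")
```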