collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0923
  • Num Input Tokens Seen: 20911096
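
Since the training data and intended uses are not documented, the snippet below is only a minimal sketch of loading the checkpoint for inference with transformers. The Hub repo id is taken from the model tree at the bottom of this card, and bfloat16 matches the stated tensor type; adjust both if you are working from a local copy.

```python
# Minimal inference sketch (not an official usage example for this card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# BF16 matches the tensor type reported for this checkpoint.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```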

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
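
The original training script is not part of this card, so the following is only a sketch of how the settings above map onto transformers.TrainingArguments; the output_dir is a placeholder, and bf16=True is an assumption based on the checkpoint's tensor type. Note that the total train batch size of 128 is simply train_batch_size (8) × gradient_accumulation_steps (16).

```python
# Sketch only: a TrainingArguments equivalent of the hyperparameters above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd0",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=8,   # train_batch_size
    per_device_eval_batch_size=16,   # eval_batch_size
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size
    seed=0,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                  # Adam settings listed above
    adam_beta2=0.999,                # (these are the library defaults)
    adam_epsilon=1e-8,
    bf16=True,                       # assumption: matches the BF16 tensor type
)
```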

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.5435        | 0.0135 | 5    | 1.3745          | 285560            |
| 1.4637        | 0.0270 | 10   | 1.2842          | 569672            |
| 1.3123        | 0.0405 | 15   | 1.2085          | 847696            |
| 1.2615        | 0.0540 | 20   | 1.1682          | 1136376           |
| 1.0816        | 0.0675 | 25   | 1.1606          | 1424000           |
| 0.965         | 0.0810 | 30   | 1.1604          | 1707696           |
| 0.8694        | 0.0945 | 35   | 1.1667          | 1989768           |
| 0.7971        | 0.1080 | 40   | 1.1810          | 2275264           |
| 0.76          | 0.1215 | 45   | 1.1947          | 2558504           |
| 0.6078        | 0.1350 | 50   | 1.1804          | 2840936           |
| 0.6925        | 0.1484 | 55   | 1.1709          | 3123416           |
| 0.542         | 0.1619 | 60   | 1.1698          | 3400336           |
| 0.5919        | 0.1754 | 65   | 1.1590          | 3677280           |
| 0.5911        | 0.1889 | 70   | 1.1663          | 3960968           |
| 0.5761        | 0.2024 | 75   | 1.1571          | 4236960           |
| 0.5491        | 0.2159 | 80   | 1.1588          | 4521336           |
| 0.4891        | 0.2294 | 85   | 1.1530          | 4802232           |
| 0.4634        | 0.2429 | 90   | 1.1474          | 5083368           |
| 0.4253        | 0.2564 | 95   | 1.1480          | 5368512           |
| 0.5415        | 0.2699 | 100  | 1.1389          | 5652976           |
| 0.4538        | 0.2834 | 105  | 1.1422          | 5935704           |
| 0.4739        | 0.2969 | 110  | 1.1375          | 6220840           |
| 0.5449        | 0.3104 | 115  | 1.1372          | 6501656           |
| 0.5307        | 0.3239 | 120  | 1.1331          | 6790056           |
| 0.4381        | 0.3374 | 125  | 1.1316          | 7075592           |
| 0.5068        | 0.3509 | 130  | 1.1243          | 7356824           |
| 0.373         | 0.3644 | 135  | 1.1298          | 7641384           |
| 0.4322        | 0.3779 | 140  | 1.1246          | 7923528           |
| 0.3658        | 0.3914 | 145  | 1.1268          | 8200376           |
| 0.4601        | 0.4049 | 150  | 1.1220          | 8486080           |
| 0.415         | 0.4184 | 155  | 1.1249          | 8769112           |
| 0.4452        | 0.4318 | 160  | 1.1194          | 9051632           |
| 0.5344        | 0.4453 | 165  | 1.1201          | 9330416           |
| 0.2906        | 0.4588 | 170  | 1.1192          | 9612936           |
| 0.4358        | 0.4723 | 175  | 1.1149          | 9893880           |
| 0.354         | 0.4858 | 180  | 1.1164          | 10178232          |
| 0.3467        | 0.4993 | 185  | 1.1129          | 10465696          |
| 0.4397        | 0.5128 | 190  | 1.1143          | 10744624          |
| 0.4027        | 0.5263 | 195  | 1.1127          | 11024912          |
| 0.5438        | 0.5398 | 200  | 1.1101          | 11311552          |
| 0.3847        | 0.5533 | 205  | 1.1106          | 11595104          |
| 0.4611        | 0.5668 | 210  | 1.1080          | 11877432          |
| 0.5404        | 0.5803 | 215  | 1.1099          | 12161768          |
| 0.4367        | 0.5938 | 220  | 1.1110          | 12444336          |
| 0.3969        | 0.6073 | 225  | 1.1060          | 12723640          |
| 0.4421        | 0.6208 | 230  | 1.1064          | 13012280          |
| 0.3727        | 0.6343 | 235  | 1.1065          | 13299312          |
| 0.3602        | 0.6478 | 240  | 1.1060          | 13583528          |
| 0.4531        | 0.6613 | 245  | 1.1068          | 13867168          |
| 0.399         | 0.6748 | 250  | 1.1033          | 14146944          |
| 0.4072        | 0.6883 | 255  | 1.1027          | 14427864          |
| 0.4039        | 0.7018 | 260  | 1.1032          | 14717592          |
| 0.5127        | 0.7152 | 265  | 1.1015          | 14999968          |
| 0.2753        | 0.7287 | 270  | 1.1017          | 15281672          |
| 0.4518        | 0.7422 | 275  | 1.1021          | 15556800          |
| 0.5064        | 0.7557 | 280  | 1.1010          | 15835432          |
| 0.3544        | 0.7692 | 285  | 1.1000          | 16114160          |
| 0.3527        | 0.7827 | 290  | 1.0987          | 16394640          |
| 0.3349        | 0.7962 | 295  | 1.0996          | 16673872          |
| 0.3976        | 0.8097 | 300  | 1.0978          | 16956880          |
| 0.4281        | 0.8232 | 305  | 1.0964          | 17241504          |
| 0.3262        | 0.8367 | 310  | 1.0974          | 17525672          |
| 0.3472        | 0.8502 | 315  | 1.0957          | 17810632          |
| 0.3196        | 0.8637 | 320  | 1.0963          | 18094576          |
| 0.4214        | 0.8772 | 325  | 1.0945          | 18375800          |
| 0.3303        | 0.8907 | 330  | 1.0943          | 18657920          |
| 0.4292        | 0.9042 | 335  | 1.0994          | 18933720          |
| 0.2995        | 0.9177 | 340  | 1.0943          | 19215688          |
| 0.3703        | 0.9312 | 345  | 1.0937          | 19498024          |
| 0.3855        | 0.9447 | 350  | 1.0940          | 19774016          |
| 0.4176        | 0.9582 | 355  | 1.0926          | 20060200          |
| 0.3698        | 0.9717 | 360  | 1.0914          | 20343704          |
| 0.3759        | 0.9852 | 365  | 1.0908          | 20627104          |
| 0.3329        | 0.9987 | 370  | 1.0923          | 20911096          |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
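
A quick sketch for checking that a local environment matches these pinned versions (the expected strings are copied verbatim from the list above):

```python
# Sketch: verify the runtime against the framework versions listed above.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "Transformers": (transformers.__version__, "4.44.0"),
    "Pytorch": (torch.__version__, "2.4.0+cu121"),
    "Datasets": (datasets.__version__, "2.20.0"),
    "Tokenizers": (tokenizers.__version__, "0.19.1"),
}
for name, (installed, pinned) in expected.items():
    status = "OK" if installed == pinned else "MISMATCH"
    print(f"{name}: installed {installed}, pinned {pinned} [{status}]")
```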

Safetensors

  • Model size: 2.61B params
  • Tensor type: BF16

Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd0

  • Base model: google/gemma-2-2b