---
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0
    results: []
---

collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set (a hedged loading sketch follows the results below):

  • Loss: 1.1021
  • Num Input Tokens Seen: 21968712
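
A minimal loading sketch, assuming the repository id jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0 and an illustrative prompt (both are assumptions, not stated in this card):

```python
# Minimal sketch: load the checkpoint and run a short generation.
# The repository id and the prompt below are assumptions, not from the card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```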

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged configuration sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
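
A minimal sketch of how these settings map onto transformers.TrainingArguments; the output directory, the 5-step evaluation cadence, and the choice of the library's default AdamW optimizer are assumptions rather than the authors' actual training script:

```python
from transformers import TrainingArguments

# Sketch only: maps the hyperparameters listed above onto TrainingArguments.
# output_dir and the 5-step eval/logging cadence are assumptions, not from the card.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0",  # assumed
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,   # 8 x 16 = total train batch size 128
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    eval_strategy="steps",            # assumed: the results table logs eval every 5 steps
    eval_steps=5,
    logging_steps=5,
)
```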

Training results

Training Loss Epoch Step Validation Loss Input Tokens Seen
No log 0 0 1.3956 0
1.6085 0.0130 5 1.3800 289184
1.4378 0.0260 10 1.2933 571680
1.3575 0.0390 15 1.2182 858680
1.3348 0.0520 20 1.1684 1145936
1.1904 0.0650 25 1.1500 1437472
1.2228 0.0779 30 1.1339 1724288
1.0694 0.0909 35 1.1383 2009272
0.9697 0.1039 40 1.1630 2289000
0.9051 0.1169 45 1.1742 2569208
0.8855 0.1299 50 1.1729 2856576
0.8853 0.1429 55 1.1758 3146856
0.8296 0.1559 60 1.1816 3431392
0.7121 0.1689 65 1.1736 3726000
0.7528 0.1819 70 1.1792 4010080
0.5996 0.1949 75 1.1802 4295264
0.6437 0.2079 80 1.1785 4576256
0.6683 0.2209 85 1.1733 4869384
0.5115 0.2338 90 1.1750 5151776
0.545 0.2468 95 1.1701 5443960
0.5348 0.2598 100 1.1673 5728368
0.5687 0.2728 105 1.1641 6017560
0.4856 0.2858 110 1.1663 6300000
0.4691 0.2988 115 1.1630 6586672
0.4454 0.3118 120 1.1585 6869504
0.5734 0.3248 125 1.1606 7159680
0.4317 0.3378 130 1.1529 7437936
0.4603 0.3508 135 1.1541 7727120
0.5264 0.3638 140 1.1542 8013352
0.5051 0.3767 145 1.1493 8302848
0.397 0.3897 150 1.1528 8588472
0.4173 0.4027 155 1.1463 8876960
0.3443 0.4157 160 1.1474 9156600
0.4343 0.4287 165 1.1455 9440520
0.4683 0.4417 170 1.1431 9726600
0.4732 0.4547 175 1.1408 10009248
0.4876 0.4677 180 1.1414 10297320
0.4574 0.4807 185 1.1369 10582704
0.4038 0.4937 190 1.1354 10870648
0.4239 0.5067 195 1.1355 11148576
0.5262 0.5196 200 1.1291 11436464
0.4788 0.5326 205 1.1322 11721416
0.3975 0.5456 210 1.1276 12012696
0.3807 0.5586 215 1.1310 12299376
0.4784 0.5716 220 1.1232 12594368
0.4 0.5846 225 1.1272 12880616
0.4511 0.5976 230 1.1229 13164112
0.4119 0.6106 235 1.1234 13446016
0.3515 0.6236 240 1.1224 13729688
0.3695 0.6366 245 1.1201 14015064
0.387 0.6496 250 1.1190 14303192
0.4503 0.6626 255 1.1167 14587200
0.3205 0.6755 260 1.1184 14875032
0.3369 0.6885 265 1.1154 15159592
0.46 0.7015 270 1.1173 15443480
0.4148 0.7145 275 1.1121 15737624
0.4251 0.7275 280 1.1141 16021928
0.3786 0.7405 285 1.1126 16306944
0.3593 0.7535 290 1.1114 16592904
0.4698 0.7665 295 1.1114 16875744
0.3327 0.7795 300 1.1098 17163408
0.3521 0.7925 305 1.1125 17451024
0.3682 0.8055 310 1.1076 17741680
0.3266 0.8184 315 1.1098 18022800
0.3986 0.8314 320 1.1078 18298600
0.3869 0.8444 325 1.1078 18585288
0.3904 0.8574 330 1.1072 18870912
0.361 0.8704 335 1.1070 19165960
0.4643 0.8834 340 1.1047 19458704
0.4603 0.8964 345 1.1048 19741152
0.4815 0.9094 350 1.1053 20029752
0.3097 0.9224 355 1.1050 20317240
0.3686 0.9354 360 1.1033 20601320
0.485 0.9484 365 1.1042 20895904
0.3946 0.9614 370 1.1014 21179672
0.4621 0.9743 375 1.1032 21460376
0.4748 0.9873 380 1.1025 21737656
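
For reference, and assuming the validation loss above is the usual mean token-level cross-entropy, it converts to perplexity via exp(loss); a one-line check for the final value:

```python
import math

final_eval_loss = 1.1021            # final validation loss from the table above
print(math.exp(final_eval_loss))    # ≈ 3.01 perplexity
```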

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1