collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0934
  • Num Input Tokens Seen: 35818760
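
The checkpoint can be loaded like any other Gemma-2 causal LM through Transformers. The snippet below is a minimal sketch, assuming the Hub repo id RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd1 and a device that supports bfloat16; the prompt is purely illustrative.

```python
# Minimal sketch of loading this checkpoint for inference (not an official usage example).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```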

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent Trainer configuration is sketched after this list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
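
The training script and dataset are not published, so the following is only a sketch of a Trainer configuration that matches the hyperparameters above; the output path, dataset objects, and evaluation cadence comments are placeholders or inferences, not the original setup.

```python
# Sketch only: values mirror the hyperparameter list above; paths/datasets are placeholders.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "google/gemma-2-2b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd1",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,   # 8 * 16 = 128 total (effective) batch size
    seed=1,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                   # Adam betas=(0.9, 0.999), epsilon=1e-08
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    eval_strategy="steps",            # assumption: eval every 5 steps, matching the results table
    eval_steps=5,
    logging_steps=5,
    include_num_input_tokens_seen=True,  # logs input tokens seen, as in the results table
    bf16=True,                           # checkpoint weights are stored in BF16
)

# trainer = Trainer(
#     model=model,
#     args=args,
#     train_dataset=train_dataset,  # placeholder: the training dataset is not published
#     eval_dataset=eval_dataset,
# )
# trainer.train()
```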

Training results

Training Loss Epoch Step Validation Loss Input Tokens Seen
No log 0 0 1.3909 0
1.6217 0.0075 5 1.3867 271000
1.4655 0.0151 10 1.3414 546632
1.4425 0.0226 15 1.2752 819672
1.3352 0.0301 20 1.2179 1087416
1.1854 0.0377 25 1.1801 1356912
1.0295 0.0452 30 1.1849 1633120
0.9569 0.0527 35 1.1962 1902312
0.7022 0.0603 40 1.2303 2168672
0.7055 0.0678 45 1.2339 2435544
0.6248 0.0753 50 1.2358 2703648
0.5441 0.0829 55 1.2145 2967560
0.5434 0.0904 60 1.2004 3236944
0.4472 0.0979 65 1.1988 3506976
0.4555 0.1055 70 1.1838 3785080
0.4008 0.1130 75 1.1891 4055320
0.3689 0.1205 80 1.1814 4326912
0.3985 0.1280 85 1.1675 4595872
0.2766 0.1356 90 1.1743 4861152
0.3589 0.1431 95 1.1632 5135264
0.4281 0.1506 100 1.1654 5413792
0.2638 0.1582 105 1.1621 5686704
0.3134 0.1657 110 1.1585 5956968
0.4167 0.1732 115 1.1541 6224872
0.2923 0.1808 120 1.1566 6493312
0.4076 0.1883 125 1.1523 6775120
0.3545 0.1958 130 1.1504 7043896
0.2846 0.2034 135 1.1519 7311696
0.3653 0.2109 140 1.1472 7578920
0.3325 0.2184 145 1.1503 7845576
0.3284 0.2260 150 1.1466 8115408
0.2892 0.2335 155 1.1414 8385200
0.2424 0.2410 160 1.1451 8657328
0.2332 0.2486 165 1.1433 8935176
0.1998 0.2561 170 1.1409 9211448
0.304 0.2636 175 1.1400 9482072
0.3124 0.2712 180 1.1379 9753520
0.3096 0.2787 185 1.1429 10020056
0.3539 0.2862 190 1.1358 10292264
0.308 0.2938 195 1.1379 10554488
0.2535 0.3013 200 1.1357 10822488
0.3166 0.3088 205 1.1328 11097256
0.2653 0.3164 210 1.1327 11376640
0.2697 0.3239 215 1.1351 11643032
0.2742 0.3314 220 1.1293 11919368
0.3344 0.3390 225 1.1314 12187896
0.1981 0.3465 230 1.1284 12461560
0.2823 0.3540 235 1.1275 12733568
0.3029 0.3615 240 1.1289 12999600
0.3232 0.3691 245 1.1257 13267680
0.2336 0.3766 250 1.1287 13533656
0.2642 0.3841 255 1.1263 13808592
0.3177 0.3917 260 1.1228 14075880
0.284 0.3992 265 1.1247 14343328
0.3039 0.4067 270 1.1206 14612480
0.2793 0.4143 275 1.1206 14882944
0.3073 0.4218 280 1.1250 15154088
0.3092 0.4293 285 1.1196 15420928
0.2349 0.4369 290 1.1192 15691528
0.1937 0.4444 295 1.1194 15966376
0.3677 0.4519 300 1.1175 16235816
0.1964 0.4595 305 1.1174 16503712
0.3342 0.4670 310 1.1173 16780344
0.2434 0.4745 315 1.1193 17047624
0.3076 0.4821 320 1.1144 17315800
0.2931 0.4896 325 1.1149 17589048
0.2965 0.4971 330 1.1140 17850624
0.3294 0.5047 335 1.1122 18123168
0.3072 0.5122 340 1.1134 18404496
0.1833 0.5197 345 1.1117 18672712
0.2871 0.5273 350 1.1118 18942920
0.2124 0.5348 355 1.1119 19214880
0.3152 0.5423 360 1.1098 19486872
0.2688 0.5499 365 1.1115 19750920
0.2113 0.5574 370 1.1113 20021312
0.2936 0.5649 375 1.1104 20291192
0.1659 0.5725 380 1.1079 20554376
0.2615 0.5800 385 1.1091 20820304
0.1893 0.5875 390 1.1092 21088216
0.2997 0.5950 395 1.1076 21356104
0.2985 0.6026 400 1.1055 21624024
0.2521 0.6101 405 1.1069 21901144
0.2243 0.6176 410 1.1078 22177408
0.2994 0.6252 415 1.1041 22446056
0.1927 0.6327 420 1.1061 22712816
0.204 0.6402 425 1.1064 22989840
0.2584 0.6478 430 1.1028 23260064
0.2422 0.6553 435 1.1029 23530560
0.2784 0.6628 440 1.1048 23803448
0.2613 0.6704 445 1.1038 24068080
0.227 0.6779 450 1.1019 24333176
0.2461 0.6854 455 1.1031 24603392
0.1918 0.6930 460 1.1035 24876384
0.2125 0.7005 465 1.1012 25140928
0.2905 0.7080 470 1.1015 25405968
0.1957 0.7156 475 1.1019 25677032
0.1903 0.7231 480 1.1001 25949848
0.2938 0.7306 485 1.1011 26219712
0.2621 0.7382 490 1.1027 26491816
0.2448 0.7457 495 1.1013 26760152
0.2177 0.7532 500 1.1003 27026592
0.3036 0.7608 505 1.1006 27298440
0.2885 0.7683 510 1.0999 27571464
0.3118 0.7758 515 1.0983 27843400
0.2362 0.7834 520 1.0990 28113024
0.2036 0.7909 525 1.0983 28381952
0.3301 0.7984 530 1.0979 28654648
0.3089 0.8060 535 1.0977 28927576
0.2125 0.8135 540 1.0983 29196512
0.1817 0.8210 545 1.0985 29471184
0.3252 0.8285 550 1.0975 29742216
0.2176 0.8361 555 1.0970 30010528
0.2441 0.8436 560 1.0972 30278888
0.2678 0.8511 565 1.0980 30549480
0.2069 0.8587 570 1.0959 30816968
0.2432 0.8662 575 1.0961 31089360
0.1981 0.8737 580 1.0974 31354488
0.2415 0.8813 585 1.0952 31624248
0.2379 0.8888 590 1.0944 31891576
0.2349 0.8963 595 1.0963 32153000
0.1643 0.9039 600 1.0952 32419552
0.2094 0.9114 605 1.0951 32692032
0.2806 0.9189 610 1.0931 32959216
0.2184 0.9265 615 1.0937 33229304
0.2943 0.9340 620 1.0938 33500168
0.2098 0.9415 625 1.0940 33767344
0.214 0.9491 630 1.0939 34035680
0.3333 0.9566 635 1.0934 34304400
0.3684 0.9641 640 1.0933 34573040
0.204 0.9717 645 1.0951 34840664
0.2766 0.9792 650 1.0946 35106576
0.233 0.9867 655 1.0934 35378576
0.2654 0.9943 660 1.0939 35656264

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
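
A minimal sketch for checking that a local environment matches the versions listed above; the expected strings are copied from this list.

```python
# Compare installed package versions against those reported in this model card.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    have = installed[name]
    status = "OK" if have == want else f"differs (expected {want})"
    print(f"{name}: {have} {status}")
```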

Model details

  • Format: Safetensors
  • Model size: 2.61B params
  • Tensor type: BF16
