# collapse_gemma-2-27b_hs2_accumulate_iter3_sftsd1

This model is a fine-tuned version of [google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.9280
- Num Input Tokens Seen: 13412700
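A minimal loading sketch with `transformers` is shown below for reference. The repository id comes from this model's page; the `bfloat16` dtype and `device_map="auto"` placement are assumptions, and a 27B checkpoint requires substantial GPU memory:

```python
# Minimal loading sketch for this checkpoint.
# dtype and device_map are illustrative assumptions, not from the card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter3_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference
    device_map="auto",           # assumption: shard across available devices
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```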
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (see the `TrainingArguments` sketch after the list):
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
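As referenced above, here is a minimal sketch of these settings expressed as `transformers.TrainingArguments`. The `output_dir` is a placeholder; everything else mirrors the list, with the total train batch size of 128 arising from 4 × 32 gradient accumulation steps:

```python
# Sketch of the reported hyperparameters as TrainingArguments.
# output_dir is a placeholder assumption, not from the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-27b_hs2_accumulate_iter3_sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=32,  # 4 * 32 = total train batch size of 128
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
)
```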
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:--------------|:------|:-----|:----------------|:------------------|
No log | 0 | 0 | 1.1282 | 0 |
2.1585 | 0.0187 | 5 | 1.0516 | 253472 |
2.0366 | 0.0374 | 10 | 0.9878 | 506396 |
2.2853 | 0.0562 | 15 | 0.9800 | 760944 |
1.9353 | 0.0749 | 20 | 0.9748 | 1012816 |
1.7788 | 0.0936 | 25 | 0.9765 | 1258660 |
1.5677 | 0.1123 | 30 | 0.9865 | 1505980 |
1.6266 | 0.1310 | 35 | 0.9797 | 1748944 |
1.3893 | 0.1498 | 40 | 0.9770 | 1996076 |
1.3214 | 0.1685 | 45 | 0.9758 | 2249964 |
1.2104 | 0.1872 | 50 | 0.9732 | 2502428 |
1.1943 | 0.2059 | 55 | 0.9673 | 2758156 |
0.9618 | 0.2246 | 60 | 0.9648 | 3002952 |
0.9917 | 0.2434 | 65 | 0.9608 | 3250420 |
0.9458 | 0.2621 | 70 | 0.9592 | 3498588 |
0.8799 | 0.2808 | 75 | 0.9541 | 3753220 |
0.9288 | 0.2995 | 80 | 0.9547 | 4005744 |
0.9042 | 0.3182 | 85 | 0.9524 | 4251648 |
0.7466 | 0.3370 | 90 | 0.9507 | 4504748 |
0.802 | 0.3557 | 95 | 0.9492 | 4759604 |
0.786 | 0.3744 | 100 | 0.9468 | 5010224 |
0.8059 | 0.3931 | 105 | 0.9463 | 5261388 |
0.7014 | 0.4118 | 110 | 0.9448 | 5508984 |
0.7977 | 0.4306 | 115 | 0.9438 | 5767344 |
0.9226 | 0.4493 | 120 | 0.9425 | 6015220 |
0.9092 | 0.4680 | 125 | 0.9414 | 6270096 |
0.692 | 0.4867 | 130 | 0.9401 | 6522928 |
0.7488 | 0.5054 | 135 | 0.9394 | 6774308 |
0.6813 | 0.5242 | 140 | 0.9378 | 7026956 |
0.9565 | 0.5429 | 145 | 0.9353 | 7281764 |
0.7867 | 0.5616 | 150 | 0.9364 | 7535708 |
0.6354 | 0.5803 | 155 | 0.9373 | 7783224 |
0.8341 | 0.5990 | 160 | 0.9340 | 8026812 |
0.834 | 0.6178 | 165 | 0.9358 | 8276260 |
0.7364 | 0.6365 | 170 | 0.9338 | 8529636 |
0.7822 | 0.6552 | 175 | 0.9329 | 8787372 |
0.8144 | 0.6739 | 180 | 0.9337 | 9033612 |
0.7588 | 0.6926 | 185 | 0.9321 | 9283952 |
0.6757 | 0.7114 | 190 | 0.9320 | 9528272 |
0.5925 | 0.7301 | 195 | 0.9327 | 9775216 |
0.6711 | 0.7488 | 200 | 0.9321 | 10031428 |
0.7888 | 0.7675 | 205 | 0.9301 | 10287112 |
0.7551 | 0.7862 | 210 | 0.9322 | 10539552 |
0.7367 | 0.8050 | 215 | 0.9328 | 10786728 |
0.6682 | 0.8237 | 220 | 0.9318 | 11033040 |
0.7802 | 0.8424 | 225 | 0.9310 | 11281864 |
0.7423 | 0.8611 | 230 | 0.9317 | 11537232 |
0.8502 | 0.8798 | 235 | 0.9309 | 11791856 |
0.7691 | 0.8986 | 240 | 0.9283 | 12041012 |
0.7173 | 0.9173 | 245 | 0.9318 | 12291188 |
0.7158 | 0.9360 | 250 | 0.9296 | 12542864 |
0.7733 | 0.9547 | 255 | 0.9307 | 12794508 |
0.6864 | 0.9734 | 260 | 0.9298 | 13055348 |
0.6458 | 0.9922 | 265 | 0.9288 | 13306708 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
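A small sanity-check sketch for confirming that a local environment matches these pins (assumes the packages are already installed):

```python
# Compare installed versions against the pins listed above.
import transformers, torch, datasets, tokenizers

print(transformers.__version__)  # expected: 4.44.0
print(torch.__version__)         # expected: 2.4.0+cu121
print(datasets.__version__)      # expected: 2.20.0
print(tokenizers.__version__)    # expected: 0.19.1
```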