collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0880
  • Num Input Tokens Seen: 10886024
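As a quick way to try the checkpoint, here is a minimal inference sketch. It assumes the model is hosted on the Hugging Face Hub under RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd1; the prompt and generation settings are placeholders.

```python
# Minimal inference sketch. Assumes the checkpoint is available on the
# Hugging Face Hub and that accelerate is installed (for device_map="auto").
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # the checkpoint is stored in BF16
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```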

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
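The original training script is not published, so the sketch below only shows how the listed values map onto transformers.TrainingArguments (Transformers 4.44.0); output_dir is a placeholder, and bf16 is inferred from the checkpoint's BF16 tensor type.

```python
# Sketch: the listed hyperparameters expressed as TrainingArguments.
# Not the author's actual training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd1",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,  # 8 x 16 = 128 total train batch size (single device)
    num_train_epochs=1,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumption, based on the checkpoint's BF16 tensor type
)
```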

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.4818        | 0.0264 | 5    | 1.3328          | 284168            |
| 1.4014        | 0.0528 | 10   | 1.2146          | 570464            |
| 1.2470        | 0.0792 | 15   | 1.1552          | 859552            |
| 1.2344        | 0.1056 | 20   | 1.1316          | 1139712           |
| 1.0727        | 0.1321 | 25   | 1.1148          | 1425952           |
| 1.0489        | 0.1585 | 30   | 1.1144          | 1712584           |
| 1.0564        | 0.1849 | 35   | 1.1157          | 1999000           |
| 1.0475        | 0.2113 | 40   | 1.1221          | 2278656           |
| 1.0397        | 0.2377 | 45   | 1.1144          | 2567096           |
| 0.9626        | 0.2641 | 50   | 1.1186          | 2858408           |
| 0.9346        | 0.2905 | 55   | 1.1198          | 3145312           |
| 0.9472        | 0.3169 | 60   | 1.1231          | 3435992           |
| 0.9308        | 0.3433 | 65   | 1.1217          | 3729256           |
| 0.7938        | 0.3698 | 70   | 1.1223          | 4015952           |
| 0.8555        | 0.3962 | 75   | 1.1211          | 4305600           |
| 0.8708        | 0.4226 | 80   | 1.1195          | 4599712           |
| 0.8453        | 0.4490 | 85   | 1.1167          | 4888360           |
| 0.7371        | 0.4754 | 90   | 1.1169          | 5180504           |
| 0.8233        | 0.5018 | 95   | 1.1128          | 5473352           |
| 0.8823        | 0.5282 | 100  | 1.1131          | 5765104           |
| 0.6230        | 0.5546 | 105  | 1.1111          | 6052128           |
| 0.7361        | 0.5810 | 110  | 1.1069          | 6343856           |
| 0.8444        | 0.6075 | 115  | 1.1103          | 6631416           |
| 0.7777        | 0.6339 | 120  | 1.1068          | 6921552           |
| 0.6832        | 0.6603 | 125  | 1.1054          | 7209048           |
| 0.8106        | 0.6867 | 130  | 1.1039          | 7489664           |
| 0.6772        | 0.7131 | 135  | 1.1007          | 7782048           |
| 0.7388        | 0.7395 | 140  | 1.0992          | 8068440           |
| 0.8197        | 0.7659 | 145  | 1.0968          | 8360312           |
| 0.6981        | 0.7923 | 150  | 1.0959          | 8648720           |
| 0.6736        | 0.8188 | 155  | 1.0956          | 8940416           |
| 0.7139        | 0.8452 | 160  | 1.0935          | 9223368           |
| 0.8445        | 0.8716 | 165  | 1.0927          | 9508432           |
| 0.6475        | 0.8980 | 170  | 1.0919          | 9797464           |
| 0.7119        | 0.9244 | 175  | 1.0904          | 10086248          |
| 0.8095        | 0.9508 | 180  | 1.0897          | 10378552          |
| 0.6255        | 0.9772 | 185  | 1.0894          | 10659304          |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
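Exact version matches are not required for inference, but checking the local environment against these versions can help debug discrepancies; a minimal sketch:

```python
# Print installed versions alongside the versions this model was trained with.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name in expected:
    print(f"{name}: installed {installed[name]}, trained with {expected[name]}")
```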