collapse_gemma-2-2b_hs2_accumulatesubsample_iter2_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set (a usage sketch follows these metrics):

  • Loss: 1.1121
  • Num Input Tokens Seen: 5349352
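
The model can be loaded with the standard transformers causal-LM API. The sketch below is a minimal example, not part of the original card; the prompt and generation settings are illustrative, and loading in bfloat16 matches the checkpoint's BF16 tensor type.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter2_sftsd2"

    # Load in bfloat16 to match the checkpoint's tensor type (BF16).
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    # Illustrative prompt; replace with your own input.
    inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))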

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reproduction sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
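
Note that the effective batch size follows from train_batch_size × gradient_accumulation_steps = 8 × 16 = 128. The sketch below maps these hyperparameters onto transformers TrainingArguments. It is an assumption about a standard Trainer setup, not the actual training script: the fine-tuning dataset is undocumented, so train_dataset and eval_dataset are placeholders.

    import torch
    from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

    # Placeholders: the actual fine-tuning dataset is not documented.
    train_dataset = ...
    eval_dataset = ...

    model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b", torch_dtype=torch.bfloat16)

    args = TrainingArguments(
        output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter2_sftsd2",
        learning_rate=8e-06,
        per_device_train_batch_size=8,    # train_batch_size
        per_device_eval_batch_size=16,    # eval_batch_size
        seed=2,
        gradient_accumulation_steps=16,   # 8 * 16 = 128 effective batch size
        lr_scheduler_type="constant_with_warmup",
        warmup_ratio=0.05,
        num_train_epochs=1,
        bf16=True,
        # Optimizer defaults (AdamW with betas=(0.9, 0.999), eps=1e-08) match the card.
    )

    trainer = Trainer(model=model, args=args, train_dataset=train_dataset, eval_dataset=eval_dataset)
    trainer.train()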

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.4359        | 0.0531 | 5    | 1.2593          | 290040            |
| 1.2724        | 0.1062 | 10   | 1.1701          | 574448            |
| 1.1212        | 0.1594 | 15   | 1.1427          | 862264            |
| 1.1336        | 0.2125 | 20   | 1.1165          | 1143592           |
| 1.1278        | 0.2656 | 25   | 1.1143          | 1425768           |
| 1.0877        | 0.3187 | 30   | 1.1128          | 1712792           |
| 1.125         | 0.3718 | 35   | 1.1153          | 1997256           |
| 0.9832        | 0.4250 | 40   | 1.1182          | 2283416           |
| 1.0435        | 0.4781 | 45   | 1.1187          | 2567600           |
| 1.019         | 0.5312 | 50   | 1.1236          | 2847824           |
| 0.8503        | 0.5843 | 55   | 1.1216          | 3131552           |
| 0.8296        | 0.6375 | 60   | 1.1273          | 3417136           |
| 0.903         | 0.6906 | 65   | 1.1259          | 3704664           |
| 0.9499        | 0.7437 | 70   | 1.1227          | 3983088           |
| 0.8753        | 0.7968 | 75   | 1.1205          | 4272000           |
| 0.7801        | 0.8499 | 80   | 1.1197          | 4559656           |
| 0.7972        | 0.9031 | 85   | 1.1161          | 4845112           |
| 0.7402        | 0.9562 | 90   | 1.1193          | 5124568           |
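
As a rough consistency check on the token counts, each logged interval of 5 optimizer steps covers roughly 285,000 input tokens (for example, 574448 − 290040 = 284408 between steps 5 and 10), i.e. about 284408 / 5 ≈ 57,000 tokens per optimizer step, or roughly 444 tokens per sequence at the effective batch size of 128.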

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
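
To reproduce this environment, the versions above can be pinned directly (assuming the CUDA 12.1 build of PyTorch implied by the +cu121 tag):

    pip install transformers==4.44.0 torch==2.4.0 datasets==2.20.0 tokenizers==0.19.1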

Model details

  • Format: Safetensors
  • Model size: 2.61B params
  • Tensor type: BF16

Model tree

  • Base model: google/gemma-2-2b
  • This model: RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter2_sftsd2