collapse_gemma-2-2b_hs2_replace_iter3_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.8917
  • Num Input Tokens Seen: 4953776

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reproduction sketch in code follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
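
As a reading aid, here is a minimal sketch of how these hyperparameters map onto a standard Hugging Face Trainer setup. The actual training script and dataset for this model are not documented, so everything beyond the listed values (use of Trainer itself, bf16 precision, and the dataset plumbing) is an assumption:

```python
# Sketch only: maps the listed hyperparameters onto TrainingArguments.
# The dataset and training script are unknown; placeholders are marked.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b")

args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_replace_iter3_sftsd2",
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # 8 (per device) x 16 (accumulation) = 128 total, assuming one device
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    bf16=True,  # assumption, consistent with the BF16 tensor type reported below
)

# trainer = Trainer(
#     model=model,
#     args=args,
#     train_dataset=train_dataset,  # unknown dataset, not documented
#     eval_dataset=eval_dataset,
# )
# trainer.train()
```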

Training results

Training Loss   Epoch    Step   Validation Loss   Input Tokens Seen
No log          0        0      1.3909            0
1.662           0.0527   5      1.2677            263936
1.043           0.1053   10     1.2121            527776
0.8186          0.1580   15     1.3663            793808
0.5588          0.2107   20     1.4675            1057184
0.3341          0.2633   25     1.6131            1320008
0.2249          0.3160   30     1.7661            1582280
0.1707          0.3687   35     1.8590            1848456
0.0813          0.4213   40     1.9520            2110720
0.0719          0.4740   45     1.8883            2375976
0.0652          0.5267   50     1.9238            2633904
0.0556          0.5793   55     1.9031            2897008
0.0638          0.6320   60     1.8555            3161296
0.0524          0.6847   65     1.8461            3434104
0.0338          0.7373   70     1.8539            3694144
0.0549          0.7900   75     1.8739            3946736
0.0352          0.8427   80     1.8748            4208840
0.0425          0.8953   85     1.8757            4471136
0.0349          0.9480   90     1.8530            4741888

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
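
Assuming the checkpoint is published on the Hub under the repository id shown in the model tree below, it can be loaded with the framework versions above via the standard transformers pattern. This is a usage sketch, not part of the original training code; the prompt is purely illustrative:

```python
# Minimal loading/inference sketch. The repo id is taken from the model
# tree below; bfloat16 matches the BF16 tensor type reported for the
# checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter3_sftsd2"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```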

Model details

  • Format: Safetensors
  • Model size: 2.61B params
  • Tensor type: BF16

Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter3_sftsd2

  • Base model: google/gemma-2-2b
  • This model is one of 484 fine-tunes of the base model.