collapse_gemma-2-2b_hs2_accumulatesubsample_iter12_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1920
  • Num Input Tokens Seen: 5011248
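
A quick usage sketch (not part of the original card), assuming the standard transformers text-generation API; loading in bfloat16 matches the checkpoint's tensor type:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter12_sftsd1"

# Load the tokenizer and the fine-tuned checkpoint in bfloat16.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Generate a short continuation for a prompt.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```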

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent TrainingArguments configuration is sketched after the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
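
As referenced above, here is a hypothetical sketch of how these hyperparameters map onto transformers.TrainingArguments. It is an illustration, not the author's actual training script; the Adam betas and epsilon listed above are the TrainingArguments defaults, so they are left implicit.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the hyperparameter list above.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter12_sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=8,    # train_batch_size: 8
    per_device_eval_batch_size=16,    # eval_batch_size: 16
    seed=1,
    gradient_accumulation_steps=16,   # 8 x 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,                # lr_scheduler_warmup_ratio: 0.05
    num_train_epochs=1,
    bf16=True,                        # assumption: matches the BF16 checkpoint
)
```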

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.3649        | 0.0529 | 5    | 1.2745          | 267096            |
| 1.1224        | 0.1058 | 10   | 1.2058          | 530288            |
| 0.9974        | 0.1587 | 15   | 1.2049          | 800248            |
| 0.8189        | 0.2116 | 20   | 1.2372          | 1058320           |
| 0.7833        | 0.2646 | 25   | 1.2189          | 1325704           |
| 0.6665        | 0.3175 | 30   | 1.2693          | 1584760           |
| 0.5681        | 0.3704 | 35   | 1.2443          | 1856304           |
| 0.5335        | 0.4233 | 40   | 1.2355          | 2125480           |
| 0.5541        | 0.4762 | 45   | 1.2238          | 2393968           |
| 0.4262        | 0.5291 | 50   | 1.2276          | 2656976           |
| 0.4628        | 0.5820 | 55   | 1.2021          | 2920640           |
| 0.3494        | 0.6349 | 60   | 1.2094          | 3190360           |
| 0.4511        | 0.6878 | 65   | 1.1954          | 3457336           |
| 0.3678        | 0.7407 | 70   | 1.1997          | 3727624           |
| 0.4241        | 0.7937 | 75   | 1.1929          | 3995904           |
| 0.3534        | 0.8466 | 80   | 1.1951          | 4259976           |
| 0.3476        | 0.8995 | 85   | 1.1903          | 4524480           |
| 0.4014        | 0.9524 | 90   | 1.1970          | 4798896           |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
