augmented_step_val_25_gemma-2-2b_hs2_iter1_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.4633
  • Num Input Tokens Seen: 8037456
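
For a more intuitive reading of the evaluation loss, it can be converted to perplexity. The sketch below assumes the reported loss is a mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card. The function name `perplexity` is illustrative only.

```python
import math


def perplexity(mean_cross_entropy_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy_nats)


# Final evaluation loss as reported above.
final_eval_loss = 1.4633
final_ppl = perplexity(final_eval_loss)

print(f"perplexity ~ {final_ppl:.2f}")  # prints "perplexity ~ 4.32"
```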

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
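
The relationship between these values can be checked with a little arithmetic. The sketch below is not the training script; it reproduces the total batch size from the per-device batch size and gradient accumulation steps, and illustrates a constant-with-warmup schedule (linear ramp, then flat). Two things are assumptions: a single training device (consistent with 8 × 16 = 128), and a total of 144 optimizer steps, which is only an estimate from the epoch and step columns of the training results, not a value stated in this card.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 8e-06
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 16
WARMUP_RATIO = 0.05

# Assumption: estimated from the results table (step 140 is epoch 0.9684).
ASSUMED_TOTAL_STEPS = 144


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Number of sequences contributing to one optimizer update."""
    return per_device * accum_steps * num_devices


def constant_with_warmup_lr(step: int, warmup_steps: int, base_lr: float = LEARNING_RATE) -> float:
    """Linear ramp from 0 to base_lr over warmup_steps, then constant."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr


total_batch = effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS)
warmup_steps = math.ceil(ASSUMED_TOTAL_STEPS * WARMUP_RATIO)

print(total_batch)   # 128, matching total_train_batch_size
print(warmup_steps)  # 8 under the assumed step count
```

Under these assumptions the learning rate reaches 8e-06 after roughly the first 8 optimizer steps and stays there for the rest of the epoch.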

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| No log        | 0      | 0    | 1.0950          | 0                 |
| 1.3679        | 0.0346 | 5    | 1.0949          | 278656            |
| 1.2514        | 0.0692 | 10   | 1.1089          | 558136            |
| 1.245         | 0.1038 | 15   | 1.1349          | 828560            |
| 1.0777        | 0.1383 | 20   | 1.1824          | 1108192           |
| 0.9197        | 0.1729 | 25   | 1.2347          | 1388824           |
| 0.7782        | 0.2075 | 30   | 1.3223          | 1665952           |
| 0.6517        | 0.2421 | 35   | 1.3822          | 1950056           |
| 0.538         | 0.2767 | 40   | 1.4250          | 2228112           |
| 0.484         | 0.3113 | 45   | 1.5033          | 2510296           |
| 0.4439        | 0.3459 | 50   | 1.4842          | 2795248           |
| 0.3461        | 0.3805 | 55   | 1.4639          | 3079696           |
| 0.233         | 0.4150 | 60   | 1.5173          | 3363824           |
| 0.285         | 0.4496 | 65   | 1.4692          | 3641232           |
| 0.3629        | 0.4842 | 70   | 1.4453          | 3912784           |
| 0.299         | 0.5188 | 75   | 1.4535          | 4191336           |
| 0.1821        | 0.5534 | 80   | 1.4589          | 4471320           |
| 0.1861        | 0.5880 | 85   | 1.4087          | 4751488           |
| 0.1433        | 0.6226 | 90   | 1.4598          | 5027576           |
| 0.168         | 0.6572 | 95   | 1.4293          | 5303728           |
| 0.1826        | 0.6917 | 100  | 1.4074          | 5582944           |
| 0.1109        | 0.7263 | 105  | 1.4264          | 5859696           |
| 0.1645        | 0.7609 | 110  | 1.4006          | 6130248           |
| 0.1012        | 0.7955 | 115  | 1.3990          | 6409520           |
| 0.0968        | 0.8301 | 120  | 1.4104          | 6691040           |
| 0.1176        | 0.8647 | 125  | 1.4425          | 6969344           |
| 0.1783        | 0.8993 | 130  | 1.4141          | 7248104           |
| 0.099         | 0.9339 | 135  | 1.4359          | 7526360           |
| 0.1133        | 0.9684 | 140  | 1.4366          | 7808664           |
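
In the results above, training loss falls steadily while validation loss rises after the first few steps, a pattern usually read as overfitting. The sketch below is one way to locate the best and worst evaluation checkpoints and the final train/validation gap. The tuples are copied from the table; the variable names are illustrative.

```python
# (step, training_loss, validation_loss), copied from the results table.
# The training loss at step 0 was not logged, so it is None.
RESULTS = [
    (0, None, 1.0950), (5, 1.3679, 1.0949), (10, 1.2514, 1.1089),
    (15, 1.245, 1.1349), (20, 1.0777, 1.1824), (25, 0.9197, 1.2347),
    (30, 0.7782, 1.3223), (35, 0.6517, 1.3822), (40, 0.538, 1.4250),
    (45, 0.484, 1.5033), (50, 0.4439, 1.4842), (55, 0.3461, 1.4639),
    (60, 0.233, 1.5173), (65, 0.285, 1.4692), (70, 0.3629, 1.4453),
    (75, 0.299, 1.4535), (80, 0.1821, 1.4589), (85, 0.1861, 1.4087),
    (90, 0.1433, 1.4598), (95, 0.168, 1.4293), (100, 0.1826, 1.4074),
    (105, 0.1109, 1.4264), (110, 0.1645, 1.4006), (115, 0.1012, 1.3990),
    (120, 0.0968, 1.4104), (125, 0.1176, 1.4425), (130, 0.1783, 1.4141),
    (135, 0.099, 1.4359), (140, 0.1133, 1.4366),
]

# Checkpoints with the lowest and highest validation loss.
best_step, _, best_val = min(RESULTS, key=lambda row: row[2])
worst_step, _, worst_val = max(RESULTS, key=lambda row: row[2])

# Gap between validation and training loss at the last logged step.
last_step, last_train, last_val = RESULTS[-1]
final_gap = last_val - last_train

print(f"lowest validation loss:  {best_val:.4f} at step {best_step}")
print(f"highest validation loss: {worst_val:.4f} at step {worst_step}")
print(f"validation - training loss at step {last_step}: {final_gap:.4f}")
```

On this data the lowest validation loss occurs at step 5, so a checkpoint selected by validation loss alone would come from very early in the run.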

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Safetensors

  • Model size: 2.61B params
  • Tensor type: BF16

Model tree for jkazdan/augmented_step_val_25_gemma-2-2b_hs2_iter1_sftsd1

  • Base model: google/gemma-2-2b