augmented_step_val_25_gemma-2-2b_hs2_iter1_sftsd0

This model is a fine-tuned version of jkazdan/step_val_25_gemma-2-2b_hs2_iter1_sftsd2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.5241
  • Num Input Tokens Seen: 7902160

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an illustrative sketch of these settings as TrainingArguments follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
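
For reference, the settings above can be expressed as Hugging Face TrainingArguments. This is an illustrative sketch only: the actual training script is not part of this card, the output directory is a placeholder, and bf16=True is an assumption based on the BF16 tensor type reported further down, not something stated in the training section.

```python
from transformers import TrainingArguments

# Illustrative reconstruction of the hyperparameters listed above.
# output_dir is a placeholder; bf16=True is an assumption (the published
# checkpoint is stored in BF16), not stated in the training section.
training_args = TrainingArguments(
    output_dir="augmented_step_val_25_gemma-2-2b_hs2_iter1_sftsd0",
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,  # 8 x 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,
)
```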

Training results

Training Loss   Epoch    Step   Validation Loss   Input Tokens Seen
No log          0        0      1.0950            0
1.4641          0.0363   5      1.0952            288056
1.2843          0.0726   10     1.1081            573696
1.179           0.1089   15     1.1363            864312
1.0141          0.1452   20     1.1791            1155592
0.9315          0.1815   25     1.2351            1442896
0.825           0.2178   30     1.3062            1738192
0.6513          0.2541   35     1.3937            2026640
0.5567          0.2904   40     1.4694            2311728
0.5304          0.3267   45     1.4723            2603472
0.372           0.3630   50     1.4773            2895216
0.3612          0.3993   55     1.4670            3177072
0.3167          0.4356   60     1.4953            3464608
0.2068          0.4719   65     1.5190            3749472
0.1664          0.5082   70     1.4786            4033064
0.2256          0.5445   75     1.4518            4326968
0.1704          0.5808   80     1.4577            4611416
0.1391          0.6171   85     1.5038            4903168
0.2488          0.6534   90     1.4373            5191528
0.1726          0.6897   95     1.5123            5474696
0.1696          0.7260   100    1.4582            5757304
0.1919          0.7623   105    1.4735            6047208
0.1987          0.7985   110    1.4654            6343824
0.256           0.8348   115    1.4215            6627376
0.0984          0.8711   120    1.5130            6915440
0.108           0.9074   125    1.4880            7206272
0.1414          0.9437   130    1.4197            7504304
0.1076          0.9800   135    1.5077            7784504

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
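
As a usage example, the snippet below loads this checkpoint with the framework versions listed above. It assumes the model is published under the repo id in the card title and loads as a standard Gemma 2 causal language model; the prompt is a placeholder.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the checkpoint is hosted under this Hugging Face repo id.
repo_id = "jkazdan/augmented_step_val_25_gemma-2-2b_hs2_iter1_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # the published weights are stored in BF16
    device_map="auto",
)

# Placeholder prompt to sanity-check generation.
inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```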
Model size and format

  • Format: Safetensors
  • Parameters: 2.61B
  • Tensor type: BF16

Model tree for jkazdan/augmented_step_val_25_gemma-2-2b_hs2_iter1_sftsd0

  • Base model: google/gemma-2-2b
  • This model: a fine-tune descended from that base