collapse_gemma-2-2b_hs2_accumulatesubsample_iter13_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2032
  • Num Input Tokens Seen: 5020446
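A minimal loading and generation sketch, assuming the checkpoint is published on the Hugging Face Hub under the repository name in the title; the `transformers` calls are standard and the prompt is purely illustrative:

```python
# Minimal inference sketch; assumes the checkpoint is available on the Hub
# under this repository name. The prompt is illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter13_sftsd0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```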

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
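A hedged sketch of how the hyperparameters above map onto `transformers.TrainingArguments`; the dataset, model setup, and `Trainer` wiring are not documented here and are omitted, and `output_dir` and `bf16` are assumptions:

```python
# Sketch of the listed hyperparameters as TrainingArguments; the dataset and
# Trainer setup are unknown and omitted. With per-device train batch size 8 and
# gradient_accumulation_steps 16, the effective batch size is 8 * 16 = 128,
# matching the reported total_train_batch_size.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter13_sftsd0",  # assumption
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumption: matches the BF16 checkpoint dtype
)
```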

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|--------------:|-------:|-----:|----------------:|------------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.34          | 0.0538 | 5    | 1.2783          | 273912            |
| 1.1013        | 0.1075 | 10   | 1.2124          | 550336            |
| 0.9918        | 0.1613 | 15   | 1.2140          | 821232            |
| 0.8609        | 0.2151 | 20   | 1.2190          | 1094808           |
| 0.7352        | 0.2688 | 25   | 1.2393          | 1360608           |
| 0.7336        | 0.3226 | 30   | 1.2311          | 1633144           |
| 0.6607        | 0.3763 | 35   | 1.2354          | 1902744           |
| 0.543         | 0.4301 | 40   | 1.2269          | 2170672           |
| 0.5362        | 0.4839 | 45   | 1.2253          | 2438088           |
| 0.5783        | 0.5376 | 50   | 1.2295          | 2709272           |
| 0.4413        | 0.5914 | 55   | 1.2153          | 2982760           |
| 0.5566        | 0.6452 | 60   | 1.2091          | 3250856           |
| 0.5763        | 0.6989 | 65   | 1.2251          | 3522440           |
| 0.4629        | 0.7527 | 70   | 1.2077          | 3792592           |
| 0.4905        | 0.8065 | 75   | 1.2210          | 4052656           |
| 0.4028        | 0.8602 | 80   | 1.2064          | 4317496           |
| 0.4751        | 0.9140 | 85   | 1.2065          | 4590056           |
| 0.4461        | 0.9677 | 90   | 1.2108          | 4861056           |
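For context, the reported losses can be read as perplexities; a quick check, assuming they are standard mean token-level cross-entropy:

```python
# Convert the reported cross-entropy losses to perplexity
# (assumes the standard mean token-level cross-entropy).
import math

initial_eval_loss = 1.3909  # step 0
final_eval_loss = 1.2032    # final evaluation set loss

print(f"initial perplexity: {math.exp(initial_eval_loss):.2f}")  # ~4.02
print(f"final perplexity:   {math.exp(final_eval_loss):.2f}")    # ~3.33
```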

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1