collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1371
  • Num Input Tokens Seen: 5264318
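
Since this is a standard causal-LM checkpoint derived from google/gemma-2-2b, it should load with the usual Transformers API. A minimal sketch, assuming the checkpoint is published on the Hub under the repo id `RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd1` (inferred from the model name above; adjust if the actual Hub path differs):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id inferred from the model name above (assumption, not confirmed by the card).
model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the checkpoint is stored in BF16
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```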

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
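
The list above maps directly onto `transformers.TrainingArguments`. A minimal configuration sketch (not the authors' actual training script); note that the effective batch size follows from 8 per device × 16 accumulation steps = 128:

```python
from transformers import TrainingArguments

# Sketch of the configuration implied by the hyperparameters above;
# assumes a single device, so 8 * 16 = 128 matches total_train_batch_size.
args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,    # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,         # matches the BF16 checkpoint
)
```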

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.4103        | 0.0538 | 5    | 1.2645          | 285920            |
| 1.3042        | 0.1075 | 10   | 1.1766          | 573248            |
| 1.2148        | 0.1613 | 15   | 1.1511          | 853008            |
| 1.1212        | 0.2151 | 20   | 1.1320          | 1136200           |
| 1.0406        | 0.2688 | 25   | 1.1354          | 1412848           |
| 1.1139        | 0.3226 | 30   | 1.1353          | 1691488           |
| 0.93          | 0.3763 | 35   | 1.1526          | 1969840           |
| 0.9491        | 0.4301 | 40   | 1.1464          | 2249384           |
| 0.8255        | 0.4839 | 45   | 1.1531          | 2529768           |
| 0.8226        | 0.5376 | 50   | 1.1457          | 2813872           |
| 0.8505        | 0.5914 | 55   | 1.1510          | 3105400           |
| 0.7052        | 0.6452 | 60   | 1.1498          | 3391880           |
| 0.7749        | 0.6989 | 65   | 1.1413          | 3678440           |
| 0.6941        | 0.7527 | 70   | 1.1457          | 3959096           |
| 0.6859        | 0.8065 | 75   | 1.1410          | 4248384           |
| 0.5947        | 0.8602 | 80   | 1.1435          | 4534176           |
| 0.6197        | 0.9140 | 85   | 1.1393          | 4811344           |
| 0.5752        | 0.9677 | 90   | 1.1362          | 5094360           |
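
Assuming the reported losses are mean token-level cross-entropy in nats (the Transformers default for causal-LM evaluation), the final evaluation loss corresponds to a perplexity of roughly exp(1.1371) ≈ 3.12:

```python
import math

# Perplexity implied by the final evaluation loss, assuming the loss
# is mean token-level cross-entropy in nats (an assumption about units).
final_eval_loss = 1.1371
print(f"perplexity ≈ {math.exp(final_eval_loss):.2f}")  # ≈ 3.12
```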

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1