collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1452
  • Num Input Tokens Seen: 5338776
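
For completeness, here is a minimal sketch of loading the checkpoint with the standard transformers API. The repo id is an assumption assembled from this card's title and the owner shown in the model tree below; verify it against the actual repository before use.

```python
# Minimal sketch: loading this checkpoint with the transformers API.
# ASSUMPTION: the repo id is assembled from the card title and the owner
# shown in the model tree section; verify it before use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the released weights are BF16 (see model details)
    device_map="auto",
)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```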

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a sketch mapping them onto transformers TrainingArguments follows the list:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
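
The hyperparameters above map directly onto the transformers TrainingArguments API. The sketch below is a hypothetical reconstruction, not the actual training script: output_dir is a placeholder, and the Adam betas and epsilon listed on the card are the transformers defaults.

```python
# Hypothetical reconstruction of the training configuration from the
# hyperparameters listed above; output_dir is a placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd2",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # 8 x 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,    # transformers default, as listed on the card
    adam_beta2=0.999,  # transformers default
    adam_epsilon=1e-8, # transformers default
)
```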

Training results

Training Loss   Epoch    Step   Validation Loss   Input Tokens Seen
No log          0        0      1.3909            0
1.5228          0.0530   5      1.2631            285904
1.2844          0.1060   10     1.1761            573320
1.269           0.1589   15     1.1464            856296
1.1088          0.2119   20     1.1267            1134800
1.0695          0.2649   25     1.1290            1413200
1.027           0.3179   30     1.1306            1697288
0.9688          0.3709   35     1.1340            1980216
0.9701          0.4238   40     1.1427            2266568
0.949           0.4768   45     1.1409            2548552
0.9408          0.5298   50     1.1578            2839880
0.9139          0.5828   55     1.1506            3115520
0.8606          0.6358   60     1.1560            3398440
0.8238          0.6887   65     1.1561            3687696
0.8161          0.7417   70     1.1506            3977240
0.7423          0.7947   75     1.1503            4256976
0.7188          0.8477   80     1.1514            4544776
0.6642          0.9007   85     1.1464            4827760
0.6403          0.9536   90     1.1524            5108184
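
Validation loss bottoms out around step 20 and then drifts slightly upward while training loss keeps falling. For orientation, the final reported evaluation loss (per-token cross-entropy in nats) corresponds to a perplexity of exp(1.1452) ≈ 3.14:

```python
# Quick check: convert the final evaluation loss into perplexity.
import math

final_eval_loss = 1.1452
print(math.exp(final_eval_loss))  # ~3.14
```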

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
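
A small sketch for checking that a local environment matches these pins (assumes the packages are already installed):

```python
# Sketch: assert that installed packages match the versions listed above.
import datasets
import tokenizers
import torch
import transformers

assert transformers.__version__ == "4.44.0"
assert torch.__version__ == "2.4.0+cu121"
assert datasets.__version__ == "2.20.0"
assert tokenizers.__version__ == "0.19.1"
```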

Model details

  • Model size: 2.61B params (Safetensors)
  • Tensor type: BF16

Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter3_sftsd2

  • Base model: google/gemma-2-2b