collapse_gemma-2-2b_hs2_accumulatesubsample_iter4_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1502
  • Num Input Tokens Seen: 5139552
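
For reference, if the reported loss is the mean per-token cross-entropy (as the Hugging Face Trainer reports it), the final evaluation loss corresponds to a perplexity of exp(1.1502) ≈ 3.16.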

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an illustrative configuration sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
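
As a rough illustration only (the original training script is not provided), the settings above map onto transformers.TrainingArguments as sketched below; the output_dir is a placeholder, and the Adam settings are the library defaults, which match the listed optimizer:

```python
# Illustrative sketch, not the authors' actual script: the hyperparameters
# above expressed as transformers.TrainingArguments (Transformers 4.44.0).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter4_sftsd1",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=8,    # train_batch_size: 8
    per_device_eval_batch_size=16,    # eval_batch_size: 16
    seed=1,
    gradient_accumulation_steps=16,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    bf16=True,                        # checkpoint tensors are BF16
    # adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-8 are the defaults,
    # matching the optimizer listed above.
)
```

The listed total_train_batch_size of 128 is consistent with 8 examples per device × 16 accumulation steps on a single device (8 × 16 = 128).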

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.4366        | 0.0537 | 5    | 1.2652          | 272808            |
| 1.205         | 0.1075 | 10   | 1.1792          | 543728            |
| 1.0601        | 0.1612 | 15   | 1.1613          | 817168            |
| 0.9601        | 0.2149 | 20   | 1.1532          | 1092816           |
| 0.9378        | 0.2686 | 25   | 1.1523          | 1371272           |
| 0.9852        | 0.3224 | 30   | 1.1648          | 1652968           |
| 0.9609        | 0.3761 | 35   | 1.1735          | 1931576           |
| 0.8948        | 0.4298 | 40   | 1.1661          | 2213968           |
| 0.8069        | 0.4835 | 45   | 1.1685          | 2496776           |
| 0.6446        | 0.5373 | 50   | 1.1695          | 2771880           |
| 0.7284        | 0.5910 | 55   | 1.1612          | 3049008           |
| 0.6245        | 0.6447 | 60   | 1.1637          | 3321840           |
| 0.5641        | 0.6985 | 65   | 1.1559          | 3594864           |
| 0.5613        | 0.7522 | 70   | 1.1590          | 3871512           |
| 0.6246        | 0.8059 | 75   | 1.1572          | 4140888           |
| 0.6635        | 0.8596 | 80   | 1.1523          | 4417664           |
| 0.626         | 0.9134 | 85   | 1.1528          | 4694904           |
| 0.579         | 0.9671 | 90   | 1.1477          | 4973416           |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
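
A minimal loading sketch under the pinned versions above (an illustrative assumption, not part of the original card); the repository id is taken from the model tree below:

```python
# Illustrative usage sketch; assumes transformers 4.44.0, torch 2.4.0+cu121,
# and that `accelerate` is installed for device_map="auto".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter4_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",
)

inputs = tokenizer("Hello,", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```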

Safetensors

  • Model size: 2.61B params
  • Tensor type: BF16

Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter4_sftsd1

  • Base model: google/gemma-2-2b → this model