collapse_gemma-2-2b_hs2_accumulatesubsample_iter15_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2148
  • Num Input Tokens Seen: 5006848
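
For reference, here is a minimal sketch of loading this checkpoint for inference with Transformers. The repository id is taken from the model tree at the bottom of this card; the prompt and generation settings are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id from the model tree below; prompt and generation
# settings are illustrative placeholders.
repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter15_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # the checkpoint is stored in BF16 (see below)
)

inputs = tokenizer("Write a haiku about autumn.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```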

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
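
The list above maps onto a Hugging Face TrainingArguments configuration roughly as follows. This is a reconstruction, not the original training script: the output_dir is a placeholder, the bf16 flag is an assumption based on the checkpoint's tensor type, and a single training device is assumed (8 per device × 16 accumulation steps = 128 total):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter15_sftsd0",  # placeholder
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,  # 8 per device * 16 steps = 128 total (single device assumed)
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    bf16=True,  # assumption: matches the BF16 checkpoint
)
```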

Training results

Training Loss   Epoch    Step   Validation Loss   Input Tokens Seen
No log          0        0      1.3909            0
1.3049          0.0537   5      1.2784            272496
1.0794          0.1073   10     1.2244            541048
0.9053          0.1610   15     1.2174            815408
0.9065          0.2146   20     1.2377            1076384
0.7415          0.2683   25     1.2640            1342656
0.5952          0.3219   30     1.2436            1605984
0.6225          0.3756   35     1.2600            1872808
0.5746          0.4292   40     1.2302            2140224
0.5049          0.4829   45     1.2372            2404344
0.615           0.5366   50     1.2218            2676784
0.4956          0.5902   55     1.2206            2946624
0.473           0.6439   60     1.2260            3223872
0.5036          0.6975   65     1.2119            3498056
0.4252          0.7512   70     1.2296            3768752
0.3339          0.8048   75     1.2122            4038184
0.5058          0.8585   80     1.2148            4304208
0.3706          0.9121   85     1.2092            4574608
0.3579          0.9658   90     1.2204            4842648
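
For intuition, the validation loss can be converted to a perplexity, assuming it is the usual mean per-token cross-entropy reported by the Trainer:

```python
import math

# Perplexity = exp(mean cross-entropy loss).
# Final evaluation loss from the table above:
print(round(math.exp(1.2148), 2))  # 3.37
```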

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
Model details

  • Format: Safetensors
  • Model size: 2.61B params
  • Tensor type: BF16

Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter15_sftsd0

  • Base model: google/gemma-2-2b