collapse_gemma-2-2b_hs2_accumulatesubsample_iter11_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2001
  • Num Input Tokens Seen: 5006560
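
For reference, a minimal loading sketch with Transformers (the repo id comes from this card, and bfloat16 matches the checkpoint's stored tensor type; the prompt and generation settings are illustrative only):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id from this card; the checkpoint weights are stored in BF16.
model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter11_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Illustrative prompt; the card does not document an intended prompt format.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```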

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a rough TrainingArguments equivalent is sketched after the list:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
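
As a sketch, the hyperparameters above map onto Transformers TrainingArguments roughly as follows (a reconstruction using the standard field names, not the original training script; the output_dir is illustrative):

```python
from transformers import TrainingArguments

# Reconstruction of the hyperparameters listed above; not the original script.
args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter11_sftsd2",
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 effective train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```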

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.3681        | 0.0533 | 5    | 1.2755          | 261848            |
| 1.118         | 0.1065 | 10   | 1.2036          | 531488            |
| 1.0328        | 0.1598 | 15   | 1.1938          | 805024            |
| 0.9333        | 0.2130 | 20   | 1.2164          | 1075696           |
| 0.8818        | 0.2663 | 25   | 1.2197          | 1343520           |
| 0.8052        | 0.3196 | 30   | 1.2471          | 1613144           |
| 0.6982        | 0.3728 | 35   | 1.2279          | 1880416           |
| 0.6596        | 0.4261 | 40   | 1.2155          | 2153872           |
| 0.6369        | 0.4794 | 45   | 1.2117          | 2422304           |
| 0.4371        | 0.5326 | 50   | 1.2172          | 2694960           |
| 0.43          | 0.5859 | 55   | 1.2114          | 2956912           |
| 0.5446        | 0.6391 | 60   | 1.2153          | 3230368           |
| 0.5092        | 0.6924 | 65   | 1.2052          | 3504768           |
| 0.4249        | 0.7457 | 70   | 1.2069          | 3771264           |
| 0.4894        | 0.7989 | 75   | 1.2033          | 4044656           |
| 0.6736        | 0.8522 | 80   | 1.2046          | 4314136           |
| 0.5114        | 0.9055 | 85   | 1.1947          | 4586208           |
| 0.3609        | 0.9587 | 90   | 1.2142          | 4850112           |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model details

  • Model size: 2.61B params (Safetensors)
  • Tensor type: BF16
