collapse_gemma-2-2b_hs2_accumulatesubsample_iter10_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set (a minimal usage sketch follows the list):

  • Loss: 1.2073
  • Num Input Tokens Seen: 5032712
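
The card does not include a usage example. As a minimal inference sketch, assuming the checkpoint is hosted under the repo id shown on this card and loads like any other gemma-2-2b fine-tune (the prompt and generation settings below are illustrative, not taken from the card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter10_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # the card lists the tensor type as BF16
    device_map="auto",
)

prompt = "The capital of France is"  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```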

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto transformers TrainingArguments follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
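
As a hedged sketch, the settings above map onto transformers TrainingArguments roughly as follows. The output_dir is a hypothetical placeholder, and the Adam betas and epsilon listed above match the transformers defaults, so they are left implicit:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_sft",  # hypothetical output path
    learning_rate=8e-6,
    per_device_train_batch_size=8,    # train_batch_size: 8
    per_device_eval_batch_size=16,    # eval_batch_size: 16
    seed=0,
    gradient_accumulation_steps=16,   # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,                # lr_scheduler_warmup_ratio
    num_train_epochs=1,
)
```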

Training results

Training Loss   Epoch    Step   Validation Loss   Input Tokens Seen
No log          0           0   1.3909                            0
1.3566          0.0539      5   1.2744                       272448
1.1352          0.1079     10   1.2127                       547040
0.9502          0.1618     15   1.1996                       825664
0.8885          0.2158     20   1.2152                      1093864
0.8554          0.2697     25   1.2327                      1366616
0.8248          0.3237     30   1.2216                      1642200
0.7493          0.3776     35   1.2269                      1920624
0.6711          0.4316     40   1.2200                      2193904
0.7145          0.4855     45   1.2118                      2471312
0.5736          0.5394     50   1.2113                      2743896
0.6077          0.5934     55   1.2109                      3020256
0.5245          0.6473     60   1.2123                      3293520
0.566           0.7013     65   1.2143                      3567816
0.5426          0.7552     70   1.1968                      3834000
0.5058          0.8092     75   1.2092                      4106144
0.4798          0.8631     80   1.1969                      4379288
0.4227          0.9171     85   1.2013                      4654784
0.494           0.9710     90   1.2026                      4929264

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1