collapse_gemma-2-2b_hs2_replace_iter6_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.4776
  • Num Input Tokens Seen: 4931704

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
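As a sanity check, two of the values above can be derived from the others (a minimal sketch; the total step count of roughly 97 is inferred from the training log, which reaches epoch 0.9750 at step 95, and is not stated explicitly in this card):

```python
import math

# total_train_batch_size is the product of the per-device train batch
# size and the gradient-accumulation steps (single device assumed).
train_batch_size = 8
gradient_accumulation_steps = 16
total_train_batch_size = train_batch_size * gradient_accumulation_steps
assert total_train_batch_size == 128

# Warmup length under lr_scheduler_warmup_ratio=0.05 with
# constant_with_warmup: a fraction of the total optimizer steps.
# Total steps are estimated from the log (epoch 0.9750 at step 95).
total_steps = round(95 / 0.9750)            # ≈ 97
warmup_steps = math.ceil(0.05 * total_steps)  # ≈ 5

print(total_train_batch_size, total_steps, warmup_steps)  # → 128 97 5
```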

Training results

Training Loss  Epoch   Step  Validation Loss  Input Tokens Seen
No log         0       0     1.3909           0
1.5189         0.0513  5     1.2749           258200
0.9714         0.1026  10    1.2495           517512
0.6202         0.1539  15    1.4088           775024
0.3538         0.2053  20    1.6032           1026560
0.2158         0.2566  25    1.8219           1270944
0.1167         0.3079  30    2.0376           1527480
0.0654         0.3592  35    2.2660           1777448
0.0393         0.4105  40    2.3894           2029984
0.0310         0.4618  45    2.4278           2278552
0.0292         0.5131  50    2.4650           2534640
0.0258         0.5645  55    2.4896           2783408
0.0255         0.6158  60    2.4676           3035384
0.0235         0.6671  65    2.4426           3294576
0.0249         0.7184  70    2.4442           3548680
0.0231         0.7697  75    2.4505           3807912
0.0249         0.8210  80    2.4582           4065352
0.0225         0.8724  85    2.4512           4318600
0.0216         0.9237  90    2.4613           4577512
0.0210         0.9750  95    2.4749           4833752

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1