collapse_gemma-2-2b_hs2_replace_iter8_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.4272
  • Num Input Tokens Seen: 4951896
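
The card ships no usage example. The following is a minimal inference sketch, assuming the model loads with the standard transformers causal-LM API under the repository id shown on the model page (RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter8_sftsd0); the prompt is illustrative only:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id as listed on the model page; bfloat16 matches the reported tensor type.
model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter8_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```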

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of an equivalent TrainingArguments configuration follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
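
For reproducibility, the hyperparameters above map onto a transformers TrainingArguments configuration roughly as follows. This is a hedged sketch; the actual training script, dataset, and Trainer setup are not published with the card, and the output_dir and bf16 flag are assumptions:

```python
from transformers import TrainingArguments

# Sketch of a TrainingArguments equivalent to the reported hyperparameters.
# Effective batch size: 8 (per device) * 16 (accumulation) = 128, matching
# the reported total_train_batch_size on a single device.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_replace_iter8_sftsd0",  # hypothetical
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,      # reported Adam betas
    adam_beta2=0.999,
    adam_epsilon=1e-8,   # reported Adam epsilon
    bf16=True,           # assumption, based on the BF16 tensor type
)
```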

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.5175        | 0.0513 | 5    | 1.2789          | 257144            |
| 0.9829        | 0.1026 | 10   | 1.3042          | 514112            |
| 0.6046        | 0.1539 | 15   | 1.5441          | 764984            |
| 0.2914        | 0.2053 | 20   | 1.7425          | 1029856           |
| 0.1385        | 0.2566 | 25   | 1.9543          | 1280440           |
| 0.0926        | 0.3079 | 30   | 2.1538          | 1540792           |
| 0.0413        | 0.3592 | 35   | 2.3295          | 1791360           |
| 0.0362        | 0.4105 | 40   | 2.3097          | 2050952           |
| 0.0301        | 0.4618 | 45   | 2.3998          | 2309592           |
| 0.0253        | 0.5131 | 50   | 2.3280          | 2558648           |
| 0.0348        | 0.5645 | 55   | 2.3369          | 2824704           |
| 0.0323        | 0.6158 | 60   | 2.3580          | 3080328           |
| 0.0331        | 0.6671 | 65   | 2.3628          | 3333360           |
| 0.0257        | 0.7184 | 70   | 2.3566          | 3579552           |
| 0.0228        | 0.7697 | 75   | 2.3590          | 3836760           |
| 0.021         | 0.8210 | 80   | 2.3724          | 4089776           |
| 0.0232        | 0.8724 | 85   | 2.3932          | 4342392           |
| 0.0223        | 0.9237 | 90   | 2.4080          | 4600240           |
| 0.0243        | 0.9750 | 95   | 2.4271          | 4849520           |
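
For a rough interpretation of the reported evaluation loss, it can be converted to perplexity, assuming it is a mean token-level cross-entropy in nats:

```python
import math

eval_loss = 2.4272  # final evaluation loss reported above
print(math.exp(eval_loss))  # ≈ 11.33, the corresponding perplexity
```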

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
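
When trying to reproduce the results, it may help to confirm the installed versions match those above; a minimal check:

```python
import datasets
import tokenizers
import torch
import transformers

# Versions reported in the card; mismatches are a common source of drift.
print("transformers", transformers.__version__)  # 4.44.0
print("torch", torch.__version__)                # 2.4.0+cu121
print("datasets", datasets.__version__)          # 2.20.0
print("tokenizers", tokenizers.__version__)      # 0.19.1
```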