collapse_gemma-2-2b_hs2_accumulatesubsample_iter13_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1937
  • Num Input Tokens Seen: 4967072
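
For context, if the reported loss is the standard per-token cross-entropy, it corresponds to a perplexity of exp(1.1937) ≈ 3.30 on the evaluation set.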

Model description

More information needed

Intended uses & limitations

More information needed
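
While the intended uses are not documented, the checkpoint loads like any other gemma-2-2b fine-tune via the transformers library. The snippet below is a minimal sketch, not an officially documented usage example; the prompt and generation settings are purely illustrative.

```python
# Minimal usage sketch, assuming the pinned versions under "Framework versions"
# (transformers 4.44.0, torch 2.4.0). device_map="auto" additionally requires
# the accelerate package.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter13_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the published weights are stored in BF16
    device_map="auto",
)

# Illustrative prompt only; the intended use of this checkpoint is not documented.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```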

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a sketch mapping them onto transformers TrainingArguments follows the list:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
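
For reference, these settings map onto transformers TrainingArguments roughly as follows. This is a reconstruction for illustration, not the author's actual training script; the output_dir value and the bf16 flag are assumptions.

```python
from transformers import TrainingArguments

# Reconstructed from the hyperparameter list above; not the original script.
# Effective batch size: 8 per device x 16 accumulation steps = 128,
# which matches the reported total_train_batch_size.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter13_sftsd1",  # hypothetical
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    seed=1,
    adam_beta1=0.9,     # matches the Adam betas reported on the card
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    bf16=True,          # assumption, inferred from the BF16 tensor type
)
```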

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.4147        | 0.0528 | 5    | 1.2762          | 263776            |
| 1.1751        | 0.1056 | 10   | 1.2089          | 527904            |
| 0.9507        | 0.1584 | 15   | 1.2023          | 795176            |
| 0.8056        | 0.2112 | 20   | 1.2344          | 1051032           |
| 0.6575        | 0.2640 | 25   | 1.2598          | 1316456           |
| 0.6252        | 0.3168 | 30   | 1.2757          | 1583656           |
| 0.5477        | 0.3696 | 35   | 1.2561          | 1847776           |
| 0.5462        | 0.4224 | 40   | 1.2272          | 2113544           |
| 0.5597        | 0.4752 | 45   | 1.2306          | 2386008           |
| 0.4005        | 0.5281 | 50   | 1.2235          | 2650504           |
| 0.5095        | 0.5809 | 55   | 1.2107          | 2915648           |
| 0.3978        | 0.6337 | 60   | 1.2088          | 3173912           |
| 0.3427        | 0.6865 | 65   | 1.2017          | 3439032           |
| 0.3256        | 0.7393 | 70   | 1.2081          | 3699752           |
| 0.3051        | 0.7921 | 75   | 1.1954          | 3970736           |
| 0.4045        | 0.8449 | 80   | 1.1972          | 4229528           |
| 0.4072        | 0.8977 | 85   | 1.1940          | 4490136           |
| 0.307         | 0.9505 | 90   | 1.1987          | 4751368           |
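
Training loss falls steadily from 1.4147 to around 0.31 over the epoch, while validation loss improves from 1.3909 at initialization to a best logged value of 1.1940 at step 85, consistent with the final reported evaluation loss of 1.1937.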

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model details

  • Base model: google/gemma-2-2b
  • Model size: 2.61B params
  • Tensor type: BF16
  • Format: Safetensors
