collapse_gemma-2-2b_hs2_accumulatesubsample_iter16_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2037
  • Num Input Tokens Seen: 5033336
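
The evaluation loss can be converted to perplexity to make it easier to interpret. This sketch assumes the loss is the mean per-token cross-entropy in nats, which is what Transformers causal-LM heads return by default. The card itself does not state this.

```python
import math

# Final evaluation loss reported above. Assumption: mean per-token
# cross-entropy in nats, as returned by a Transformers causal-LM head.
eval_loss = 1.2037

# Perplexity is the exponential of the mean per-token cross-entropy.
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.2f}")
```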

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
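
The batch-size arithmetic and the learning-rate schedule implied by these settings can be sketched in plain Python. This is an illustration, not the training code. It makes two assumptions:

  • Training ran on a single device, so total_train_batch_size equals train_batch_size times gradient_accumulation_steps.
  • The warmup ratio is converted into ceil(total_steps times 0.05) warmup steps with a linear ramp from 0, after which the rate stays constant. This is our understanding of how Transformers' get_constant_schedule_with_warmup and the warmup_ratio setting behave.

The card does not report the number of optimizer steps, so TOTAL_STEPS below is a hypothetical value.

```python
import math

# Hyperparameters copied from the list above.
learning_rate = 8e-06
train_batch_size = 8
gradient_accumulation_steps = 16
warmup_ratio = 0.05

# One optimizer update consumes train_batch_size * gradient_accumulation_steps
# examples (assuming a single device, consistent with the reported 128).
total_train_batch_size = train_batch_size * gradient_accumulation_steps


def constant_with_warmup_lr(step: int, total_steps: int) -> float:
    """Learning rate at optimizer step `step` under constant-with-warmup.

    Linear ramp from 0 to `learning_rate` over the warmup steps, then constant.
    Assumption: warmup length is ceil(total_steps * warmup_ratio).
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate


# Hypothetical run length for illustration only; not stated in the card.
TOTAL_STEPS = 100
print(total_train_batch_size)
print(constant_with_warmup_lr(0, TOTAL_STEPS))
print(constant_with_warmup_lr(50, TOTAL_STEPS))
```

Because the schedule is constant after warmup, the learning rate only changes during the first few percent of the run. Every later update uses the full 8e-06.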

Training results

Training Loss  Epoch   Step  Validation Loss  Input Tokens Seen
No log         0       0     1.3909           0
1.4057         0.0531  5     1.2789           266712
0.9946         0.1061  10    1.2203           535376
0.9751         0.1592  15    1.2176           817176
0.8049         0.2122  20    1.2373           1083600
0.7624         0.2653  25    1.2358           1352608
0.7157         0.3183  30    1.2521           1622152
0.5400         0.3714  35    1.2346           1882312
0.5442         0.4244  40    1.2433           2149600
0.5808         0.4775  45    1.2429           2416240
0.4783         0.5305  50    1.2305           2682968
0.5364         0.5836  55    1.2256           2950376
0.5619         0.6366  60    1.2167           3214352
0.5027         0.6897  65    1.2278           3481120
0.4447         0.7427  70    1.2205           3747064
0.3629         0.7958  75    1.2205           4015440
0.5072         0.8488  80    1.2094           4281048
0.5246         0.9019  85    1.2102           4550336
0.5123         0.9549  90    1.2077           4814152
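
To summarize the log programmatically, the rows can be transcribed and queried. This minimal sketch finds the lowest logged validation loss and the average number of input tokens seen per optimizer step. Two caveats apply. The final evaluation loss of 1.2037 reported at the top is not one of the logged rows. The card also does not say whether the token counter includes padding.

```python
# (step, training loss or None for "No log", validation loss, input tokens seen),
# transcribed from the table above.
ROWS = [
    (0, None, 1.3909, 0),
    (5, 1.4057, 1.2789, 266712),
    (10, 0.9946, 1.2203, 535376),
    (15, 0.9751, 1.2176, 817176),
    (20, 0.8049, 1.2373, 1083600),
    (25, 0.7624, 1.2358, 1352608),
    (30, 0.7157, 1.2521, 1622152),
    (35, 0.5400, 1.2346, 1882312),
    (40, 0.5442, 1.2433, 2149600),
    (45, 0.5808, 1.2429, 2416240),
    (50, 0.4783, 1.2305, 2682968),
    (55, 0.5364, 1.2256, 2950376),
    (60, 0.5619, 1.2167, 3214352),
    (65, 0.5027, 1.2278, 3481120),
    (70, 0.4447, 1.2205, 3747064),
    (75, 0.3629, 1.2205, 4015440),
    (80, 0.5072, 1.2094, 4281048),
    (85, 0.5246, 1.2102, 4550336),
    (90, 0.5123, 1.2077, 4814152),
]

# Logged step with the lowest validation loss.
best_step, _, best_val, _ = min(ROWS, key=lambda row: row[2])

# Average input tokens per optimizer step over the logged portion of the run.
last_step, _, _, last_tokens = ROWS[-1]
tokens_per_step = last_tokens / last_step

print(best_step, best_val)
print(round(tokens_per_step))
```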

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
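
A small stdlib-only helper can check whether a local environment matches these versions by comparing installed distributions against the list. The PyPI distribution names (transformers, torch, datasets, tokenizers) are assumed mappings of the framework names above. A CPU-only PyTorch build reports 2.4.0 without the +cu121 suffix, so the helper will flag it.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# Versions listed under "Framework versions", keyed by assumed PyPI names.
EXPECTED = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def version_mismatches(expected: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return {package: installed version or None} for each package that differs or is missing."""
    mismatches: Dict[str, Optional[str]] = {}
    for package, wanted in expected.items():
        try:
            installed: Optional[str] = version(package)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = installed
    return mismatches


print(version_mismatches(EXPECTED))
```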

Safetensors

  • Model size: 2.61B params
  • Tensor type: BF16
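
A back-of-the-envelope estimate of the memory needed just to hold the weights follows from the parameter count and dtype, since BF16 uses 2 bytes per parameter. This covers weights only. Activations, the KV cache during generation, and optimizer state for further fine-tuning are extra.

```python
# Reported checkpoint metadata: about 2.61B parameters stored as BF16.
num_params = 2.61e9
bytes_per_param = 2  # bfloat16 is 16 bits

# Weights only; excludes activations, KV cache, and optimizer state.
weight_bytes = num_params * bytes_per_param
print(f"{weight_bytes / 1e9:.2f} GB ({weight_bytes / 2**30:.2f} GiB)")
```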

Model tree

  • This model: RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter16_sftsd2
  • Fine-tuned from base model: google/gemma-2-2b