collapse_gemma-2-2b_hs2_replace_iter5_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.2408
  • Num Input Tokens Seen: 4919592

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
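The hyperparameters above can be sanity-checked with a short sketch: the total train batch size follows from the per-device batch size times the gradient-accumulation steps, and `constant_with_warmup` means a linear ramp to the base learning rate followed by a constant plateau. The `total_steps` value below is an illustrative assumption, not taken from the log.

```python
# Sketch of the effective batch size and LR schedule implied by the
# hyperparameters above (single training device assumed).
train_batch_size = 8
gradient_accumulation_steps = 16
total_train_batch_size = train_batch_size * gradient_accumulation_steps  # 8 * 16 = 128

learning_rate = 8e-06
warmup_ratio = 0.05
total_steps = 100  # assumption for illustration only
warmup_steps = int(warmup_ratio * total_steps)  # 5

def lr_at(step):
    """constant_with_warmup: linear ramp over warmup_steps, then constant."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    return learning_rate

print(total_train_batch_size)  # 128
```

This mirrors the convention in `transformers.get_constant_schedule_with_warmup`, where the multiplier starts at 0 and reaches 1.0 at the end of warmup.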

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.7001        | 0.0513 | 5    | 1.2745          | 250576            |
| 0.9776        | 0.1026 | 10   | 1.2553          | 507984            |
| 0.6411        | 0.1539 | 15   | 1.4389          | 759912            |
| 0.3549        | 0.2053 | 20   | 1.6506          | 1017344           |
| 0.1684        | 0.2566 | 25   | 1.8128          | 1261720           |
| 0.0927        | 0.3079 | 30   | 1.9916          | 1509936           |
| 0.088         | 0.3592 | 35   | 2.1525          | 1762648           |
| 0.0417        | 0.4105 | 40   | 2.2521          | 2020112           |
| 0.0409        | 0.4618 | 45   | 2.2578          | 2273928           |
| 0.0342        | 0.5131 | 50   | 2.2295          | 2525056           |
| 0.0366        | 0.5645 | 55   | 2.2589          | 2779656           |
| 0.0259        | 0.6158 | 60   | 2.2810          | 3029816           |
| 0.0289        | 0.6671 | 65   | 2.2621          | 3284000           |
| 0.0332        | 0.7184 | 70   | 2.2593          | 3542064           |
| 0.0288        | 0.7697 | 75   | 2.2449          | 3801936           |
| 0.0246        | 0.8210 | 80   | 2.2357          | 4058824           |
| 0.025         | 0.8724 | 85   | 2.2324          | 4315048           |
| 0.0273        | 0.9237 | 90   | 2.2358          | 4563000           |
| 0.0226        | 0.9750 | 95   | 2.2411          | 4812984           |
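As a rough cross-check of the log, the average number of input tokens consumed per optimizer step can be recovered from the last logged row (values copied from the table above); dividing by the total train batch size of 128 then gives an estimate of the average sequence length, assuming that batch size held throughout.

```python
# Cross-check on the training log: average tokens per optimizer step,
# using the step-95 row of the results table.
tokens_seen_at_step_95 = 4812984
steps = 95
tokens_per_step = tokens_seen_at_step_95 / steps
print(round(tokens_per_step))  # 50663

# Implied average sequence length if the total batch size of 128 held:
total_train_batch_size = 128
avg_seq_len = tokens_per_step / total_train_batch_size
print(round(avg_seq_len))  # 396
```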

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
Model details

  • Model size: 2.61B params
  • Tensor type: BF16 (Safetensors)

Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter5_sftsd2

  • Base model: google/gemma-2-2b (this model is a fine-tune of it)