collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1431
  • Num Input Tokens Seen: 5251440
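
The card does not include a usage snippet. As a minimal sketch, the code below loads the checkpoint with Transformers, assuming the Hub repo id follows the card title under the author's namespace; the model tree at the bottom of this card shows a slightly different name (accumulatesubsample), so verify the exact id before use.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id (card title + author namespace); the model-tree section
# suggests the id may instead contain "accumulatesubsample" -- verify on the Hub.
model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```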

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a code sketch of this configuration follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
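
Expressed as a Transformers TrainingArguments object, the list above corresponds roughly to the following sketch; output_dir is a placeholder, and any option not stated on the card (such as bf16) is an assumption.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,  # effective batch size on one device: 8 * 16 = 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    # The default AdamW optimizer already uses betas=(0.9, 0.999) and eps=1e-8,
    # matching the optimizer settings listed above.
    bf16=True,  # assumption: consistent with the BF16 checkpoint
)
```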

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|--------------:|-------:|-----:|----------------:|------------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.3102        | 0.0539 | 5    | 1.2636          | 283856            |
| 1.3691        | 0.1077 | 10   | 1.1765          | 571632            |
| 1.1242        | 0.1616 | 15   | 1.1491          | 857944            |
| 1.1425        | 0.2155 | 20   | 1.1270          | 1146080           |
| 1.1527        | 0.2694 | 25   | 1.1235          | 1423176           |
| 1.0294        | 0.3232 | 30   | 1.1268          | 1709384           |
| 0.9761        | 0.3771 | 35   | 1.1413          | 1997472           |
| 1.0079        | 0.4310 | 40   | 1.1340          | 2289288           |
| 0.9212        | 0.4848 | 45   | 1.1454          | 2577432           |
| 0.8710        | 0.5387 | 50   | 1.1548          | 2863824           |
| 0.8043        | 0.5926 | 55   | 1.1584          | 3143184           |
| 0.7448        | 0.6465 | 60   | 1.1527          | 3429216           |
| 0.8393        | 0.7003 | 65   | 1.1466          | 3713984           |
| 0.8134        | 0.7542 | 70   | 1.1457          | 4005488           |
| 0.7978        | 0.8081 | 75   | 1.1524          | 4284408           |
| 0.7489        | 0.8620 | 80   | 1.1426          | 4569048           |
| 0.6384        | 0.9158 | 85   | 1.1419          | 4853184           |
| 0.6986        | 0.9697 | 90   | 1.1415          | 5141336           |
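
Validation loss bottoms out at 1.1235 around step 25 and then drifts slightly upward while training loss keeps falling, a pattern consistent with mild overfitting. The matplotlib sketch below simply replots the validation-loss column from the table; it is illustrative and not part of the original training code.

```python
import matplotlib.pyplot as plt

# Values copied from the training results table above.
steps = list(range(0, 95, 5))
val_loss = [1.3909, 1.2636, 1.1765, 1.1491, 1.1270, 1.1235, 1.1268,
            1.1413, 1.1340, 1.1454, 1.1548, 1.1584, 1.1527, 1.1466,
            1.1457, 1.1524, 1.1426, 1.1419, 1.1415]

plt.plot(steps, val_loss, marker="o")
plt.xlabel("Step")
plt.ylabel("Validation loss")
plt.title("collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0")
plt.show()
```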

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
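
To confirm a local environment matches these pins, a quick check (expected versions are copied from the list above; this snippet is not part of the original card):

```python
import datasets
import tokenizers
import torch
import transformers

# Versions reported on this card; other versions may work but are untested here.
print(transformers.__version__)  # expect 4.44.0
print(torch.__version__)         # expect 2.4.0+cu121
print(datasets.__version__)      # expect 2.20.0
print(tokenizers.__version__)    # expect 0.19.1
```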

Checkpoint details

  • Format: Safetensors
  • Model size: 2.61B params
  • Tensor type: BF16

Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter3_sftsd0

  • Base model: google/gemma-2-2b