
collapse_gemma-2-9b_hs2_accumulate_iter1_sftsd0

This model is a fine-tuned version of google/gemma-2-9b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9317
  • Num Input Tokens Seen: 5253020
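A quick-start sketch, not part of the original card: it assumes the checkpoint follows the standard gemma-2 causal-LM layout on the Hub and that transformers and accelerate are installed; the prompt is a placeholder.

```python
# Hedged sketch: load the fine-tuned checkpoint with transformers.
# Assumes the repo id below and a standard causal-LM layout; adjust as needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter1_sftsd0"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # the page lists BF16 tensors
    device_map="auto",
)

prompt = "The capital of France is"  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```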

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
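For reproducibility, the settings above translate roughly into transformers.TrainingArguments as sketched below. This is an assumption-laden sketch: the card does not state that the Hugging Face Trainer was used, and output_dir is a placeholder.

```python
# Hedged sketch: the listed hyperparameters expressed as TrainingArguments.
# Assumes the Hugging Face Trainer was used; "output" is a placeholder path.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="output",
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=32,  # 4 per device x 32 steps = 128 effective batch
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumption, matching the BF16 tensor type listed on the page
)
```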

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|--------------:|-------:|-----:|----------------:|------------------:|
| No log        | 0      | 0    | 1.2335          | 0                 |
| 1.1387        | 0.0511 | 5    | 1.0668          | 272284            |
| 1.0001        | 0.1021 | 10   | 0.9837          | 541544            |
| 0.973         | 0.1532 | 15   | 0.9711          | 815788            |
| 0.9408        | 0.2043 | 20   | 0.9643          | 1087364           |
| 0.9645        | 0.2553 | 25   | 0.9589          | 1353816           |
| 0.9649        | 0.3064 | 30   | 0.9552          | 1626852           |
| 0.9603        | 0.3575 | 35   | 0.9514          | 1893740           |
| 0.9398        | 0.4086 | 40   | 0.9476          | 2168380           |
| 0.9474        | 0.4596 | 45   | 0.9456          | 2438380           |
| 0.9416        | 0.5107 | 50   | 0.9432          | 2708844           |
| 0.9821        | 0.5618 | 55   | 0.9417          | 2978352           |
| 0.9701        | 0.6128 | 60   | 0.9409          | 3244148           |
| 0.9409        | 0.6639 | 65   | 0.9382          | 3518100           |
| 1.0424        | 0.7150 | 70   | 0.9367          | 3784544           |
| 0.8923        | 0.7660 | 75   | 0.9348          | 4055360           |
| 0.9918        | 0.8171 | 80   | 0.9339          | 4326216           |
| 0.9415        | 0.8682 | 85   | 0.9338          | 4595508           |
| 0.8826        | 0.9192 | 90   | 0.9325          | 4869076           |
| 0.9146        | 0.9703 | 95   | 0.9320          | 5142272           |
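Assuming the losses above are mean per-token cross-entropy in nats (the usual convention for these auto-generated cards), they convert directly to perplexity as exp(loss); a one-line check:

```python
# Hedged sketch: convert the reported validation loss to perplexity,
# assuming the loss is mean per-token cross-entropy in nats.
import math

final_val_loss = 0.9317  # final evaluation loss from the card
print(f"perplexity ~ {math.exp(final_val_loss):.3f}")  # ~ 2.539
```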

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
