RylanSchaeffer / collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd0

---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd0
  results: []
---


# collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd0

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.0884
- Num Input Tokens Seen: 10631280
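A loss is often easier to compare across runs as a perplexity. Assuming the reported evaluation loss is the mean per-token cross-entropy in nats (the usual convention for causal language-model fine-tuning with the Trainer, although the card does not state it), perplexity is simply exp(loss). A minimal sketch:

```python
import math


def perplexity(mean_nll: float) -> float:
    """Convert a mean per-token negative log-likelihood (in nats) to perplexity."""
    return math.exp(mean_nll)


# Evaluation loss reported above; ASSUMED to be a per-token mean in nats.
eval_loss = 1.0884
print(f"perplexity ~ {perplexity(eval_loss):.2f}")  # perplexity ~ 2.97
```

Under the same assumption, the step-0 validation loss of 1.3909 in the training-results table corresponds to a perplexity of about 4.02.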

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
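These hyperparameters are tied together by simple arithmetic. The total train batch size is the per-device batch size times the gradient-accumulation steps times the number of devices, so 8 × 16 = 128 is consistent with single-device training. A `constant_with_warmup` schedule ramps the learning rate linearly from 0 to its base value during warmup and then holds it fixed, with the warmup length taken as the warmup ratio times the total number of optimizer steps, rounded up. The sketch below reproduces both; the function names are hypothetical (not transformers APIs), and the total of 182 optimizer steps is an assumption inferred from the epoch column of the training-results table.

```python
import math

# Hypothetical helpers mirroring the arithmetic behind the hyperparameters
# above; they are illustrative, not transformers APIs.


def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Number of examples that contribute to each optimizer update."""
    return per_device * accumulation_steps * num_devices


def warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    """Warmup length derived from a ratio, rounded up to a whole step."""
    return math.ceil(total_steps * warmup_ratio)


def constant_with_warmup_lr(step: int, base_lr: float, n_warmup: int) -> float:
    """Linear ramp from 0 to base_lr over n_warmup steps, then constant."""
    if step < n_warmup:
        return base_lr * step / max(1, n_warmup)
    return base_lr


# 8 per device x 16 accumulation steps already gives the reported 128,
# which is consistent with a single device.
assert effective_batch_size(8, 16) == 128

# ASSUMPTION: about 182 optimizer steps in the single epoch, inferred from the
# training-results table (step 180 is logged at epoch 0.9866).
n_warmup = warmup_steps(182, 0.05)  # ceil(9.1) -> 10
for step in (0, 5, 10, 100):
    print(step, constant_with_warmup_lr(step, 8e-06, n_warmup))
```

Under these assumptions the learning rate reaches 8e-06 at step 10 and stays there for the rest of the epoch.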

### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3909 | 0 |
| 1.4986 | 0.0274 | 5 | 1.3330 | 291568 |
| 1.3182 | 0.0548 | 10 | 1.2111 | 587448 |
| 1.2698 | 0.0822 | 15 | 1.1561 | 878712 |
| 1.1636 | 0.1096 | 20 | 1.1285 | 1172912 |
| 1.1254 | 0.1370 | 25 | 1.1113 | 1462432 |
| 1.1388 | 0.1644 | 30 | 1.1125 | 1754352 |
| 1.0632 | 0.1918 | 35 | 1.1148 | 2044296 |
| 1.0854 | 0.2193 | 40 | 1.1123 | 2336344 |
| 1.0012 | 0.2467 | 45 | 1.1118 | 2629112 |
| 0.9763 | 0.2741 | 50 | 1.1233 | 2922992 |
| 0.8928 | 0.3015 | 55 | 1.1148 | 3212144 |
| 0.9294 | 0.3289 | 60 | 1.1208 | 3498808 |
| 0.9218 | 0.3563 | 65 | 1.1160 | 3790240 |
| 0.8805 | 0.3837 | 70 | 1.1220 | 4084176 |
| 0.8095 | 0.4111 | 75 | 1.1249 | 4369920 |
| 0.8382 | 0.4385 | 80 | 1.1195 | 4666480 |
| 0.8528 | 0.4659 | 85 | 1.1163 | 4959872 |
| 0.8016 | 0.4933 | 90 | 1.1147 | 5254800 |
| 0.8473 | 0.5207 | 95 | 1.1142 | 5546992 |
| 0.7947 | 0.5481 | 100 | 1.1122 | 5834416 |
| 0.7363 | 0.5755 | 105 | 1.1072 | 6127320 |
| 0.6941 | 0.6029 | 110 | 1.1062 | 6426288 |
| 0.7032 | 0.6304 | 115 | 1.1080 | 6714832 |
| 0.73 | 0.6578 | 120 | 1.1044 | 7008720 |
| 0.6667 | 0.6852 | 125 | 1.1017 | 7302184 |
| 0.6676 | 0.7126 | 130 | 1.1011 | 7596152 |
| 0.7638 | 0.7400 | 135 | 1.0994 | 7884552 |
| 0.7206 | 0.7674 | 140 | 1.0979 | 8179512 |
| 0.7141 | 0.7948 | 145 | 1.0960 | 8470208 |
| 0.7504 | 0.8222 | 150 | 1.0947 | 8761968 |
| 0.6988 | 0.8496 | 155 | 1.0930 | 9055184 |
| 0.7438 | 0.8770 | 160 | 1.0927 | 9343128 |
| 0.667 | 0.9044 | 165 | 1.0902 | 9637976 |
| 0.7389 | 0.9318 | 170 | 1.0913 | 9930512 |
| 0.7248 | 0.9592 | 175 | 1.0880 | 10226368 |
| 0.7772 | 0.9866 | 180 | 1.0892 | 10513336 |
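The training-results table can also be summarised programmatically. The sketch below copies the validation-loss column, finds the lowest logged value, and derives two rough quantities from the last row: optimizer steps per epoch (step divided by epoch) and average input tokens per optimizer step. The helper names are hypothetical, and the Trainer's token counter may include padding tokens, so the per-sequence figure may overstate real content.

```python
# Validation-loss column of the training-results table, logged every 5 optimizer steps.
STEPS = list(range(0, 181, 5))
VAL_LOSS = [
    1.3909, 1.3330, 1.2111, 1.1561, 1.1285, 1.1113, 1.1125, 1.1148,
    1.1123, 1.1118, 1.1233, 1.1148, 1.1208, 1.1160, 1.1220, 1.1249,
    1.1195, 1.1163, 1.1147, 1.1142, 1.1122, 1.1072, 1.1062, 1.1080,
    1.1044, 1.1017, 1.1011, 1.0994, 1.0979, 1.0960, 1.0947, 1.0930,
    1.0927, 1.0902, 1.0913, 1.0880, 1.0892,
]
assert len(STEPS) == len(VAL_LOSS) == 37


def best_checkpoint(steps, losses):
    """Return (step, loss) for the lowest validation loss."""
    best_loss, best_step = min(zip(losses, steps))
    return best_step, best_loss


def steps_per_epoch(step: int, epoch: float) -> float:
    """Estimate optimizer steps per epoch from one logged (step, epoch) pair."""
    return step / epoch


def tokens_per_step(tokens_seen: int, step: int) -> float:
    """Average input tokens consumed per optimizer step so far."""
    return tokens_seen / step


step, loss = best_checkpoint(STEPS, VAL_LOSS)
print(f"lowest logged validation loss: {loss:.4f} at step {step}")
# lowest logged validation loss: 1.0880 at step 175

# Last row of the table: step 180, epoch 0.9866, 10,513,336 tokens seen.
print(f"steps per epoch ~ {steps_per_epoch(180, 0.9866):.1f}")  # ~ 182.4
tokens = tokens_per_step(10_513_336, 180)
# ASSUMPTION: every optimizer step sees the full batch of 128 sequences.
print(f"tokens/step ~ {tokens:.0f}, tokens/sequence ~ {tokens / 128:.0f}")  # ~ 58407, ~ 456
```

The lowest logged value (1.0880 at step 175) sits slightly below the 1.0884 reported at the top of the card, which was presumably computed in a final evaluation after the last logged step.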


### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
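When trying to reproduce this run, it may help to confirm that the local environment matches these versions. The sketch below uses the standard-library `importlib.metadata` to compare installed versions against the ones listed above. PyTorch is looked up under its distribution name `torch`, and the comparison deliberately ignores PEP 440 local labels such as `+cu121`; that is an assumption, and a different CUDA build may matter if bit-exact reproduction is the goal. The helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional, Tuple

# Versions from "Framework versions" above; PyTorch's distribution name is "torch".
PINNED = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def installed_version(package: str) -> Optional[str]:
    """Installed version string, or None if the distribution is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def public_part(v: str) -> str:
    """Strip a PEP 440 local label such as '+cu121'."""
    return v.split("+", 1)[0]


def version_mismatches(
    pins: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose installed public version differs from its pin to (pinned, installed)."""
    mismatches = {}
    for package, pinned in pins.items():
        found = lookup(package)
        if found is None or public_part(found) != public_part(pinned):
            mismatches[package] = (pinned, found)
    return mismatches


if __name__ == "__main__":
    for pkg, (want, got) in version_mismatches(PINNED).items():
        print(f"{pkg}: card lists {want}, found {got}")
```

The `lookup` parameter makes the check easy to test without touching the real environment, for example by passing a plain dictionary's `get` method.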