collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0923
  • Num Input Tokens Seen: 20911096
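
Since the training data and intended uses are not documented, the snippet below is only a minimal sketch of loading the checkpoint for inference with transformers. The Hub repo id is taken from the model tree at the bottom of this card, and bfloat16 matches the stated tensor type; adjust both if you are working from a local copy.

```python
# Minimal inference sketch (not an official usage example for this card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# BF16 matches the tensor type reported for this checkpoint.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```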

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
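
The original training script is not part of this card, so the following is only a sketch of how the settings above map onto transformers.TrainingArguments; the output_dir is a placeholder, and bf16=True is an assumption based on the checkpoint's tensor type. Note that the total train batch size of 128 is simply train_batch_size (8) × gradient_accumulation_steps (16).

```python
# Sketch only: a TrainingArguments equivalent of the hyperparameters above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd0",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=8,   # train_batch_size
    per_device_eval_batch_size=16,   # eval_batch_size
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size
    seed=0,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                  # Adam settings listed above
    adam_beta2=0.999,                # (these are the library defaults)
    adam_epsilon=1e-8,
    bf16=True,                       # assumption: matches the BF16 tensor type
)
```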

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.5435        | 0.0135 | 5    | 1.3745          | 285560            |
| 1.4637        | 0.0270 | 10   | 1.2842          | 569672            |
| 1.3123        | 0.0405 | 15   | 1.2085          | 847696            |
| 1.2615        | 0.0540 | 20   | 1.1682          | 1136376           |
| 1.0816        | 0.0675 | 25   | 1.1606          | 1424000           |
| 0.965         | 0.0810 | 30   | 1.1604          | 1707696           |
| 0.8694        | 0.0945 | 35   | 1.1667          | 1989768           |
| 0.7971        | 0.1080 | 40   | 1.1810          | 2275264           |
| 0.76          | 0.1215 | 45   | 1.1947          | 2558504           |
| 0.6078        | 0.1350 | 50   | 1.1804          | 2840936           |
| 0.6925        | 0.1484 | 55   | 1.1709          | 3123416           |
| 0.542         | 0.1619 | 60   | 1.1698          | 3400336           |
| 0.5919        | 0.1754 | 65   | 1.1590          | 3677280           |
| 0.5911        | 0.1889 | 70   | 1.1663          | 3960968           |
| 0.5761        | 0.2024 | 75   | 1.1571          | 4236960           |
| 0.5491        | 0.2159 | 80   | 1.1588          | 4521336           |
| 0.4891        | 0.2294 | 85   | 1.1530          | 4802232           |
| 0.4634        | 0.2429 | 90   | 1.1474          | 5083368           |
| 0.4253        | 0.2564 | 95   | 1.1480          | 5368512           |
| 0.5415        | 0.2699 | 100  | 1.1389          | 5652976           |
| 0.4538        | 0.2834 | 105  | 1.1422          | 5935704           |
| 0.4739        | 0.2969 | 110  | 1.1375          | 6220840           |
| 0.5449        | 0.3104 | 115  | 1.1372          | 6501656           |
| 0.5307        | 0.3239 | 120  | 1.1331          | 6790056           |
| 0.4381        | 0.3374 | 125  | 1.1316          | 7075592           |
| 0.5068        | 0.3509 | 130  | 1.1243          | 7356824           |
| 0.373         | 0.3644 | 135  | 1.1298          | 7641384           |
| 0.4322        | 0.3779 | 140  | 1.1246          | 7923528           |
| 0.3658        | 0.3914 | 145  | 1.1268          | 8200376           |
| 0.4601        | 0.4049 | 150  | 1.1220          | 8486080           |
| 0.415         | 0.4184 | 155  | 1.1249          | 8769112           |
| 0.4452        | 0.4318 | 160  | 1.1194          | 9051632           |
| 0.5344        | 0.4453 | 165  | 1.1201          | 9330416           |
| 0.2906        | 0.4588 | 170  | 1.1192          | 9612936           |
| 0.4358        | 0.4723 | 175  | 1.1149          | 9893880           |
| 0.354         | 0.4858 | 180  | 1.1164          | 10178232          |
| 0.3467        | 0.4993 | 185  | 1.1129          | 10465696          |
| 0.4397        | 0.5128 | 190  | 1.1143          | 10744624          |
| 0.4027        | 0.5263 | 195  | 1.1127          | 11024912          |
| 0.5438        | 0.5398 | 200  | 1.1101          | 11311552          |
| 0.3847        | 0.5533 | 205  | 1.1106          | 11595104          |
| 0.4611        | 0.5668 | 210  | 1.1080          | 11877432          |
| 0.5404        | 0.5803 | 215  | 1.1099          | 12161768          |
| 0.4367        | 0.5938 | 220  | 1.1110          | 12444336          |
| 0.3969        | 0.6073 | 225  | 1.1060          | 12723640          |
| 0.4421        | 0.6208 | 230  | 1.1064          | 13012280          |
| 0.3727        | 0.6343 | 235  | 1.1065          | 13299312          |
| 0.3602        | 0.6478 | 240  | 1.1060          | 13583528          |
| 0.4531        | 0.6613 | 245  | 1.1068          | 13867168          |
| 0.399         | 0.6748 | 250  | 1.1033          | 14146944          |
| 0.4072        | 0.6883 | 255  | 1.1027          | 14427864          |
| 0.4039        | 0.7018 | 260  | 1.1032          | 14717592          |
| 0.5127        | 0.7152 | 265  | 1.1015          | 14999968          |
| 0.2753        | 0.7287 | 270  | 1.1017          | 15281672          |
| 0.4518        | 0.7422 | 275  | 1.1021          | 15556800          |
| 0.5064        | 0.7557 | 280  | 1.1010          | 15835432          |
| 0.3544        | 0.7692 | 285  | 1.1000          | 16114160          |
| 0.3527        | 0.7827 | 290  | 1.0987          | 16394640          |
| 0.3349        | 0.7962 | 295  | 1.0996          | 16673872          |
| 0.3976        | 0.8097 | 300  | 1.0978          | 16956880          |
| 0.4281        | 0.8232 | 305  | 1.0964          | 17241504          |
| 0.3262        | 0.8367 | 310  | 1.0974          | 17525672          |
| 0.3472        | 0.8502 | 315  | 1.0957          | 17810632          |
| 0.3196        | 0.8637 | 320  | 1.0963          | 18094576          |
| 0.4214        | 0.8772 | 325  | 1.0945          | 18375800          |
| 0.3303        | 0.8907 | 330  | 1.0943          | 18657920          |
| 0.4292        | 0.9042 | 335  | 1.0994          | 18933720          |
| 0.2995        | 0.9177 | 340  | 1.0943          | 19215688          |
| 0.3703        | 0.9312 | 345  | 1.0937          | 19498024          |
| 0.3855        | 0.9447 | 350  | 1.0940          | 19774016          |
| 0.4176        | 0.9582 | 355  | 1.0926          | 20060200          |
| 0.3698        | 0.9717 | 360  | 1.0914          | 20343704          |
| 0.3759        | 0.9852 | 365  | 1.0908          | 20627104          |
| 0.3329        | 0.9987 | 370  | 1.0923          | 20911096          |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
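
A quick sketch for checking that a local environment matches these pinned versions (the expected strings are copied verbatim from the list above):

```python
# Sketch: verify the runtime against the framework versions listed above.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "Transformers": (transformers.__version__, "4.44.0"),
    "Pytorch": (torch.__version__, "2.4.0+cu121"),
    "Datasets": (datasets.__version__, "2.20.0"),
    "Tokenizers": (tokenizers.__version__, "0.19.1"),
}
for name, (installed, pinned) in expected.items():
    status = "OK" if installed == pinned else "MISMATCH"
    print(f"{name}: installed {installed}, pinned {pinned} [{status}]")
```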

Safetensors

  • Model size: 2.61B params
  • Tensor type: BF16

Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd0

  • Base model: google/gemma-2-2b