---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0
  results: []
---
# collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1021
- Num Input Tokens Seen: 21968712
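
As a quick orientation, here is a minimal inference sketch using the Transformers API. The repo id is assumed to match the model name above (adjust it to the checkpoint's actual location), and the prompt is purely illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id; point this at wherever the checkpoint actually lives.
model_id = "collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Gemma-2 checkpoints are commonly run in bf16
    device_map="auto",           # requires the accelerate package
)

# Illustrative prompt; the SFT data for this run is not documented.
inputs = tokenizer("Write a short haiku about autumn.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```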
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (mirrored in the configuration sketch after this list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
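
For anyone looking to reproduce this setup, here is a minimal sketch of the equivalent Transformers `TrainingArguments`. The `output_dir` value and the pairing with TRL's `SFTTrainer` are assumptions inferred from the model tags, not confirmed details of the original run:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0",  # assumed path
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,  # 8 per device x 16 steps = 128 effective batch
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
)
```

These arguments would then be passed to a trainer (given this model's tags, presumably `trl.SFTTrainer`) along with the base model and training dataset.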
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3956 | 0 |
| 1.6085 | 0.0130 | 5 | 1.3800 | 289184 |
| 1.4378 | 0.0260 | 10 | 1.2933 | 571680 |
| 1.3575 | 0.0390 | 15 | 1.2182 | 858680 |
| 1.3348 | 0.0520 | 20 | 1.1684 | 1145936 |
| 1.1904 | 0.0650 | 25 | 1.1500 | 1437472 |
| 1.2228 | 0.0779 | 30 | 1.1339 | 1724288 |
| 1.0694 | 0.0909 | 35 | 1.1383 | 2009272 |
| 0.9697 | 0.1039 | 40 | 1.1630 | 2289000 |
| 0.9051 | 0.1169 | 45 | 1.1742 | 2569208 |
| 0.8855 | 0.1299 | 50 | 1.1729 | 2856576 |
| 0.8853 | 0.1429 | 55 | 1.1758 | 3146856 |
| 0.8296 | 0.1559 | 60 | 1.1816 | 3431392 |
| 0.7121 | 0.1689 | 65 | 1.1736 | 3726000 |
| 0.7528 | 0.1819 | 70 | 1.1792 | 4010080 |
| 0.5996 | 0.1949 | 75 | 1.1802 | 4295264 |
| 0.6437 | 0.2079 | 80 | 1.1785 | 4576256 |
| 0.6683 | 0.2209 | 85 | 1.1733 | 4869384 |
| 0.5115 | 0.2338 | 90 | 1.1750 | 5151776 |
| 0.545 | 0.2468 | 95 | 1.1701 | 5443960 |
| 0.5348 | 0.2598 | 100 | 1.1673 | 5728368 |
| 0.5687 | 0.2728 | 105 | 1.1641 | 6017560 |
| 0.4856 | 0.2858 | 110 | 1.1663 | 6300000 |
| 0.4691 | 0.2988 | 115 | 1.1630 | 6586672 |
| 0.4454 | 0.3118 | 120 | 1.1585 | 6869504 |
| 0.5734 | 0.3248 | 125 | 1.1606 | 7159680 |
| 0.4317 | 0.3378 | 130 | 1.1529 | 7437936 |
| 0.4603 | 0.3508 | 135 | 1.1541 | 7727120 |
| 0.5264 | 0.3638 | 140 | 1.1542 | 8013352 |
| 0.5051 | 0.3767 | 145 | 1.1493 | 8302848 |
| 0.397 | 0.3897 | 150 | 1.1528 | 8588472 |
| 0.4173 | 0.4027 | 155 | 1.1463 | 8876960 |
| 0.3443 | 0.4157 | 160 | 1.1474 | 9156600 |
| 0.4343 | 0.4287 | 165 | 1.1455 | 9440520 |
| 0.4683 | 0.4417 | 170 | 1.1431 | 9726600 |
| 0.4732 | 0.4547 | 175 | 1.1408 | 10009248 |
| 0.4876 | 0.4677 | 180 | 1.1414 | 10297320 |
| 0.4574 | 0.4807 | 185 | 1.1369 | 10582704 |
| 0.4038 | 0.4937 | 190 | 1.1354 | 10870648 |
| 0.4239 | 0.5067 | 195 | 1.1355 | 11148576 |
| 0.5262 | 0.5196 | 200 | 1.1291 | 11436464 |
| 0.4788 | 0.5326 | 205 | 1.1322 | 11721416 |
| 0.3975 | 0.5456 | 210 | 1.1276 | 12012696 |
| 0.3807 | 0.5586 | 215 | 1.1310 | 12299376 |
| 0.4784 | 0.5716 | 220 | 1.1232 | 12594368 |
| 0.4 | 0.5846 | 225 | 1.1272 | 12880616 |
| 0.4511 | 0.5976 | 230 | 1.1229 | 13164112 |
| 0.4119 | 0.6106 | 235 | 1.1234 | 13446016 |
| 0.3515 | 0.6236 | 240 | 1.1224 | 13729688 |
| 0.3695 | 0.6366 | 245 | 1.1201 | 14015064 |
| 0.387 | 0.6496 | 250 | 1.1190 | 14303192 |
| 0.4503 | 0.6626 | 255 | 1.1167 | 14587200 |
| 0.3205 | 0.6755 | 260 | 1.1184 | 14875032 |
| 0.3369 | 0.6885 | 265 | 1.1154 | 15159592 |
| 0.46 | 0.7015 | 270 | 1.1173 | 15443480 |
| 0.4148 | 0.7145 | 275 | 1.1121 | 15737624 |
| 0.4251 | 0.7275 | 280 | 1.1141 | 16021928 |
| 0.3786 | 0.7405 | 285 | 1.1126 | 16306944 |
| 0.3593 | 0.7535 | 290 | 1.1114 | 16592904 |
| 0.4698 | 0.7665 | 295 | 1.1114 | 16875744 |
| 0.3327 | 0.7795 | 300 | 1.1098 | 17163408 |
| 0.3521 | 0.7925 | 305 | 1.1125 | 17451024 |
| 0.3682 | 0.8055 | 310 | 1.1076 | 17741680 |
| 0.3266 | 0.8184 | 315 | 1.1098 | 18022800 |
| 0.3986 | 0.8314 | 320 | 1.1078 | 18298600 |
| 0.3869 | 0.8444 | 325 | 1.1078 | 18585288 |
| 0.3904 | 0.8574 | 330 | 1.1072 | 18870912 |
| 0.361 | 0.8704 | 335 | 1.1070 | 19165960 |
| 0.4643 | 0.8834 | 340 | 1.1047 | 19458704 |
| 0.4603 | 0.8964 | 345 | 1.1048 | 19741152 |
| 0.4815 | 0.9094 | 350 | 1.1053 | 20029752 |
| 0.3097 | 0.9224 | 355 | 1.1050 | 20317240 |
| 0.3686 | 0.9354 | 360 | 1.1033 | 20601320 |
| 0.485 | 0.9484 | 365 | 1.1042 | 20895904 |
| 0.3946 | 0.9614 | 370 | 1.1014 | 21179672 |
| 0.4621 | 0.9743 | 375 | 1.1032 | 21460376 |
| 0.4748 | 0.9873 | 380 | 1.1025 | 21737656 |
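
Read as a curve, the validation loss drops sharply from 1.3956, bottoms out near step 30, climbs to about 1.18 around step 60, and then declines gradually toward the final 1.1021. A short matplotlib sketch of that trajectory, using a subset of the checkpoints copied from the table above (matplotlib is an assumed extra dependency, not part of the framework stack listed below):

```python
import matplotlib.pyplot as plt

# A subset of (step, validation loss) checkpoints copied from the table above.
steps = [0, 20, 60, 120, 200, 280, 380]
val_loss = [1.3956, 1.1684, 1.1816, 1.1585, 1.1291, 1.1141, 1.1025]

plt.plot(steps, val_loss, marker="o")
plt.xlabel("Step")
plt.ylabel("Validation loss")
plt.title("collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0 eval loss")
plt.savefig("eval_loss.png")
```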
### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1