# collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd1

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.0934
- Num input tokens seen: 35,818,760
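
Below is a minimal sketch of loading the model with the standard `transformers` API; the prompt and generation settings are illustrative assumptions, not values from this card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Prompt and max_new_tokens are illustrative; adjust to your use case.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```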
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
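
As a hedged sketch, these settings map onto `transformers.TrainingArguments` roughly as follows; `output_dir` and anything not listed above are assumptions. Note the batch sizes are consistent: 8 (per device) × 16 (accumulation steps) = 128 total.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd1",  # assumed
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,    # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```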
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.3909 | 0 |
1.6217 | 0.0075 | 5 | 1.3867 | 271000 |
1.4655 | 0.0151 | 10 | 1.3414 | 546632 |
1.4425 | 0.0226 | 15 | 1.2752 | 819672 |
1.3352 | 0.0301 | 20 | 1.2179 | 1087416 |
1.1854 | 0.0377 | 25 | 1.1801 | 1356912 |
1.0295 | 0.0452 | 30 | 1.1849 | 1633120 |
0.9569 | 0.0527 | 35 | 1.1962 | 1902312 |
0.7022 | 0.0603 | 40 | 1.2303 | 2168672 |
0.7055 | 0.0678 | 45 | 1.2339 | 2435544 |
0.6248 | 0.0753 | 50 | 1.2358 | 2703648 |
0.5441 | 0.0829 | 55 | 1.2145 | 2967560 |
0.5434 | 0.0904 | 60 | 1.2004 | 3236944 |
0.4472 | 0.0979 | 65 | 1.1988 | 3506976 |
0.4555 | 0.1055 | 70 | 1.1838 | 3785080 |
0.4008 | 0.1130 | 75 | 1.1891 | 4055320 |
0.3689 | 0.1205 | 80 | 1.1814 | 4326912 |
0.3985 | 0.1280 | 85 | 1.1675 | 4595872 |
0.2766 | 0.1356 | 90 | 1.1743 | 4861152 |
0.3589 | 0.1431 | 95 | 1.1632 | 5135264 |
0.4281 | 0.1506 | 100 | 1.1654 | 5413792 |
0.2638 | 0.1582 | 105 | 1.1621 | 5686704 |
0.3134 | 0.1657 | 110 | 1.1585 | 5956968 |
0.4167 | 0.1732 | 115 | 1.1541 | 6224872 |
0.2923 | 0.1808 | 120 | 1.1566 | 6493312 |
0.4076 | 0.1883 | 125 | 1.1523 | 6775120 |
0.3545 | 0.1958 | 130 | 1.1504 | 7043896 |
0.2846 | 0.2034 | 135 | 1.1519 | 7311696 |
0.3653 | 0.2109 | 140 | 1.1472 | 7578920 |
0.3325 | 0.2184 | 145 | 1.1503 | 7845576 |
0.3284 | 0.2260 | 150 | 1.1466 | 8115408 |
0.2892 | 0.2335 | 155 | 1.1414 | 8385200 |
0.2424 | 0.2410 | 160 | 1.1451 | 8657328 |
0.2332 | 0.2486 | 165 | 1.1433 | 8935176 |
0.1998 | 0.2561 | 170 | 1.1409 | 9211448 |
0.304 | 0.2636 | 175 | 1.1400 | 9482072 |
0.3124 | 0.2712 | 180 | 1.1379 | 9753520 |
0.3096 | 0.2787 | 185 | 1.1429 | 10020056 |
0.3539 | 0.2862 | 190 | 1.1358 | 10292264 |
0.308 | 0.2938 | 195 | 1.1379 | 10554488 |
0.2535 | 0.3013 | 200 | 1.1357 | 10822488 |
0.3166 | 0.3088 | 205 | 1.1328 | 11097256 |
0.2653 | 0.3164 | 210 | 1.1327 | 11376640 |
0.2697 | 0.3239 | 215 | 1.1351 | 11643032 |
0.2742 | 0.3314 | 220 | 1.1293 | 11919368 |
0.3344 | 0.3390 | 225 | 1.1314 | 12187896 |
0.1981 | 0.3465 | 230 | 1.1284 | 12461560 |
0.2823 | 0.3540 | 235 | 1.1275 | 12733568 |
0.3029 | 0.3615 | 240 | 1.1289 | 12999600 |
0.3232 | 0.3691 | 245 | 1.1257 | 13267680 |
0.2336 | 0.3766 | 250 | 1.1287 | 13533656 |
0.2642 | 0.3841 | 255 | 1.1263 | 13808592 |
0.3177 | 0.3917 | 260 | 1.1228 | 14075880 |
0.284 | 0.3992 | 265 | 1.1247 | 14343328 |
0.3039 | 0.4067 | 270 | 1.1206 | 14612480 |
0.2793 | 0.4143 | 275 | 1.1206 | 14882944 |
0.3073 | 0.4218 | 280 | 1.1250 | 15154088 |
0.3092 | 0.4293 | 285 | 1.1196 | 15420928 |
0.2349 | 0.4369 | 290 | 1.1192 | 15691528 |
0.1937 | 0.4444 | 295 | 1.1194 | 15966376 |
0.3677 | 0.4519 | 300 | 1.1175 | 16235816 |
0.1964 | 0.4595 | 305 | 1.1174 | 16503712 |
0.3342 | 0.4670 | 310 | 1.1173 | 16780344 |
0.2434 | 0.4745 | 315 | 1.1193 | 17047624 |
0.3076 | 0.4821 | 320 | 1.1144 | 17315800 |
0.2931 | 0.4896 | 325 | 1.1149 | 17589048 |
0.2965 | 0.4971 | 330 | 1.1140 | 17850624 |
0.3294 | 0.5047 | 335 | 1.1122 | 18123168 |
0.3072 | 0.5122 | 340 | 1.1134 | 18404496 |
0.1833 | 0.5197 | 345 | 1.1117 | 18672712 |
0.2871 | 0.5273 | 350 | 1.1118 | 18942920 |
0.2124 | 0.5348 | 355 | 1.1119 | 19214880 |
0.3152 | 0.5423 | 360 | 1.1098 | 19486872 |
0.2688 | 0.5499 | 365 | 1.1115 | 19750920 |
0.2113 | 0.5574 | 370 | 1.1113 | 20021312 |
0.2936 | 0.5649 | 375 | 1.1104 | 20291192 |
0.1659 | 0.5725 | 380 | 1.1079 | 20554376 |
0.2615 | 0.5800 | 385 | 1.1091 | 20820304 |
0.1893 | 0.5875 | 390 | 1.1092 | 21088216 |
0.2997 | 0.5950 | 395 | 1.1076 | 21356104 |
0.2985 | 0.6026 | 400 | 1.1055 | 21624024 |
0.2521 | 0.6101 | 405 | 1.1069 | 21901144 |
0.2243 | 0.6176 | 410 | 1.1078 | 22177408 |
0.2994 | 0.6252 | 415 | 1.1041 | 22446056 |
0.1927 | 0.6327 | 420 | 1.1061 | 22712816 |
0.204 | 0.6402 | 425 | 1.1064 | 22989840 |
0.2584 | 0.6478 | 430 | 1.1028 | 23260064 |
0.2422 | 0.6553 | 435 | 1.1029 | 23530560 |
0.2784 | 0.6628 | 440 | 1.1048 | 23803448 |
0.2613 | 0.6704 | 445 | 1.1038 | 24068080 |
0.227 | 0.6779 | 450 | 1.1019 | 24333176 |
0.2461 | 0.6854 | 455 | 1.1031 | 24603392 |
0.1918 | 0.6930 | 460 | 1.1035 | 24876384 |
0.2125 | 0.7005 | 465 | 1.1012 | 25140928 |
0.2905 | 0.7080 | 470 | 1.1015 | 25405968 |
0.1957 | 0.7156 | 475 | 1.1019 | 25677032 |
0.1903 | 0.7231 | 480 | 1.1001 | 25949848 |
0.2938 | 0.7306 | 485 | 1.1011 | 26219712 |
0.2621 | 0.7382 | 490 | 1.1027 | 26491816 |
0.2448 | 0.7457 | 495 | 1.1013 | 26760152 |
0.2177 | 0.7532 | 500 | 1.1003 | 27026592 |
0.3036 | 0.7608 | 505 | 1.1006 | 27298440 |
0.2885 | 0.7683 | 510 | 1.0999 | 27571464 |
0.3118 | 0.7758 | 515 | 1.0983 | 27843400 |
0.2362 | 0.7834 | 520 | 1.0990 | 28113024 |
0.2036 | 0.7909 | 525 | 1.0983 | 28381952 |
0.3301 | 0.7984 | 530 | 1.0979 | 28654648 |
0.3089 | 0.8060 | 535 | 1.0977 | 28927576 |
0.2125 | 0.8135 | 540 | 1.0983 | 29196512 |
0.1817 | 0.8210 | 545 | 1.0985 | 29471184 |
0.3252 | 0.8285 | 550 | 1.0975 | 29742216 |
0.2176 | 0.8361 | 555 | 1.0970 | 30010528 |
0.2441 | 0.8436 | 560 | 1.0972 | 30278888 |
0.2678 | 0.8511 | 565 | 1.0980 | 30549480 |
0.2069 | 0.8587 | 570 | 1.0959 | 30816968 |
0.2432 | 0.8662 | 575 | 1.0961 | 31089360 |
0.1981 | 0.8737 | 580 | 1.0974 | 31354488 |
0.2415 | 0.8813 | 585 | 1.0952 | 31624248 |
0.2379 | 0.8888 | 590 | 1.0944 | 31891576 |
0.2349 | 0.8963 | 595 | 1.0963 | 32153000 |
0.1643 | 0.9039 | 600 | 1.0952 | 32419552 |
0.2094 | 0.9114 | 605 | 1.0951 | 32692032 |
0.2806 | 0.9189 | 610 | 1.0931 | 32959216 |
0.2184 | 0.9265 | 615 | 1.0937 | 33229304 |
0.2943 | 0.9340 | 620 | 1.0938 | 33500168 |
0.2098 | 0.9415 | 625 | 1.0940 | 33767344 |
0.214 | 0.9491 | 630 | 1.0939 | 34035680 |
0.3333 | 0.9566 | 635 | 1.0934 | 34304400 |
0.3684 | 0.9641 | 640 | 1.0933 | 34573040 |
0.204 | 0.9717 | 645 | 1.0951 | 34840664 |
0.2766 | 0.9792 | 650 | 1.0946 | 35106576 |
0.233 | 0.9867 | 655 | 1.0934 | 35378576 |
0.2654 | 0.9943 | 660 | 1.0939 | 35656264 |
### Framework versions
- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1