chesspythia-70m-daryo

This model is a fine-tuned version of EleutherAI/pythia-70m-deduped on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9213
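
For context, assuming this is the usual mean token-level cross-entropy reported by the Trainer, the loss converts to perplexity via exp(loss); a minimal check:

```python
import math

# Final evaluation loss reported above (cross-entropy, nats per token).
eval_loss = 0.9213

# Perplexity = exp(cross-entropy) ~= 2.51 here.
print(f"perplexity = {math.exp(eval_loss):.2f}")
```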

Model description

More information needed

Intended uses & limitations

More information needed
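
In the absence of documented usage, the checkpoint should load like any causal LM with transformers. A minimal, untested sketch follows; the repo id is taken from the model page, and the PGN-style prompt is purely an assumption based on the model name, since the training format is not documented here:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "darwinfegarido/chesspythia-70m-daryo"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Hypothetical prompt: a PGN-style move list (assumed, not confirmed by the card).
prompt = "1. e4 e5 2. Nf3"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```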

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 64
  • eval_batch_size: 64
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • num_epochs: 1
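
As a rough guide to reproducing this run, the settings above map onto a transformers TrainingArguments configuration as sketched below. This is an assumption-laden sketch: the output directory and the every-17-steps evaluation cadence (inferred from the results table) are not stated on the card, and the dataset and data collator are undocumented.

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above; output_dir, eval_steps,
# and logging_steps are assumptions inferred from the results table.
args = TrainingArguments(
    output_dir="chesspythia-70m-daryo",
    learning_rate=5e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    num_train_epochs=1,
    eval_strategy="steps",
    eval_steps=17,
    logging_steps=17,
)
```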

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.5097        | 0.0100 | 17   | 1.4338          |
| 1.2602        | 0.0200 | 34   | 1.2784          |
| 1.2292        | 0.0301 | 51   | 1.2254          |
| 1.1925        | 0.0401 | 68   | 1.1904          |
| 1.1533        | 0.0501 | 85   | 1.1670          |
| 1.1705        | 0.0601 | 102  | 1.1515          |
| 1.1103        | 0.0701 | 119  | 1.1351          |
| 1.1797        | 0.0801 | 136  | 1.1217          |
| 1.0768        | 0.0902 | 153  | 1.1103          |
| 1.1207        | 0.1002 | 170  | 1.1013          |
| 1.1476        | 0.1102 | 187  | 1.0919          |
| 1.1478        | 0.1202 | 204  | 1.0913          |
| 1.1316        | 0.1302 | 221  | 1.0768          |
| 1.0524        | 0.1402 | 238  | 1.0750          |
| 1.0392        | 0.1503 | 255  | 1.0614          |
| 1.0935        | 0.1603 | 272  | 1.0641          |
| 1.0097        | 0.1703 | 289  | 1.0508          |
| 1.0855        | 0.1803 | 306  | 1.0528          |
| 1.0863        | 0.1903 | 323  | 1.0413          |
| 1.0812        | 0.2004 | 340  | 1.0400          |
| 1.0675        | 0.2104 | 357  | 1.0338          |
| 1.1371        | 0.2204 | 374  | 1.0348          |
| 1.0607        | 0.2304 | 391  | 1.0307          |
| 1.0659        | 0.2404 | 408  | 1.0268          |
| 1.046         | 0.2504 | 425  | 1.0200          |
| 1.0169        | 0.2605 | 442  | 1.0173          |
| 1.0329        | 0.2705 | 459  | 1.0125          |
| 1.0181        | 0.2805 | 476  | 1.0125          |
| 1.0158        | 0.2905 | 493  | 1.0063          |
| 1.0837        | 0.3005 | 510  | 1.0077          |
| 1.0016        | 0.3105 | 527  | 1.0113          |
| 1.0054        | 0.3206 | 544  | 1.0029          |
| 1.0125        | 0.3306 | 561  | 0.9970          |
| 1.027         | 0.3406 | 578  | 0.9977          |
| 1.0072        | 0.3506 | 595  | 0.9903          |
| 1.0993        | 0.3606 | 612  | 0.9918          |
| 1.0218        | 0.3707 | 629  | 0.9872          |
| 0.961         | 0.3807 | 646  | 0.9841          |
| 1.0845        | 0.3907 | 663  | 0.9827          |
| 1.0536        | 0.4007 | 680  | 0.9848          |
| 0.9998        | 0.4107 | 697  | 0.9825          |
| 1.0145        | 0.4207 | 714  | 0.9814          |
| 0.9812        | 0.4308 | 731  | 0.9794          |
| 0.9736        | 0.4408 | 748  | 0.9761          |
| 0.9738        | 0.4508 | 765  | 0.9699          |
| 1.0023        | 0.4608 | 782  | 0.9703          |
| 1.0239        | 0.4708 | 799  | 0.9709          |
| 0.9626        | 0.4808 | 816  | 0.9673          |
| 0.9331        | 0.4909 | 833  | 0.9679          |
| 0.9569        | 0.5009 | 850  | 0.9643          |
| 0.9414        | 0.5109 | 867  | 0.9653          |
| 0.9671        | 0.5209 | 884  | 0.9613          |
| 0.9531        | 0.5309 | 901  | 0.9607          |
| 0.9611        | 0.5410 | 918  | 0.9591          |
| 1.0037        | 0.5510 | 935  | 0.9582          |
| 1.0062        | 0.5610 | 952  | 0.9581          |
| 0.9264        | 0.5710 | 969  | 0.9555          |
| 0.97          | 0.5810 | 986  | 0.9546          |
| 0.9121        | 0.5910 | 1003 | 0.9505          |
| 0.9815        | 0.6011 | 1020 | 0.9489          |
| 0.9873        | 0.6111 | 1037 | 0.9475          |
| 0.9398        | 0.6211 | 1054 | 0.9467          |
| 0.942         | 0.6311 | 1071 | 0.9455          |
| 0.9716        | 0.6411 | 1088 | 0.9471          |
| 0.9642        | 0.6511 | 1105 | 0.9436          |
| 0.93          | 0.6612 | 1122 | 0.9424          |
| 0.9498        | 0.6712 | 1139 | 0.9410          |
| 0.9216        | 0.6812 | 1156 | 0.9420          |
| 0.9522        | 0.6912 | 1173 | 0.9380          |
| 0.9366        | 0.7012 | 1190 | 0.9382          |
| 0.9293        | 0.7113 | 1207 | 0.9353          |
| 0.9097        | 0.7213 | 1224 | 0.9356          |
| 1.0044        | 0.7313 | 1241 | 0.9352          |
| 0.9624        | 0.7413 | 1258 | 0.9319          |
| 0.9621        | 0.7513 | 1275 | 0.9315          |
| 0.9402        | 0.7613 | 1292 | 0.9314          |
| 0.9148        | 0.7714 | 1309 | 0.9314          |
| 0.9373        | 0.7814 | 1326 | 0.9300          |
| 0.9458        | 0.7914 | 1343 | 0.9289          |
| 0.917         | 0.8014 | 1360 | 0.9283          |
| 0.9305        | 0.8114 | 1377 | 0.9282          |
| 0.8832        | 0.8214 | 1394 | 0.9273          |
| 0.908         | 0.8315 | 1411 | 0.9257          |
| 0.9667        | 0.8415 | 1428 | 0.9258          |
| 0.9673        | 0.8515 | 1445 | 0.9245          |
| 0.9462        | 0.8615 | 1462 | 0.9246          |
| 0.9475        | 0.8715 | 1479 | 0.9236          |
| 0.9716        | 0.8816 | 1496 | 0.9237          |
| 0.936         | 0.8916 | 1513 | 0.9231          |
| 0.9497        | 0.9016 | 1530 | 0.9229          |
| 0.9507        | 0.9116 | 1547 | 0.9223          |
| 0.955         | 0.9216 | 1564 | 0.9221          |
| 0.9212        | 0.9316 | 1581 | 0.9220          |
| 0.9257        | 0.9417 | 1598 | 0.9218          |
| 0.9765        | 0.9517 | 1615 | 0.9215          |
| 0.9094        | 0.9617 | 1632 | 0.9214          |
| 0.9401        | 0.9717 | 1649 | 0.9213          |
| 0.9492        | 0.9817 | 1666 | 0.9213          |
| 0.971         | 0.9918 | 1683 | 0.9213          |
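
Validation loss drops steeply over roughly the first 500 steps and then flattens. A quick way to visualize this, using a handful of (step, validation loss) pairs transcribed from the table above (assuming matplotlib is available):

```python
import matplotlib.pyplot as plt

# A subset of (step, validation loss) pairs from the table above.
steps = [17, 170, 340, 510, 680, 850, 1020, 1190, 1360, 1530, 1683]
val_loss = [1.4338, 1.1013, 1.0400, 1.0077, 0.9848, 0.9643, 0.9489,
            0.9382, 0.9283, 0.9229, 0.9213]

plt.plot(steps, val_loss, marker="o")
plt.xlabel("Step")
plt.ylabel("Validation loss")
plt.title("chesspythia-70m-daryo validation loss")
plt.show()
```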

Framework versions

  • Transformers 4.46.2
  • Pytorch 2.5.1+cu121
  • Datasets 3.1.0
  • Tokenizers 0.20.3