---
library_name: transformers
license: apache-2.0
base_model: EleutherAI/pythia-70m-deduped
tags:
- generated_from_trainer
model-index:
- name: chesspythia-70m-daryo
  results: []
---

# chesspythia-70m-daryo

This model is a fine-tuned version of [EleutherAI/pythia-70m-deduped](https://huggingface.co/EleutherAI/pythia-70m-deduped) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9213

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- num_epochs: 1
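For reference, here is a minimal sketch of how these hyperparameters map onto the 🤗 `Trainer` API. The corpus is a placeholder (this card does not document the actual training data), and the 17-step evaluation interval is inferred from the results table below; treat this as an illustration under those assumptions, not the author's training script.

```python
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Base checkpoint named in this card.
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m-deduped")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m-deduped")
tokenizer.pad_token = tokenizer.eos_token  # Pythia's tokenizer ships without a pad token

# Placeholder corpus: the real chess training data is undocumented here.
texts = ["1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 4. Ba4 Nf6 5. O-O Be7"] * 256
ds = Dataset.from_dict({"text": texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

# Hyperparameters copied from the list above.
args = TrainingArguments(
    output_dir="chesspythia-70m-daryo",
    learning_rate=5e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    num_train_epochs=1,
    eval_strategy="steps",
    eval_steps=17,   # inferred from the step intervals in the results table
    logging_steps=17,
    report_to="none",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=ds,
    eval_dataset=ds,  # placeholder; a held-out split was presumably used
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```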
### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.5097 | 0.0100 | 17 | 1.4338 |
| 1.2602 | 0.0200 | 34 | 1.2784 |
| 1.2292 | 0.0301 | 51 | 1.2254 |
| 1.1925 | 0.0401 | 68 | 1.1904 |
| 1.1533 | 0.0501 | 85 | 1.1670 |
| 1.1705 | 0.0601 | 102 | 1.1515 |
| 1.1103 | 0.0701 | 119 | 1.1351 |
| 1.1797 | 0.0801 | 136 | 1.1217 |
| 1.0768 | 0.0902 | 153 | 1.1103 |
| 1.1207 | 0.1002 | 170 | 1.1013 |
| 1.1476 | 0.1102 | 187 | 1.0919 |
| 1.1478 | 0.1202 | 204 | 1.0913 |
| 1.1316 | 0.1302 | 221 | 1.0768 |
| 1.0524 | 0.1402 | 238 | 1.0750 |
| 1.0392 | 0.1503 | 255 | 1.0614 |
| 1.0935 | 0.1603 | 272 | 1.0641 |
| 1.0097 | 0.1703 | 289 | 1.0508 |
| 1.0855 | 0.1803 | 306 | 1.0528 |
| 1.0863 | 0.1903 | 323 | 1.0413 |
| 1.0812 | 0.2004 | 340 | 1.0400 |
| 1.0675 | 0.2104 | 357 | 1.0338 |
| 1.1371 | 0.2204 | 374 | 1.0348 |
| 1.0607 | 0.2304 | 391 | 1.0307 |
| 1.0659 | 0.2404 | 408 | 1.0268 |
| 1.046 | 0.2504 | 425 | 1.0200 |
| 1.0169 | 0.2605 | 442 | 1.0173 |
| 1.0329 | 0.2705 | 459 | 1.0125 |
| 1.0181 | 0.2805 | 476 | 1.0125 |
| 1.0158 | 0.2905 | 493 | 1.0063 |
| 1.0837 | 0.3005 | 510 | 1.0077 |
| 1.0016 | 0.3105 | 527 | 1.0113 |
| 1.0054 | 0.3206 | 544 | 1.0029 |
| 1.0125 | 0.3306 | 561 | 0.9970 |
| 1.027 | 0.3406 | 578 | 0.9977 |
| 1.0072 | 0.3506 | 595 | 0.9903 |
| 1.0993 | 0.3606 | 612 | 0.9918 |
| 1.0218 | 0.3707 | 629 | 0.9872 |
| 0.961 | 0.3807 | 646 | 0.9841 |
| 1.0845 | 0.3907 | 663 | 0.9827 |
| 1.0536 | 0.4007 | 680 | 0.9848 |
| 0.9998 | 0.4107 | 697 | 0.9825 |
| 1.0145 | 0.4207 | 714 | 0.9814 |
| 0.9812 | 0.4308 | 731 | 0.9794 |
| 0.9736 | 0.4408 | 748 | 0.9761 |
| 0.9738 | 0.4508 | 765 | 0.9699 |
| 1.0023 | 0.4608 | 782 | 0.9703 |
| 1.0239 | 0.4708 | 799 | 0.9709 |
| 0.9626 | 0.4808 | 816 | 0.9673 |
| 0.9331 | 0.4909 | 833 | 0.9679 |
| 0.9569 | 0.5009 | 850 | 0.9643 |
| 0.9414 | 0.5109 | 867 | 0.9653 |
| 0.9671 | 0.5209 | 884 | 0.9613 |
| 0.9531 | 0.5309 | 901 | 0.9607 |
| 0.9611 | 0.5410 | 918 | 0.9591 |
| 1.0037 | 0.5510 | 935 | 0.9582 |
| 1.0062 | 0.5610 | 952 | 0.9581 |
| 0.9264 | 0.5710 | 969 | 0.9555 |
| 0.97 | 0.5810 | 986 | 0.9546 |
| 0.9121 | 0.5910 | 1003 | 0.9505 |
| 0.9815 | 0.6011 | 1020 | 0.9489 |
| 0.9873 | 0.6111 | 1037 | 0.9475 |
| 0.9398 | 0.6211 | 1054 | 0.9467 |
| 0.942 | 0.6311 | 1071 | 0.9455 |
| 0.9716 | 0.6411 | 1088 | 0.9471 |
| 0.9642 | 0.6511 | 1105 | 0.9436 |
| 0.93 | 0.6612 | 1122 | 0.9424 |
| 0.9498 | 0.6712 | 1139 | 0.9410 |
| 0.9216 | 0.6812 | 1156 | 0.9420 |
| 0.9522 | 0.6912 | 1173 | 0.9380 |
| 0.9366 | 0.7012 | 1190 | 0.9382 |
| 0.9293 | 0.7113 | 1207 | 0.9353 |
| 0.9097 | 0.7213 | 1224 | 0.9356 |
| 1.0044 | 0.7313 | 1241 | 0.9352 |
| 0.9624 | 0.7413 | 1258 | 0.9319 |
| 0.9621 | 0.7513 | 1275 | 0.9315 |
| 0.9402 | 0.7613 | 1292 | 0.9314 |
| 0.9148 | 0.7714 | 1309 | 0.9314 |
| 0.9373 | 0.7814 | 1326 | 0.9300 |
| 0.9458 | 0.7914 | 1343 | 0.9289 |
| 0.917 | 0.8014 | 1360 | 0.9283 |
| 0.9305 | 0.8114 | 1377 | 0.9282 |
| 0.8832 | 0.8214 | 1394 | 0.9273 |
| 0.908 | 0.8315 | 1411 | 0.9257 |
| 0.9667 | 0.8415 | 1428 | 0.9258 |
| 0.9673 | 0.8515 | 1445 | 0.9245 |
| 0.9462 | 0.8615 | 1462 | 0.9246 |
| 0.9475 | 0.8715 | 1479 | 0.9236 |
| 0.9716 | 0.8816 | 1496 | 0.9237 |
| 0.936 | 0.8916 | 1513 | 0.9231 |
| 0.9497 | 0.9016 | 1530 | 0.9229 |
| 0.9507 | 0.9116 | 1547 | 0.9223 |
| 0.955 | 0.9216 | 1564 | 0.9221 |
| 0.9212 | 0.9316 | 1581 | 0.9220 |
| 0.9257 | 0.9417 | 1598 | 0.9218 |
| 0.9765 | 0.9517 | 1615 | 0.9215 |
| 0.9094 | 0.9617 | 1632 | 0.9214 |
| 0.9401 | 0.9717 | 1649 | 0.9213 |
| 0.9492 | 0.9817 | 1666 | 0.9213 |
| 0.971 | 0.9918 | 1683 | 0.9213 |

### Framework versions

- Transformers 4.46.2
- Pytorch 2.5.1+cu121
- Datasets 3.1.0
- Tokenizers 0.20.3
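A minimal generation sketch is below. The model id is an assumption (replace it with this repository's actual hub id or a local checkpoint path), and the PGN-style prompt is likewise a guess at the input format, since the training data is undocumented.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical model id; point this at the actual hub repo or local path.
model_id = "chesspythia-70m-daryo"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# PGN-style move prompt is an assumption about the training format.
prompt = "1. e4 e5 2. Nf3"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```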