# sft-sum-chosen-10lp-shuff-full-tiny
This model is a fine-tuned version of [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T) on the [martimfasantos/openai-summarize-tldr](https://huggingface.co/datasets/martimfasantos/openai-summarize-tldr) dataset. It achieves the following results on the evaluation set:
- Loss: 1.9409
- Nll Loss: 1.9409
- Logps/best: -72.8478
- Rewards/chosen: 2.0114
- Rewards/rejected: -0.4229
- Rewards/accuracies: 0.9998
- Rewards/margins: 2.4343
- Logps/rejected: -11.6536
- Logps/chosen: -72.8478
- Logits/rejected: -2.6479
- Logits/chosen: -2.9522
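
Note that the reported margin is simply the difference between the two rewards: 2.0114 - (-0.4229) = 2.4343, matching Rewards/margins above. Below is a minimal inference sketch; the Hub repo id, dtype, and the "TL;DR:" prompt format are assumptions rather than documented details of this model.

```python
# Minimal inference sketch. The repo id is a guess based on the dataset
# namespace; adjust it to wherever this checkpoint is actually hosted.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "martimfasantos/sft-sum-chosen-10lp-shuff-full-tiny"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# TL;DR-style prompt, assuming the SFT data follows the usual
# openai-summarize-tldr convention of appending "TL;DR:" to the post.
post = "SUBREDDIT: r/travel\nPOST: Spent three weeks backpacking ..."
prompt = f"{post}\n\nTL;DR:"

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Decode only the newly generated tokens (the summary).
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```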
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
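
Although no further details are given here, the dataset named in the summary above can be inspected directly; a minimal sketch, leaving split and column names to be discovered rather than assumed:

```python
# Inspect the fine-tuning dataset referenced above. Split and column
# names are not documented in this card, so print them instead of
# hard-coding them.
from datasets import load_dataset

ds = load_dataset("martimfasantos/openai-summarize-tldr")
print(ds)  # shows available splits and their columns
```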
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (mirrored in the configuration sketch after this list):
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
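
These values map directly onto Hugging Face `TrainingArguments`; a sketch under the assumption that the standard `Trainer` stack was used (the output directory name is hypothetical):

```python
# Reconstruction of the reported configuration as TrainingArguments.
# Only mirrors the values listed above; everything else is left at defaults.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="sft-sum-chosen-10lp-shuff-full-tiny",  # hypothetical
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=16,  # effective train batch size: 1 * 16 = 16
    num_train_epochs=1,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    seed=42,
)
```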
### Training results
| Training Loss | Epoch | Step | Validation Loss | Nll Loss | Logps/best | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:----------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
2.3573 | 0.0137 | 100 | 2.3703 | 2.3703 | -88.8140 | 0.4147 | 0.0412 | 1.0 | 0.3735 | -7.0125 | -88.8140 | -2.6551 | -2.9658 |
2.1904 | 0.0274 | 200 | 2.1322 | 2.1322 | -79.9647 | 1.2997 | 0.0373 | 1.0 | 1.2624 | -7.0516 | -79.9647 | -2.6656 | -2.9758 |
1.9956 | 0.0411 | 300 | 2.0629 | 2.0629 | -77.3844 | 1.5577 | -0.1097 | 0.9995 | 1.6674 | -8.5217 | -77.3844 | -2.6813 | -2.9915 |
2.0379 | 0.0548 | 400 | 2.0405 | 2.0405 | -76.5483 | 1.6413 | -0.1759 | 0.9994 | 1.8173 | -9.1840 | -76.5483 | -2.6918 | -3.0033 |
1.9476 | 0.0685 | 500 | 2.0250 | 2.0250 | -75.9762 | 1.6985 | -0.1561 | 0.9991 | 1.8546 | -8.9858 | -75.9762 | -2.6981 | -3.0089 |
2.0151 | 0.0822 | 600 | 2.0134 | 2.0133 | -75.5465 | 1.7415 | -0.1979 | 0.9991 | 1.9394 | -9.4039 | -75.5465 | -2.6956 | -3.0066 |
1.9972 | 0.0960 | 700 | 2.0037 | 2.0037 | -75.1909 | 1.7770 | -0.2110 | 0.9997 | 1.9881 | -9.5345 | -75.1909 | -2.6886 | -2.9996 |
1.9851 | 0.1097 | 800 | 1.9950 | 1.9950 | -74.8615 | 1.8100 | -0.2127 | 0.9997 | 2.0226 | -9.5511 | -74.8615 | -2.6861 | -2.9971 |
2.0271 | 0.1234 | 900 | 1.9890 | 1.9890 | -74.6372 | 1.8324 | -0.2530 | 0.9995 | 2.0854 | -9.9543 | -74.6372 | -2.6818 | -2.9925 |
2.0501 | 0.1371 | 1000 | 1.9845 | 1.9845 | -74.4788 | 1.8483 | -0.3242 | 0.9997 | 2.1724 | -10.6661 | -74.4788 | -2.6491 | -2.9545 |
1.9699 | 0.1508 | 1100 | 1.9813 | 1.9812 | -74.3528 | 1.8609 | -0.3208 | 0.9997 | 2.1817 | -10.6327 | -74.3528 | -2.6664 | -2.9755 |
1.9448 | 0.1645 | 1200 | 1.9773 | 1.9772 | -74.2031 | 1.8758 | -0.2738 | 0.9997 | 2.1496 | -10.1623 | -74.2031 | -2.6739 | -2.9842 |
1.9606 | 0.1782 | 1300 | 1.9746 | 1.9746 | -74.0931 | 1.8868 | -0.3353 | 0.9997 | 2.2221 | -10.7775 | -74.0931 | -2.6755 | -2.9850 |
1.8795 | 0.1919 | 1400 | 1.9716 | 1.9715 | -73.9887 | 1.8973 | -0.3115 | 0.9997 | 2.2088 | -10.5398 | -73.9887 | -2.6658 | -2.9741 |
1.9585 | 0.2056 | 1500 | 1.9703 | 1.9703 | -73.9430 | 1.9018 | -0.3353 | 0.9997 | 2.2371 | -10.7774 | -73.9430 | -2.6721 | -2.9814 |
1.9508 | 0.2193 | 1600 | 1.9664 | 1.9664 | -73.7942 | 1.9167 | -0.4138 | 0.9998 | 2.3305 | -11.5624 | -73.7942 | -2.6751 | -2.9840 |
1.9041 | 0.2330 | 1700 | 1.9657 | 1.9656 | -73.7736 | 1.9188 | -0.3353 | 0.9997 | 2.2541 | -10.7776 | -73.7736 | -2.6703 | -2.9794 |
1.9507 | 0.2467 | 1800 | 1.9634 | 1.9634 | -73.6847 | 1.9277 | -0.3964 | 0.9998 | 2.3240 | -11.3880 | -73.6847 | -2.6728 | -2.9810 |
1.8942 | 0.2604 | 1900 | 1.9620 | 1.9620 | -73.6314 | 1.9330 | -0.3368 | 0.9998 | 2.2698 | -10.7926 | -73.6314 | -2.6631 | -2.9695 |
2.0088 | 0.2742 | 2000 | 1.9604 | 1.9603 | -73.5703 | 1.9391 | -0.3303 | 0.9997 | 2.2694 | -10.7277 | -73.5703 | -2.6651 | -2.9720 |
2.0277 | 0.2879 | 2100 | 1.9596 | 1.9596 | -73.5404 | 1.9421 | -0.3122 | 0.9997 | 2.2543 | -10.5463 | -73.5404 | -2.6687 | -2.9765 |
1.9697 | 0.3016 | 2200 | 1.9578 | 1.9578 | -73.4823 | 1.9479 | -0.3187 | 0.9998 | 2.2666 | -10.6117 | -73.4823 | -2.6615 | -2.9674 |
1.9756 | 0.3153 | 2300 | 1.9564 | 1.9564 | -73.4282 | 1.9533 | -0.3217 | 0.9997 | 2.2750 | -10.6410 | -73.4282 | -2.6624 | -2.9692 |
1.9471 | 0.3290 | 2400 | 1.9552 | 1.9551 | -73.3780 | 1.9583 | -0.3660 | 0.9997 | 2.3244 | -11.0849 | -73.3780 | -2.6636 | -2.9703 |
1.9646 | 0.3427 | 2500 | 1.9546 | 1.9546 | -73.3608 | 1.9601 | -0.3453 | 0.9997 | 2.3054 | -10.8779 | -73.3608 | -2.6522 | -2.9582 |
2.0034 | 0.3564 | 2600 | 1.9536 | 1.9536 | -73.3221 | 1.9639 | -0.4025 | 0.9998 | 2.3665 | -11.4498 | -73.3221 | -2.6635 | -2.9708 |
1.9853 | 0.3701 | 2700 | 1.9522 | 1.9522 | -73.2647 | 1.9697 | -0.3826 | 0.9998 | 2.3523 | -11.2507 | -73.2647 | -2.6548 | -2.9612 |
1.9648 | 0.3838 | 2800 | 1.9518 | 1.9518 | -73.2540 | 1.9707 | -0.4008 | 0.9998 | 2.3716 | -11.4329 | -73.2540 | -2.6557 | -2.9618 |
1.992 | 0.3975 | 2900 | 1.9514 | 1.9513 | -73.2347 | 1.9727 | -0.3741 | 0.9998 | 2.3468 | -11.1657 | -73.2347 | -2.6585 | -2.9649 |
1.9098 | 0.4112 | 3000 | 1.9501 | 1.9501 | -73.1879 | 1.9773 | -0.3653 | 0.9998 | 2.3426 | -11.0774 | -73.1879 | -2.6623 | -2.9691 |
2.0089 | 0.4249 | 3100 | 1.9496 | 1.9496 | -73.1694 | 1.9792 | -0.3960 | 0.9998 | 2.3752 | -11.3848 | -73.1694 | -2.6570 | -2.9627 |
2.0138 | 0.4386 | 3200 | 1.9487 | 1.9487 | -73.1364 | 1.9825 | -0.3799 | 0.9998 | 2.3624 | -11.2233 | -73.1364 | -2.6524 | -2.9576 |
1.9295 | 0.4524 | 3300 | 1.9489 | 1.9489 | -73.1488 | 1.9813 | -0.3977 | 0.9998 | 2.3790 | -11.4018 | -73.1488 | -2.6569 | -2.9628 |
1.9276 | 0.4661 | 3400 | 1.9479 | 1.9479 | -73.1079 | 1.9853 | -0.3945 | 0.9998 | 2.3799 | -11.3697 | -73.1079 | -2.6537 | -2.9590 |
1.9594 | 0.4798 | 3500 | 1.9472 | 1.9472 | -73.0821 | 1.9879 | -0.4255 | 0.9998 | 2.4135 | -11.6798 | -73.0821 | -2.6542 | -2.9600 |
1.9141 | 0.4935 | 3600 | 1.9471 | 1.9471 | -73.0800 | 1.9881 | -0.4024 | 0.9998 | 2.3906 | -11.4487 | -73.0800 | -2.6500 | -2.9555 |
1.8611 | 0.5072 | 3700 | 1.9460 | 1.9460 | -73.0338 | 1.9928 | -0.3865 | 0.9998 | 2.3793 | -11.2897 | -73.0338 | -2.6542 | -2.9599 |
1.8907 | 0.5209 | 3800 | 1.9460 | 1.9460 | -73.0372 | 1.9924 | -0.3918 | 0.9998 | 2.3843 | -11.3429 | -73.0372 | -2.6504 | -2.9556 |
1.9147 | 0.5346 | 3900 | 1.9456 | 1.9456 | -73.0218 | 1.9940 | -0.3939 | 0.9998 | 2.3879 | -11.3637 | -73.0218 | -2.6498 | -2.9550 |
1.9485 | 0.5483 | 4000 | 1.9454 | 1.9454 | -73.0146 | 1.9947 | -0.4036 | 0.9998 | 2.3983 | -11.4605 | -73.0146 | -2.6513 | -2.9565 |
1.9379 | 0.5620 | 4100 | 1.9448 | 1.9448 | -72.9908 | 1.9971 | -0.3932 | 0.9998 | 2.3902 | -11.3561 | -72.9908 | -2.6501 | -2.9550 |
1.8956 | 0.5757 | 4200 | 1.9444 | 1.9443 | -72.9738 | 1.9988 | -0.4097 | 0.9998 | 2.4084 | -11.5214 | -72.9738 | -2.6477 | -2.9518 |
1.9916 | 0.5894 | 4300 | 1.9440 | 1.9440 | -72.9580 | 2.0003 | -0.4049 | 0.9998 | 2.4053 | -11.4737 | -72.9580 | -2.6473 | -2.9514 |
1.8885 | 0.6031 | 4400 | 1.9441 | 1.9441 | -72.9673 | 1.9994 | -0.3808 | 0.9998 | 2.3802 | -11.2320 | -72.9673 | -2.6464 | -2.9503 |
1.9078 | 0.6169 | 4500 | 1.9437 | 1.9436 | -72.9481 | 2.0013 | -0.4206 | 0.9998 | 2.4220 | -11.6308 | -72.9481 | -2.6465 | -2.9503 |
1.9037 | 0.6306 | 4600 | 1.9435 | 1.9434 | -72.9426 | 2.0019 | -0.3718 | 0.9998 | 2.3737 | -11.1427 | -72.9426 | -2.6441 | -2.9481 |
1.9558 | 0.6443 | 4700 | 1.9427 | 1.9427 | -72.9121 | 2.0049 | -0.3758 | 0.9998 | 2.3807 | -11.1827 | -72.9121 | -2.6445 | -2.9484 |
1.9416 | 0.6580 | 4800 | 1.9429 | 1.9428 | -72.9187 | 2.0043 | -0.3698 | 0.9998 | 2.3741 | -11.1227 | -72.9187 | -2.6447 | -2.9486 |
1.9471 | 0.6717 | 4900 | 1.9427 | 1.9427 | -72.9167 | 2.0045 | -0.4041 | 0.9998 | 2.4085 | -11.4650 | -72.9167 | -2.6447 | -2.9486 |
1.9237 | 0.6854 | 5000 | 1.9425 | 1.9425 | -72.9062 | 2.0055 | -0.4023 | 0.9998 | 2.4079 | -11.4479 | -72.9062 | -2.6451 | -2.9490 |
1.9687 | 0.6991 | 5100 | 1.9422 | 1.9421 | -72.8930 | 2.0068 | -0.4106 | 0.9998 | 2.4174 | -11.5306 | -72.8930 | -2.6475 | -2.9516 |
1.9274 | 0.7128 | 5200 | 1.9420 | 1.9420 | -72.8846 | 2.0077 | -0.3934 | 0.9998 | 2.4011 | -11.3589 | -72.8846 | -2.6454 | -2.9492 |
1.8258 | 0.7265 | 5300 | 1.9418 | 1.9418 | -72.8788 | 2.0083 | -0.3905 | 0.9998 | 2.3987 | -11.3293 | -72.8788 | -2.6458 | -2.9498 |
1.8978 | 0.7402 | 5400 | 1.9416 | 1.9416 | -72.8710 | 2.0090 | -0.4199 | 0.9998 | 2.4289 | -11.6232 | -72.8710 | -2.6475 | -2.9515 |
1.9706 | 0.7539 | 5500 | 1.9416 | 1.9416 | -72.8733 | 2.0088 | -0.4296 | 0.9998 | 2.4384 | -11.7202 | -72.8733 | -2.6467 | -2.9506 |
1.8711 | 0.7676 | 5600 | 1.9416 | 1.9415 | -72.8708 | 2.0091 | -0.4093 | 0.9998 | 2.4183 | -11.5174 | -72.8708 | -2.6454 | -2.9492 |
1.925 | 0.7813 | 5700 | 1.9412 | 1.9411 | -72.8550 | 2.0106 | -0.4237 | 0.9998 | 2.4344 | -11.6619 | -72.8550 | -2.6463 | -2.9502 |
1.952 | 0.7951 | 5800 | 1.9412 | 1.9411 | -72.8554 | 2.0106 | -0.4179 | 0.9998 | 2.4285 | -11.6032 | -72.8554 | -2.6463 | -2.9503 |
1.9295 | 0.8088 | 5900 | 1.9413 | 1.9413 | -72.8621 | 2.0099 | -0.4133 | 0.9998 | 2.4233 | -11.5578 | -72.8621 | -2.6463 | -2.9503 |
1.9457 | 0.8225 | 6000 | 1.9413 | 1.9413 | -72.8636 | 2.0098 | -0.4083 | 0.9998 | 2.4180 | -11.5072 | -72.8636 | -2.6459 | -2.9499 |
1.9016 | 0.8362 | 6100 | 1.9412 | 1.9412 | -72.8592 | 2.0102 | -0.4150 | 0.9998 | 2.4252 | -11.5748 | -72.8592 | -2.6471 | -2.9513 |
1.9789 | 0.8499 | 6200 | 1.9413 | 1.9413 | -72.8632 | 2.0098 | -0.4221 | 0.9998 | 2.4319 | -11.6458 | -72.8632 | -2.6477 | -2.9520 |
1.944 | 0.8636 | 6300 | 1.9411 | 1.9411 | -72.8542 | 2.0107 | -0.4232 | 0.9998 | 2.4339 | -11.6568 | -72.8542 | -2.6475 | -2.9518 |
1.9435 | 0.8773 | 6400 | 1.9410 | 1.9409 | -72.8496 | 2.0112 | -0.4278 | 0.9998 | 2.4390 | -11.7027 | -72.8496 | -2.6479 | -2.9523 |
1.917 | 0.8910 | 6500 | 1.9410 | 1.9410 | -72.8519 | 2.0109 | -0.4237 | 0.9998 | 2.4346 | -11.6610 | -72.8519 | -2.6482 | -2.9525 |
1.9243 | 0.9047 | 6600 | 1.9410 | 1.9410 | -72.8520 | 2.0109 | -0.4202 | 0.9998 | 2.4311 | -11.6265 | -72.8520 | -2.6480 | -2.9523 |
1.8624 | 0.9184 | 6700 | 1.9409 | 1.9409 | -72.8485 | 2.0113 | -0.4202 | 0.9998 | 2.4314 | -11.6260 | -72.8485 | -2.6477 | -2.9520 |
1.8998 | 0.9321 | 6800 | 1.9410 | 1.9409 | -72.8489 | 2.0112 | -0.4227 | 0.9998 | 2.4340 | -11.6518 | -72.8489 | -2.6478 | -2.9521 |
1.9654 | 0.9458 | 6900 | 1.9410 | 1.9409 | -72.8490 | 2.0112 | -0.4228 | 0.9998 | 2.4341 | -11.6529 | -72.8490 | -2.6478 | -2.9521 |
1.9113 | 0.9595 | 7000 | 1.9409 | 1.9409 | -72.8471 | 2.0114 | -0.4228 | 0.9998 | 2.4342 | -11.6520 | -72.8471 | -2.6477 | -2.9520 |
1.951 | 0.9733 | 7100 | 1.9410 | 1.9410 | -72.8501 | 2.0111 | -0.4228 | 0.9998 | 2.4339 | -11.6524 | -72.8501 | -2.6478 | -2.9521 |
1.9863 | 0.9870 | 7200 | 1.9409 | 1.9409 | -72.8478 | 2.0114 | -0.4229 | 0.9998 | 2.4343 | -11.6536 | -72.8478 | -2.6479 | -2.9522 |
### Framework versions
- Transformers 4.41.2
- Pytorch 2.1.2
- Datasets 2.20.0
- Tokenizers 0.19.1