
sft-sum-chosen-10lp-shuff-full-tiny

This model is a fine-tuned version of TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T on the martimfasantos/openai-summarize-tldr dataset. It achieves the following results on the evaluation set:

  • Loss: 1.9409
  • Nll Loss: 1.9409
  • Logps/best: -72.8478
  • Rewards/chosen: 2.0114
  • Rewards/rejected: -0.4229
  • Rewards/accuracies: 0.9998
  • Rewards/margins: 2.4343
  • Logps/rejected: -11.6536
  • Logps/chosen: -72.8478
  • Logits/rejected: -2.6479
  • Logits/chosen: -2.9522
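
The model can be loaded with the standard transformers text-generation API. The sketch below is only an illustration: it assumes a TL;DR-style prompt (post text followed by "TL;DR:"), which is a common convention for openai-summarize-tldr-style data but is not specified by this card.

```python
# Minimal usage sketch; the prompt template is an assumption, not confirmed by this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "martimfasantos/sft-sum-chosen-10lp-shuff-full-tiny"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

post = "..."  # the post to summarize
prompt = f"{post}\nTL;DR:"  # assumed TL;DR-style prompt format

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
# Decode only the newly generated tokens (the summary).
summary = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(summary)
```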

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
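
The training script itself is not included in this card; the sketch below only shows how the hyperparameters above map onto transformers TrainingArguments, assuming the standard Trainer-based SFT stack was used.

```python
# Sketch only: the listed hyperparameters expressed as transformers TrainingArguments.
# Dataset/model wiring and the choice of trainer are assumptions, not part of this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="sft-sum-chosen-10lp-shuff-full-tiny",
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=16,  # effective train batch size: 1 x 16 = 16
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=1,
    bf16=True,  # the published weights are bfloat16; training precision is an assumption
)
```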

Training results

Training Loss | Epoch | Step | Validation Loss | Nll Loss | Logps/best | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen
2.3573 0.0137 100 2.3703 2.3703 -88.8140 0.4147 0.0412 1.0 0.3735 -7.0125 -88.8140 -2.6551 -2.9658
2.1904 0.0274 200 2.1322 2.1322 -79.9647 1.2997 0.0373 1.0 1.2624 -7.0516 -79.9647 -2.6656 -2.9758
1.9956 0.0411 300 2.0629 2.0629 -77.3844 1.5577 -0.1097 0.9995 1.6674 -8.5217 -77.3844 -2.6813 -2.9915
2.0379 0.0548 400 2.0405 2.0405 -76.5483 1.6413 -0.1759 0.9994 1.8173 -9.1840 -76.5483 -2.6918 -3.0033
1.9476 0.0685 500 2.0250 2.0250 -75.9762 1.6985 -0.1561 0.9991 1.8546 -8.9858 -75.9762 -2.6981 -3.0089
2.0151 0.0822 600 2.0134 2.0133 -75.5465 1.7415 -0.1979 0.9991 1.9394 -9.4039 -75.5465 -2.6956 -3.0066
1.9972 0.0960 700 2.0037 2.0037 -75.1909 1.7770 -0.2110 0.9997 1.9881 -9.5345 -75.1909 -2.6886 -2.9996
1.9851 0.1097 800 1.9950 1.9950 -74.8615 1.8100 -0.2127 0.9997 2.0226 -9.5511 -74.8615 -2.6861 -2.9971
2.0271 0.1234 900 1.9890 1.9890 -74.6372 1.8324 -0.2530 0.9995 2.0854 -9.9543 -74.6372 -2.6818 -2.9925
2.0501 0.1371 1000 1.9845 1.9845 -74.4788 1.8483 -0.3242 0.9997 2.1724 -10.6661 -74.4788 -2.6491 -2.9545
1.9699 0.1508 1100 1.9813 1.9812 -74.3528 1.8609 -0.3208 0.9997 2.1817 -10.6327 -74.3528 -2.6664 -2.9755
1.9448 0.1645 1200 1.9773 1.9772 -74.2031 1.8758 -0.2738 0.9997 2.1496 -10.1623 -74.2031 -2.6739 -2.9842
1.9606 0.1782 1300 1.9746 1.9746 -74.0931 1.8868 -0.3353 0.9997 2.2221 -10.7775 -74.0931 -2.6755 -2.9850
1.8795 0.1919 1400 1.9716 1.9715 -73.9887 1.8973 -0.3115 0.9997 2.2088 -10.5398 -73.9887 -2.6658 -2.9741
1.9585 0.2056 1500 1.9703 1.9703 -73.9430 1.9018 -0.3353 0.9997 2.2371 -10.7774 -73.9430 -2.6721 -2.9814
1.9508 0.2193 1600 1.9664 1.9664 -73.7942 1.9167 -0.4138 0.9998 2.3305 -11.5624 -73.7942 -2.6751 -2.9840
1.9041 0.2330 1700 1.9657 1.9656 -73.7736 1.9188 -0.3353 0.9997 2.2541 -10.7776 -73.7736 -2.6703 -2.9794
1.9507 0.2467 1800 1.9634 1.9634 -73.6847 1.9277 -0.3964 0.9998 2.3240 -11.3880 -73.6847 -2.6728 -2.9810
1.8942 0.2604 1900 1.9620 1.9620 -73.6314 1.9330 -0.3368 0.9998 2.2698 -10.7926 -73.6314 -2.6631 -2.9695
2.0088 0.2742 2000 1.9604 1.9603 -73.5703 1.9391 -0.3303 0.9997 2.2694 -10.7277 -73.5703 -2.6651 -2.9720
2.0277 0.2879 2100 1.9596 1.9596 -73.5404 1.9421 -0.3122 0.9997 2.2543 -10.5463 -73.5404 -2.6687 -2.9765
1.9697 0.3016 2200 1.9578 1.9578 -73.4823 1.9479 -0.3187 0.9998 2.2666 -10.6117 -73.4823 -2.6615 -2.9674
1.9756 0.3153 2300 1.9564 1.9564 -73.4282 1.9533 -0.3217 0.9997 2.2750 -10.6410 -73.4282 -2.6624 -2.9692
1.9471 0.3290 2400 1.9552 1.9551 -73.3780 1.9583 -0.3660 0.9997 2.3244 -11.0849 -73.3780 -2.6636 -2.9703
1.9646 0.3427 2500 1.9546 1.9546 -73.3608 1.9601 -0.3453 0.9997 2.3054 -10.8779 -73.3608 -2.6522 -2.9582
2.0034 0.3564 2600 1.9536 1.9536 -73.3221 1.9639 -0.4025 0.9998 2.3665 -11.4498 -73.3221 -2.6635 -2.9708
1.9853 0.3701 2700 1.9522 1.9522 -73.2647 1.9697 -0.3826 0.9998 2.3523 -11.2507 -73.2647 -2.6548 -2.9612
1.9648 0.3838 2800 1.9518 1.9518 -73.2540 1.9707 -0.4008 0.9998 2.3716 -11.4329 -73.2540 -2.6557 -2.9618
1.992 0.3975 2900 1.9514 1.9513 -73.2347 1.9727 -0.3741 0.9998 2.3468 -11.1657 -73.2347 -2.6585 -2.9649
1.9098 0.4112 3000 1.9501 1.9501 -73.1879 1.9773 -0.3653 0.9998 2.3426 -11.0774 -73.1879 -2.6623 -2.9691
2.0089 0.4249 3100 1.9496 1.9496 -73.1694 1.9792 -0.3960 0.9998 2.3752 -11.3848 -73.1694 -2.6570 -2.9627
2.0138 0.4386 3200 1.9487 1.9487 -73.1364 1.9825 -0.3799 0.9998 2.3624 -11.2233 -73.1364 -2.6524 -2.9576
1.9295 0.4524 3300 1.9489 1.9489 -73.1488 1.9813 -0.3977 0.9998 2.3790 -11.4018 -73.1488 -2.6569 -2.9628
1.9276 0.4661 3400 1.9479 1.9479 -73.1079 1.9853 -0.3945 0.9998 2.3799 -11.3697 -73.1079 -2.6537 -2.9590
1.9594 0.4798 3500 1.9472 1.9472 -73.0821 1.9879 -0.4255 0.9998 2.4135 -11.6798 -73.0821 -2.6542 -2.9600
1.9141 0.4935 3600 1.9471 1.9471 -73.0800 1.9881 -0.4024 0.9998 2.3906 -11.4487 -73.0800 -2.6500 -2.9555
1.8611 0.5072 3700 1.9460 1.9460 -73.0338 1.9928 -0.3865 0.9998 2.3793 -11.2897 -73.0338 -2.6542 -2.9599
1.8907 0.5209 3800 1.9460 1.9460 -73.0372 1.9924 -0.3918 0.9998 2.3843 -11.3429 -73.0372 -2.6504 -2.9556
1.9147 0.5346 3900 1.9456 1.9456 -73.0218 1.9940 -0.3939 0.9998 2.3879 -11.3637 -73.0218 -2.6498 -2.9550
1.9485 0.5483 4000 1.9454 1.9454 -73.0146 1.9947 -0.4036 0.9998 2.3983 -11.4605 -73.0146 -2.6513 -2.9565
1.9379 0.5620 4100 1.9448 1.9448 -72.9908 1.9971 -0.3932 0.9998 2.3902 -11.3561 -72.9908 -2.6501 -2.9550
1.8956 0.5757 4200 1.9444 1.9443 -72.9738 1.9988 -0.4097 0.9998 2.4084 -11.5214 -72.9738 -2.6477 -2.9518
1.9916 0.5894 4300 1.9440 1.9440 -72.9580 2.0003 -0.4049 0.9998 2.4053 -11.4737 -72.9580 -2.6473 -2.9514
1.8885 0.6031 4400 1.9441 1.9441 -72.9673 1.9994 -0.3808 0.9998 2.3802 -11.2320 -72.9673 -2.6464 -2.9503
1.9078 0.6169 4500 1.9437 1.9436 -72.9481 2.0013 -0.4206 0.9998 2.4220 -11.6308 -72.9481 -2.6465 -2.9503
1.9037 0.6306 4600 1.9435 1.9434 -72.9426 2.0019 -0.3718 0.9998 2.3737 -11.1427 -72.9426 -2.6441 -2.9481
1.9558 0.6443 4700 1.9427 1.9427 -72.9121 2.0049 -0.3758 0.9998 2.3807 -11.1827 -72.9121 -2.6445 -2.9484
1.9416 0.6580 4800 1.9429 1.9428 -72.9187 2.0043 -0.3698 0.9998 2.3741 -11.1227 -72.9187 -2.6447 -2.9486
1.9471 0.6717 4900 1.9427 1.9427 -72.9167 2.0045 -0.4041 0.9998 2.4085 -11.4650 -72.9167 -2.6447 -2.9486
1.9237 0.6854 5000 1.9425 1.9425 -72.9062 2.0055 -0.4023 0.9998 2.4079 -11.4479 -72.9062 -2.6451 -2.9490
1.9687 0.6991 5100 1.9422 1.9421 -72.8930 2.0068 -0.4106 0.9998 2.4174 -11.5306 -72.8930 -2.6475 -2.9516
1.9274 0.7128 5200 1.9420 1.9420 -72.8846 2.0077 -0.3934 0.9998 2.4011 -11.3589 -72.8846 -2.6454 -2.9492
1.8258 0.7265 5300 1.9418 1.9418 -72.8788 2.0083 -0.3905 0.9998 2.3987 -11.3293 -72.8788 -2.6458 -2.9498
1.8978 0.7402 5400 1.9416 1.9416 -72.8710 2.0090 -0.4199 0.9998 2.4289 -11.6232 -72.8710 -2.6475 -2.9515
1.9706 0.7539 5500 1.9416 1.9416 -72.8733 2.0088 -0.4296 0.9998 2.4384 -11.7202 -72.8733 -2.6467 -2.9506
1.8711 0.7676 5600 1.9416 1.9415 -72.8708 2.0091 -0.4093 0.9998 2.4183 -11.5174 -72.8708 -2.6454 -2.9492
1.925 0.7813 5700 1.9412 1.9411 -72.8550 2.0106 -0.4237 0.9998 2.4344 -11.6619 -72.8550 -2.6463 -2.9502
1.952 0.7951 5800 1.9412 1.9411 -72.8554 2.0106 -0.4179 0.9998 2.4285 -11.6032 -72.8554 -2.6463 -2.9503
1.9295 0.8088 5900 1.9413 1.9413 -72.8621 2.0099 -0.4133 0.9998 2.4233 -11.5578 -72.8621 -2.6463 -2.9503
1.9457 0.8225 6000 1.9413 1.9413 -72.8636 2.0098 -0.4083 0.9998 2.4180 -11.5072 -72.8636 -2.6459 -2.9499
1.9016 0.8362 6100 1.9412 1.9412 -72.8592 2.0102 -0.4150 0.9998 2.4252 -11.5748 -72.8592 -2.6471 -2.9513
1.9789 0.8499 6200 1.9413 1.9413 -72.8632 2.0098 -0.4221 0.9998 2.4319 -11.6458 -72.8632 -2.6477 -2.9520
1.944 0.8636 6300 1.9411 1.9411 -72.8542 2.0107 -0.4232 0.9998 2.4339 -11.6568 -72.8542 -2.6475 -2.9518
1.9435 0.8773 6400 1.9410 1.9409 -72.8496 2.0112 -0.4278 0.9998 2.4390 -11.7027 -72.8496 -2.6479 -2.9523
1.917 0.8910 6500 1.9410 1.9410 -72.8519 2.0109 -0.4237 0.9998 2.4346 -11.6610 -72.8519 -2.6482 -2.9525
1.9243 0.9047 6600 1.9410 1.9410 -72.8520 2.0109 -0.4202 0.9998 2.4311 -11.6265 -72.8520 -2.6480 -2.9523
1.8624 0.9184 6700 1.9409 1.9409 -72.8485 2.0113 -0.4202 0.9998 2.4314 -11.6260 -72.8485 -2.6477 -2.9520
1.8998 0.9321 6800 1.9410 1.9409 -72.8489 2.0112 -0.4227 0.9998 2.4340 -11.6518 -72.8489 -2.6478 -2.9521
1.9654 0.9458 6900 1.9410 1.9409 -72.8490 2.0112 -0.4228 0.9998 2.4341 -11.6529 -72.8490 -2.6478 -2.9521
1.9113 0.9595 7000 1.9409 1.9409 -72.8471 2.0114 -0.4228 0.9998 2.4342 -11.6520 -72.8471 -2.6477 -2.9520
1.951 0.9733 7100 1.9410 1.9410 -72.8501 2.0111 -0.4228 0.9998 2.4339 -11.6524 -72.8501 -2.6478 -2.9521
1.9863 0.9870 7200 1.9409 1.9409 -72.8478 2.0114 -0.4229 0.9998 2.4343 -11.6536 -72.8478 -2.6479 -2.9522

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.1.2
  • Datasets 2.20.0
  • Tokenizers 0.19.1