# tinyllama-1.1b-sum-dpo-full_LR5e-8_BS64_4epochs_old
This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-sum-sft-full_old on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set:
- Loss: 0.6803
- Rewards/chosen: -0.1265
- Rewards/rejected: -0.1560
- Rewards/accuracies: 0.6036
- Rewards/margins: 0.0295
- Logps/rejected: -78.7771
- Logps/chosen: -71.3634
- Logits/rejected: -2.9512
- Logits/chosen: -2.9570
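The reward metrics above follow from the DPO objective: each reward is the β-scaled log-probability ratio between the policy and the reference (SFT) model, and the margin is chosen minus rejected. A minimal pure-Python sketch of that arithmetic (the β value is an assumption for illustration; it is not listed in the hyperparameters below):

```python
import math

def dpo_loss_and_rewards(pi_logp_chosen, pi_logp_rejected,
                         ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair, plus the per-example rewards.

    Rewards are beta-scaled log-ratios between policy and reference;
    the loss is -log(sigmoid(reward_chosen - reward_rejected)).
    """
    reward_chosen = beta * (pi_logp_chosen - ref_logp_chosen)
    reward_rejected = beta * (pi_logp_rejected - ref_logp_rejected)
    margin = reward_chosen - reward_rejected
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))
    return loss, reward_chosen, reward_rejected, margin

# At initialization the policy equals the reference, so both rewards are 0
# and the loss is ln 2 ~= 0.6931 -- matching the first rows of the training table.
loss0, *_ = dpo_loss_and_rewards(-58.7, -63.2, -58.7, -63.2)
```

This also explains why a positive Rewards/margins with negative Rewards/chosen is coherent: both completions drift below the reference, but the rejected one drifts further.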
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-08
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 4
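The total train batch size of 64 is the per-device batch size times the gradient-accumulation steps times the data-parallel world size (8 × 8 × 1 here; the world size is inferred from the product, not stated in the logs), and the cosine schedule warms up linearly over the first 10% of optimizer steps. A pure-Python sketch of both pieces of arithmetic (step counts are illustrative):

```python
import math

# Effective batch size: per-device batch x gradient-accumulation steps x world size.
per_device_batch = 8
grad_accum_steps = 8
world_size = 1  # assumption implied by 8 * 8 = 64; the card only reports the total
total_batch = per_device_batch * grad_accum_steps * world_size  # 64

def cosine_lr_with_warmup(step, total_steps, peak_lr=5e-8, warmup_ratio=0.1):
    """Linear warmup for the first warmup_ratio of steps, then cosine decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With roughly 5800 optimizer steps over 4 epochs (per the final row of the training table), the learning rate peaks at 5e-8 around step 580 and decays back toward zero, which is consistent with the near-flat validation loss over the last epoch.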
## Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
0.6931 | 0.0689 | 100 | 0.6932 | -0.0001 | 0.0001 | 0.4793 | -0.0001 | -63.1744 | -58.7172 | -3.1574 | -3.1630 |
0.6932 | 0.1378 | 200 | 0.6931 | 0.0001 | 0.0001 | 0.4956 | 0.0000 | -63.1716 | -58.7029 | -3.1576 | -3.1633 |
0.693 | 0.2068 | 300 | 0.6932 | 0.0001 | 0.0002 | 0.4724 | -0.0001 | -63.1577 | -58.7002 | -3.1575 | -3.1632 |
0.693 | 0.2757 | 400 | 0.6931 | 0.0003 | 0.0003 | 0.5007 | 0.0000 | -63.1547 | -58.6827 | -3.1569 | -3.1625 |
0.6927 | 0.3446 | 500 | 0.6931 | 0.0006 | 0.0004 | 0.5128 | 0.0002 | -63.1359 | -58.6518 | -3.1563 | -3.1619 |
0.6922 | 0.4135 | 600 | 0.6930 | 0.0009 | 0.0005 | 0.5358 | 0.0004 | -63.1295 | -58.6249 | -3.1544 | -3.1600 |
0.692 | 0.4824 | 700 | 0.6928 | 0.0015 | 0.0008 | 0.5516 | 0.0007 | -63.0973 | -58.5609 | -3.1522 | -3.1578 |
0.6911 | 0.5513 | 800 | 0.6926 | 0.0018 | 0.0006 | 0.5634 | 0.0012 | -63.1172 | -58.5317 | -3.1497 | -3.1553 |
0.6903 | 0.6203 | 900 | 0.6923 | 0.0019 | 0.0002 | 0.5641 | 0.0017 | -63.1634 | -58.5242 | -3.1456 | -3.1513 |
0.6899 | 0.6892 | 1000 | 0.6920 | 0.0016 | -0.0008 | 0.5676 | 0.0024 | -63.2556 | -58.5502 | -3.1411 | -3.1467 |
0.6898 | 0.7581 | 1100 | 0.6916 | 0.0011 | -0.0021 | 0.5802 | 0.0032 | -63.3925 | -58.6040 | -3.1359 | -3.1415 |
0.689 | 0.8270 | 1200 | 0.6913 | 0.0000 | -0.0038 | 0.5753 | 0.0038 | -63.5565 | -58.7099 | -3.1316 | -3.1371 |
0.6881 | 0.8959 | 1300 | 0.6910 | -0.0015 | -0.0061 | 0.5804 | 0.0046 | -63.7902 | -58.8624 | -3.1268 | -3.1325 |
0.6874 | 0.9649 | 1400 | 0.6907 | -0.0037 | -0.0088 | 0.5825 | 0.0051 | -64.0628 | -59.0799 | -3.1213 | -3.1269 |
0.6867 | 1.0338 | 1500 | 0.6903 | -0.0063 | -0.0124 | 0.5843 | 0.0061 | -64.4169 | -59.3381 | -3.1142 | -3.1198 |
0.6857 | 1.1027 | 1600 | 0.6899 | -0.0097 | -0.0166 | 0.5876 | 0.0069 | -64.8429 | -59.6860 | -3.1081 | -3.1137 |
0.6843 | 1.1716 | 1700 | 0.6895 | -0.0148 | -0.0227 | 0.5804 | 0.0078 | -65.4468 | -60.1953 | -3.1013 | -3.1070 |
0.6842 | 1.2405 | 1800 | 0.6890 | -0.0219 | -0.0309 | 0.5871 | 0.0089 | -66.2668 | -60.9047 | -3.0944 | -3.1001 |
0.6802 | 1.3094 | 1900 | 0.6886 | -0.0263 | -0.0362 | 0.5920 | 0.0098 | -66.7954 | -61.3438 | -3.0883 | -3.0940 |
0.6824 | 1.3784 | 2000 | 0.6881 | -0.0324 | -0.0436 | 0.5939 | 0.0112 | -67.5355 | -61.9519 | -3.0814 | -3.0871 |
0.6799 | 1.4473 | 2100 | 0.6875 | -0.0387 | -0.0510 | 0.5992 | 0.0123 | -68.2835 | -62.5824 | -3.0754 | -3.0811 |
0.6793 | 1.5162 | 2200 | 0.6872 | -0.0420 | -0.0551 | 0.5913 | 0.0131 | -68.6940 | -62.9161 | -3.0698 | -3.0755 |
0.6797 | 1.5851 | 2300 | 0.6868 | -0.0485 | -0.0626 | 0.5918 | 0.0141 | -69.4427 | -63.5627 | -3.0623 | -3.0680 |
0.6792 | 1.6540 | 2400 | 0.6863 | -0.0512 | -0.0663 | 0.5939 | 0.0151 | -69.8102 | -63.8365 | -3.0547 | -3.0604 |
0.6775 | 1.7229 | 2500 | 0.6860 | -0.0552 | -0.0710 | 0.5946 | 0.0158 | -70.2800 | -64.2325 | -3.0488 | -3.0546 |
0.6768 | 1.7919 | 2600 | 0.6856 | -0.0598 | -0.0766 | 0.5936 | 0.0169 | -70.8443 | -64.6883 | -3.0412 | -3.0469 |
0.675 | 1.8608 | 2700 | 0.6851 | -0.0654 | -0.0832 | 0.5948 | 0.0178 | -71.4996 | -65.2471 | -3.0345 | -3.0402 |
0.6736 | 1.9297 | 2800 | 0.6847 | -0.0707 | -0.0896 | 0.5983 | 0.0189 | -72.1448 | -65.7864 | -3.0286 | -3.0344 |
0.6773 | 1.9986 | 2900 | 0.6844 | -0.0746 | -0.0943 | 0.6020 | 0.0196 | -72.6052 | -66.1758 | -3.0225 | -3.0283 |
0.6724 | 2.0675 | 3000 | 0.6841 | -0.0793 | -0.0997 | 0.6029 | 0.0204 | -73.1465 | -66.6415 | -3.0158 | -3.0216 |
0.674 | 2.1365 | 3100 | 0.6837 | -0.0824 | -0.1036 | 0.6029 | 0.0212 | -73.5381 | -66.9540 | -3.0112 | -3.0169 |
0.6764 | 2.2054 | 3200 | 0.6834 | -0.0857 | -0.1076 | 0.6066 | 0.0219 | -73.9390 | -67.2856 | -3.0047 | -3.0105 |
0.6749 | 2.2743 | 3300 | 0.6831 | -0.0887 | -0.1113 | 0.6069 | 0.0226 | -74.3103 | -67.5846 | -2.9991 | -3.0049 |
0.6746 | 2.3432 | 3400 | 0.6828 | -0.0921 | -0.1154 | 0.6055 | 0.0233 | -74.7230 | -67.9247 | -2.9944 | -3.0002 |
0.6718 | 2.4121 | 3500 | 0.6824 | -0.0962 | -0.1204 | 0.6069 | 0.0242 | -75.2213 | -68.3350 | -2.9890 | -2.9948 |
0.672 | 2.4810 | 3600 | 0.6822 | -0.1013 | -0.1261 | 0.6048 | 0.0248 | -75.7936 | -68.8439 | -2.9844 | -2.9902 |
0.6733 | 2.5500 | 3700 | 0.6820 | -0.1048 | -0.1302 | 0.6032 | 0.0254 | -76.1958 | -69.1902 | -2.9800 | -2.9858 |
0.6715 | 2.6189 | 3800 | 0.6817 | -0.1077 | -0.1336 | 0.6046 | 0.0260 | -76.5409 | -69.4776 | -2.9765 | -2.9823 |
0.6709 | 2.6878 | 3900 | 0.6816 | -0.1102 | -0.1366 | 0.6020 | 0.0264 | -76.8374 | -69.7330 | -2.9729 | -2.9787 |
0.6696 | 2.7567 | 4000 | 0.6814 | -0.1132 | -0.1400 | 0.6032 | 0.0268 | -77.1831 | -70.0346 | -2.9698 | -2.9756 |
0.6687 | 2.8256 | 4100 | 0.6812 | -0.1154 | -0.1427 | 0.6048 | 0.0273 | -77.4501 | -70.2526 | -2.9670 | -2.9729 |
0.6692 | 2.8946 | 4200 | 0.6810 | -0.1166 | -0.1443 | 0.6073 | 0.0277 | -77.6081 | -70.3715 | -2.9649 | -2.9708 |
0.6742 | 2.9635 | 4300 | 0.6809 | -0.1184 | -0.1463 | 0.6027 | 0.0279 | -77.8100 | -70.5513 | -2.9629 | -2.9687 |
0.6652 | 3.0324 | 4400 | 0.6808 | -0.1191 | -0.1473 | 0.6090 | 0.0282 | -77.9141 | -70.6218 | -2.9606 | -2.9664 |
0.6659 | 3.1013 | 4500 | 0.6807 | -0.1206 | -0.1490 | 0.6046 | 0.0284 | -78.0785 | -70.7742 | -2.9587 | -2.9645 |
0.666 | 3.1702 | 4600 | 0.6805 | -0.1225 | -0.1512 | 0.6062 | 0.0288 | -78.3027 | -70.9582 | -2.9569 | -2.9628 |
0.6644 | 3.2391 | 4700 | 0.6805 | -0.1237 | -0.1527 | 0.6059 | 0.0290 | -78.4454 | -71.0785 | -2.9557 | -2.9615 |
0.6685 | 3.3081 | 4800 | 0.6804 | -0.1246 | -0.1536 | 0.6053 | 0.0291 | -78.5441 | -71.1674 | -2.9547 | -2.9605 |
0.6651 | 3.3770 | 4900 | 0.6803 | -0.1250 | -0.1542 | 0.6039 | 0.0293 | -78.6030 | -71.2072 | -2.9539 | -2.9598 |
0.6689 | 3.4459 | 5000 | 0.6803 | -0.1254 | -0.1547 | 0.6062 | 0.0293 | -78.6476 | -71.2503 | -2.9530 | -2.9588 |
0.6653 | 3.5148 | 5100 | 0.6802 | -0.1256 | -0.1552 | 0.6050 | 0.0296 | -78.6955 | -71.2721 | -2.9525 | -2.9583 |
0.6664 | 3.5837 | 5200 | 0.6803 | -0.1261 | -0.1556 | 0.6046 | 0.0295 | -78.7380 | -71.3226 | -2.9519 | -2.9577 |
0.6687 | 3.6527 | 5300 | 0.6803 | -0.1265 | -0.1559 | 0.6064 | 0.0294 | -78.7701 | -71.3572 | -2.9516 | -2.9574 |
0.6641 | 3.7216 | 5400 | 0.6803 | -0.1266 | -0.1560 | 0.6059 | 0.0294 | -78.7822 | -71.3690 | -2.9514 | -2.9573 |
0.6637 | 3.7905 | 5500 | 0.6803 | -0.1265 | -0.1559 | 0.6053 | 0.0295 | -78.7736 | -71.3579 | -2.9516 | -2.9575 |
0.6694 | 3.8594 | 5600 | 0.6802 | -0.1265 | -0.1561 | 0.6036 | 0.0296 | -78.7869 | -71.3611 | -2.9515 | -2.9574 |
0.6684 | 3.9283 | 5700 | 0.6803 | -0.1266 | -0.1560 | 0.6071 | 0.0294 | -78.7792 | -71.3707 | -2.9512 | -2.9571 |
0.6668 | 3.9972 | 5800 | 0.6803 | -0.1265 | -0.1560 | 0.6036 | 0.0295 | -78.7771 | -71.3634 | -2.9512 | -2.9570 |
## Framework versions
- Transformers 4.41.2
- Pytorch 2.1.2
- Datasets 2.20.0
- Tokenizers 0.19.1