tinyllama-1.1b-sum-simpo_beta2.0_gamma1.6_LR5e-8_3epochs
This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-sum-sft-full_old on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set:
- Loss: 1.5991
- Rewards/chosen: -4.1926
- Rewards/rejected: -4.6137
- Rewards/accuracies: 0.6231
- Rewards/margins: 0.4211
- Logps/rejected: -2.3069
- Logps/chosen: -2.0963
- Logits/rejected: -3.3338
- Logits/chosen: -3.3372
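
The metrics above appear to be related by simple arithmetic, which the following minimal sketch checks. It assumes `beta = 2.0` and `gamma = 1.6`, both read off the model name and not stated elsewhere in this card, and it assumes the logged `Logps` are length-normalized log-probabilities as in the SimPO objective. The helper `simpo_loss` is a hypothetical name for illustration, not the training code.

```python
import math

# Final evaluation metrics copied from the list above.
rewards_chosen = -4.1926
rewards_rejected = -4.6137
logps_chosen = -2.0963
logps_rejected = -2.3069

# Assumptions: taken from the model name (beta2.0, gamma1.6), not from the card body.
BETA = 2.0
GAMMA = 1.6

# Rewards/margins should be the gap between chosen and rejected rewards.
margin = rewards_chosen - rewards_rejected


def simpo_loss(logp_chosen, logp_rejected, beta=BETA, gamma=GAMMA):
    """Per-pair SimPO-style loss: -log(sigmoid(beta * (logp_c - logp_r) - gamma))."""
    z = beta * (logp_chosen - logp_rejected) - gamma
    # -log(sigmoid(z)) == log(1 + exp(-z))
    return math.log1p(math.exp(-z))


print("margin:", round(margin, 4))
print("beta * logps:", round(BETA * logps_chosen, 4), round(BETA * logps_rejected, 4))
print("loss at the mean log-probs:", round(simpo_loss(logps_chosen, logps_rejected), 4))
```

Under these assumptions, the rewards match `beta` times the log-probabilities up to rounding. The loss evaluated at the mean log-probabilities is only a lower bound on the reported `Loss`, because the reported value averages a convex per-pair loss over the whole evaluation set.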
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-08
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
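
As a rough illustration of how these settings interact, the sketch below recomputes the effective batch size and evaluates a cosine schedule with linear warmup. It is a standalone approximation, not the scheduler used for this run: `TOTAL_STEPS` is an assumption taken from the last step logged in the results table, and `lr_at` and `effective_batch_size` are hypothetical helper names.

```python
import math

PEAK_LR = 5e-8          # learning_rate
WARMUP_RATIO = 0.1      # lr_scheduler_warmup_ratio
PER_DEVICE_BATCH = 8    # train_batch_size
GRAD_ACCUM = 2          # gradient_accumulation_steps
TOTAL_STEPS = 17_400    # assumption: approximately the last logged optimizer step


def effective_batch_size(per_device, grad_accum, num_devices=1):
    """Examples consumed per optimizer step."""
    return per_device * grad_accum * num_devices


def lr_at(step, total_steps=TOTAL_STEPS, peak=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Linear warmup to `peak`, then half-cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM))
for step in (0, 870, 1740, 9570, 17400):
    print(step, lr_at(step))
```

With a warmup ratio of 0.1 and roughly 17,400 steps, the learning rate would peak near step 1,740 and decay toward zero by the end of training. The reported `total_train_batch_size` of 16 equals 8 × 2; the card does not state the number of devices.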
Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
1.6609 | 0.0172 | 100 | 1.6731 | -2.9616 | -3.2164 | 0.5960 | 0.2548 | -1.6082 | -1.4808 | -3.6346 | -3.6387 |
1.6971 | 0.0345 | 200 | 1.6731 | -2.9618 | -3.2164 | 0.5953 | 0.2546 | -1.6082 | -1.4809 | -3.6367 | -3.6408 |
1.6134 | 0.0517 | 300 | 1.6730 | -2.9617 | -3.2164 | 0.5950 | 0.2548 | -1.6082 | -1.4808 | -3.6205 | -3.6247 |
1.6601 | 0.0689 | 400 | 1.6731 | -2.9617 | -3.2165 | 0.5953 | 0.2548 | -1.6083 | -1.4809 | -3.6372 | -3.6413 |
1.7377 | 0.0861 | 500 | 1.6731 | -2.9620 | -3.2165 | 0.5941 | 0.2545 | -1.6082 | -1.4810 | -3.6401 | -3.6442 |
1.5377 | 0.1034 | 600 | 1.6726 | -2.9613 | -3.2165 | 0.5962 | 0.2552 | -1.6083 | -1.4807 | -3.6397 | -3.6438 |
1.8023 | 0.1206 | 700 | 1.6730 | -2.9612 | -3.2160 | 0.5955 | 0.2548 | -1.6080 | -1.4806 | -3.6470 | -3.6510 |
1.6595 | 0.1378 | 800 | 1.6727 | -2.9617 | -3.2169 | 0.5957 | 0.2552 | -1.6085 | -1.4809 | -3.6409 | -3.6450 |
1.8292 | 0.1551 | 900 | 1.6727 | -2.9616 | -3.2167 | 0.5953 | 0.2552 | -1.6084 | -1.4808 | -3.6317 | -3.6358 |
1.8326 | 0.1723 | 1000 | 1.6722 | -2.9608 | -3.2165 | 0.5943 | 0.2556 | -1.6082 | -1.4804 | -3.6329 | -3.6370 |
1.6767 | 0.1895 | 1100 | 1.6724 | -2.9613 | -3.2168 | 0.5948 | 0.2555 | -1.6084 | -1.4806 | -3.6335 | -3.6376 |
1.711 | 0.2068 | 1200 | 1.6725 | -2.9615 | -3.2169 | 0.5950 | 0.2554 | -1.6084 | -1.4807 | -3.6391 | -3.6431 |
1.6366 | 0.2240 | 1300 | 1.6723 | -2.9612 | -3.2167 | 0.5955 | 0.2555 | -1.6083 | -1.4806 | -3.6354 | -3.6394 |
1.7495 | 0.2412 | 1400 | 1.6719 | -2.9613 | -3.2174 | 0.5946 | 0.2561 | -1.6087 | -1.4806 | -3.6341 | -3.6381 |
1.7423 | 0.2584 | 1500 | 1.6714 | -2.9610 | -3.2178 | 0.5950 | 0.2569 | -1.6089 | -1.4805 | -3.6286 | -3.6326 |
1.6612 | 0.2757 | 1600 | 1.6713 | -2.9614 | -3.2185 | 0.5943 | 0.2571 | -1.6093 | -1.4807 | -3.6393 | -3.6433 |
1.6808 | 0.2929 | 1700 | 1.6704 | -2.9613 | -3.2194 | 0.5960 | 0.2581 | -1.6097 | -1.4806 | -3.6301 | -3.6342 |
1.6208 | 0.3101 | 1800 | 1.6702 | -2.9615 | -3.2196 | 0.5946 | 0.2581 | -1.6098 | -1.4808 | -3.6222 | -3.6263 |
1.82 | 0.3274 | 1900 | 1.6692 | -2.9619 | -3.2219 | 0.5946 | 0.2600 | -1.6110 | -1.4810 | -3.6237 | -3.6277 |
1.6569 | 0.3446 | 2000 | 1.6686 | -2.9633 | -3.2240 | 0.5941 | 0.2607 | -1.6120 | -1.4816 | -3.6285 | -3.6325 |
1.8076 | 0.3618 | 2100 | 1.6682 | -2.9644 | -3.2256 | 0.5946 | 0.2612 | -1.6128 | -1.4822 | -3.6268 | -3.6308 |
1.6012 | 0.3790 | 2200 | 1.6676 | -2.9655 | -3.2275 | 0.5941 | 0.2620 | -1.6137 | -1.4827 | -3.6245 | -3.6285 |
1.6718 | 0.3963 | 2300 | 1.6663 | -2.9674 | -3.2314 | 0.5964 | 0.2640 | -1.6157 | -1.4837 | -3.6148 | -3.6189 |
1.5252 | 0.4135 | 2400 | 1.6658 | -2.9706 | -3.2353 | 0.6018 | 0.2647 | -1.6176 | -1.4853 | -3.6208 | -3.6248 |
1.7441 | 0.4307 | 2500 | 1.6648 | -2.9732 | -3.2391 | 0.6022 | 0.2659 | -1.6195 | -1.4866 | -3.6082 | -3.6122 |
1.7247 | 0.4480 | 2600 | 1.6640 | -2.9758 | -3.2426 | 0.6043 | 0.2669 | -1.6213 | -1.4879 | -3.6028 | -3.6068 |
1.5961 | 0.4652 | 2700 | 1.6629 | -2.9795 | -3.2484 | 0.6055 | 0.2689 | -1.6242 | -1.4898 | -3.6017 | -3.6057 |
1.8025 | 0.4824 | 2800 | 1.6617 | -2.9837 | -3.2540 | 0.6059 | 0.2703 | -1.6270 | -1.4918 | -3.6035 | -3.6075 |
1.8171 | 0.4997 | 2900 | 1.6608 | -2.9893 | -3.2608 | 0.6057 | 0.2715 | -1.6304 | -1.4947 | -3.6063 | -3.6102 |
1.7331 | 0.5169 | 3000 | 1.6599 | -2.9948 | -3.2675 | 0.6059 | 0.2727 | -1.6337 | -1.4974 | -3.6027 | -3.6066 |
1.6335 | 0.5341 | 3100 | 1.6588 | -2.9974 | -3.2719 | 0.6073 | 0.2745 | -1.6359 | -1.4987 | -3.6030 | -3.6069 |
1.8053 | 0.5513 | 3200 | 1.6578 | -3.0040 | -3.2800 | 0.6132 | 0.2760 | -1.6400 | -1.5020 | -3.5859 | -3.5898 |
1.7128 | 0.5686 | 3300 | 1.6569 | -3.0123 | -3.2894 | 0.6143 | 0.2771 | -1.6447 | -1.5061 | -3.5933 | -3.5971 |
1.5567 | 0.5858 | 3400 | 1.6554 | -3.0215 | -3.3012 | 0.6141 | 0.2797 | -1.6506 | -1.5108 | -3.5884 | -3.5923 |
1.6557 | 0.6030 | 3500 | 1.6545 | -3.0307 | -3.3121 | 0.6150 | 0.2814 | -1.6561 | -1.5153 | -3.5780 | -3.5820 |
1.7443 | 0.6203 | 3600 | 1.6533 | -3.0435 | -3.3271 | 0.6143 | 0.2835 | -1.6635 | -1.5218 | -3.5697 | -3.5737 |
1.4954 | 0.6375 | 3700 | 1.6515 | -3.0535 | -3.3399 | 0.6132 | 0.2863 | -1.6699 | -1.5268 | -3.5817 | -3.5856 |
1.7495 | 0.6547 | 3800 | 1.6500 | -3.0685 | -3.3571 | 0.6150 | 0.2886 | -1.6785 | -1.5342 | -3.5748 | -3.5787 |
1.5238 | 0.6720 | 3900 | 1.6493 | -3.0836 | -3.3737 | 0.6120 | 0.2901 | -1.6869 | -1.5418 | -3.5617 | -3.5656 |
1.7001 | 0.6892 | 4000 | 1.6481 | -3.1041 | -3.3965 | 0.6053 | 0.2924 | -1.6983 | -1.5521 | -3.5621 | -3.5659 |
1.5842 | 0.7064 | 4100 | 1.6466 | -3.1233 | -3.4188 | 0.6085 | 0.2954 | -1.7094 | -1.5617 | -3.5465 | -3.5504 |
1.7415 | 0.7236 | 4200 | 1.6453 | -3.1419 | -3.4399 | 0.6090 | 0.2980 | -1.7200 | -1.5709 | -3.5474 | -3.5512 |
1.6328 | 0.7409 | 4300 | 1.6435 | -3.1587 | -3.4597 | 0.6092 | 0.3010 | -1.7299 | -1.5793 | -3.5451 | -3.5489 |
1.6841 | 0.7581 | 4400 | 1.6433 | -3.1831 | -3.4855 | 0.6122 | 0.3024 | -1.7427 | -1.5915 | -3.5446 | -3.5485 |
1.7753 | 0.7753 | 4500 | 1.6420 | -3.2122 | -3.5175 | 0.6138 | 0.3053 | -1.7588 | -1.6061 | -3.5300 | -3.5339 |
1.5101 | 0.7926 | 4600 | 1.6403 | -3.2378 | -3.5467 | 0.6150 | 0.3089 | -1.7734 | -1.6189 | -3.5383 | -3.5421 |
1.5603 | 0.8098 | 4700 | 1.6389 | -3.2699 | -3.5819 | 0.6169 | 0.3120 | -1.7910 | -1.6350 | -3.5395 | -3.5432 |
1.6925 | 0.8270 | 4800 | 1.6371 | -3.3015 | -3.6169 | 0.6197 | 0.3154 | -1.8085 | -1.6508 | -3.5388 | -3.5425 |
1.55 | 0.8442 | 4900 | 1.6352 | -3.3371 | -3.6557 | 0.6204 | 0.3187 | -1.8279 | -1.6685 | -3.5296 | -3.5334 |
1.7547 | 0.8615 | 5000 | 1.6344 | -3.3516 | -3.6717 | 0.6215 | 0.3200 | -1.8358 | -1.6758 | -3.5221 | -3.5259 |
1.5639 | 0.8787 | 5100 | 1.6325 | -3.3917 | -3.7152 | 0.6215 | 0.3234 | -1.8576 | -1.6959 | -3.5201 | -3.5238 |
1.5202 | 0.8959 | 5200 | 1.6311 | -3.4276 | -3.7534 | 0.6211 | 0.3258 | -1.8767 | -1.7138 | -3.5244 | -3.5282 |
1.5903 | 0.9132 | 5300 | 1.6297 | -3.4486 | -3.7771 | 0.6215 | 0.3286 | -1.8886 | -1.7243 | -3.5069 | -3.5107 |
1.4759 | 0.9304 | 5400 | 1.6283 | -3.4696 | -3.8007 | 0.6229 | 0.3311 | -1.9004 | -1.7348 | -3.5057 | -3.5095 |
1.5141 | 0.9476 | 5500 | 1.6276 | -3.4762 | -3.8091 | 0.6222 | 0.3328 | -1.9045 | -1.7381 | -3.5203 | -3.5240 |
1.6434 | 0.9649 | 5600 | 1.6268 | -3.4817 | -3.8157 | 0.6234 | 0.3340 | -1.9079 | -1.7408 | -3.5045 | -3.5082 |
1.6866 | 0.9821 | 5700 | 1.6257 | -3.4974 | -3.8333 | 0.6241 | 0.3359 | -1.9167 | -1.7487 | -3.4916 | -3.4954 |
1.4625 | 0.9993 | 5800 | 1.6247 | -3.5213 | -3.8599 | 0.6241 | 0.3386 | -1.9300 | -1.7606 | -3.4941 | -3.4979 |
1.5559 | 1.0165 | 5900 | 1.6238 | -3.5308 | -3.8712 | 0.6243 | 0.3404 | -1.9356 | -1.7654 | -3.4910 | -3.4947 |
1.5296 | 1.0338 | 6000 | 1.6234 | -3.5406 | -3.8820 | 0.6241 | 0.3414 | -1.9410 | -1.7703 | -3.4929 | -3.4966 |
1.7383 | 1.0510 | 6100 | 1.6228 | -3.5586 | -3.9015 | 0.6208 | 0.3429 | -1.9508 | -1.7793 | -3.4874 | -3.4910 |
1.5491 | 1.0682 | 6200 | 1.6215 | -3.5797 | -3.9249 | 0.6208 | 0.3452 | -1.9624 | -1.7899 | -3.4773 | -3.4810 |
1.5498 | 1.0855 | 6300 | 1.6214 | -3.5931 | -3.9393 | 0.6204 | 0.3462 | -1.9696 | -1.7965 | -3.4747 | -3.4784 |
1.613 | 1.1027 | 6400 | 1.6210 | -3.6015 | -3.9492 | 0.6206 | 0.3478 | -1.9746 | -1.8007 | -3.4789 | -3.4826 |
1.7929 | 1.1199 | 6500 | 1.6200 | -3.6169 | -3.9669 | 0.6220 | 0.3500 | -1.9835 | -1.8085 | -3.4716 | -3.4753 |
1.7372 | 1.1371 | 6600 | 1.6199 | -3.6260 | -3.9776 | 0.6215 | 0.3516 | -1.9888 | -1.8130 | -3.4685 | -3.4721 |
1.5748 | 1.1544 | 6700 | 1.6198 | -3.6291 | -3.9815 | 0.6227 | 0.3524 | -1.9908 | -1.8145 | -3.4609 | -3.4646 |
1.5268 | 1.1716 | 6800 | 1.6184 | -3.6529 | -4.0082 | 0.6217 | 0.3553 | -2.0041 | -1.8264 | -3.4472 | -3.4509 |
1.552 | 1.1888 | 6900 | 1.6182 | -3.6682 | -4.0248 | 0.6215 | 0.3565 | -2.0124 | -1.8341 | -3.4597 | -3.4633 |
1.5713 | 1.2061 | 7000 | 1.6170 | -3.6855 | -4.0446 | 0.6231 | 0.3591 | -2.0223 | -1.8427 | -3.4683 | -3.4718 |
1.6189 | 1.2233 | 7100 | 1.6174 | -3.6787 | -4.0380 | 0.6215 | 0.3592 | -2.0190 | -1.8394 | -3.4553 | -3.4589 |
1.488 | 1.2405 | 7200 | 1.6166 | -3.7005 | -4.0616 | 0.6217 | 0.3612 | -2.0308 | -1.8502 | -3.4373 | -3.4410 |
1.5506 | 1.2578 | 7300 | 1.6159 | -3.7223 | -4.0849 | 0.6224 | 0.3626 | -2.0425 | -1.8612 | -3.4467 | -3.4503 |
1.5274 | 1.2750 | 7400 | 1.6148 | -3.7367 | -4.1018 | 0.6241 | 0.3652 | -2.0509 | -1.8683 | -3.4483 | -3.4519 |
1.547 | 1.2922 | 7500 | 1.6138 | -3.7467 | -4.1131 | 0.6217 | 0.3664 | -2.0565 | -1.8734 | -3.4345 | -3.4381 |
1.4958 | 1.3094 | 7600 | 1.6142 | -3.7568 | -4.1234 | 0.6231 | 0.3666 | -2.0617 | -1.8784 | -3.4377 | -3.4412 |
1.4875 | 1.3267 | 7700 | 1.6137 | -3.7720 | -4.1409 | 0.6224 | 0.3689 | -2.0704 | -1.8860 | -3.4242 | -3.4279 |
1.5489 | 1.3439 | 7800 | 1.6132 | -3.7819 | -4.1526 | 0.6229 | 0.3707 | -2.0763 | -1.8909 | -3.4258 | -3.4294 |
1.6241 | 1.3611 | 7900 | 1.6129 | -3.7899 | -4.1616 | 0.6238 | 0.3717 | -2.0808 | -1.8949 | -3.4339 | -3.4374 |
1.6697 | 1.3784 | 8000 | 1.6118 | -3.8009 | -4.1750 | 0.6236 | 0.3741 | -2.0875 | -1.9004 | -3.4167 | -3.4203 |
1.5586 | 1.3956 | 8100 | 1.6123 | -3.8096 | -4.1836 | 0.6241 | 0.3740 | -2.0918 | -1.9048 | -3.4215 | -3.4250 |
1.3943 | 1.4128 | 8200 | 1.6110 | -3.8221 | -4.1990 | 0.6245 | 0.3769 | -2.0995 | -1.9110 | -3.4127 | -3.4163 |
1.6019 | 1.4300 | 8300 | 1.6098 | -3.8372 | -4.2158 | 0.6245 | 0.3786 | -2.1079 | -1.9186 | -3.4157 | -3.4193 |
1.475 | 1.4473 | 8400 | 1.6101 | -3.8498 | -4.2288 | 0.625 | 0.3789 | -2.1144 | -1.9249 | -3.4113 | -3.4149 |
1.5141 | 1.4645 | 8500 | 1.6099 | -3.8489 | -4.2288 | 0.6248 | 0.3799 | -2.1144 | -1.9245 | -3.4117 | -3.4152 |
1.5064 | 1.4817 | 8600 | 1.6103 | -3.8593 | -4.2395 | 0.6238 | 0.3802 | -2.1198 | -1.9297 | -3.4115 | -3.4151 |
1.5121 | 1.4990 | 8700 | 1.6100 | -3.8718 | -4.2527 | 0.6241 | 0.3810 | -2.1264 | -1.9359 | -3.4052 | -3.4087 |
1.4344 | 1.5162 | 8800 | 1.6097 | -3.8842 | -4.2665 | 0.625 | 0.3824 | -2.1333 | -1.9421 | -3.4016 | -3.4051 |
1.4826 | 1.5334 | 8900 | 1.6085 | -3.9086 | -4.2937 | 0.6259 | 0.3851 | -2.1468 | -1.9543 | -3.4018 | -3.4053 |
1.5369 | 1.5507 | 9000 | 1.6084 | -3.9188 | -4.3046 | 0.6257 | 0.3857 | -2.1523 | -1.9594 | -3.3997 | -3.4032 |
1.6204 | 1.5679 | 9100 | 1.6072 | -3.9245 | -4.3122 | 0.6255 | 0.3877 | -2.1561 | -1.9623 | -3.3930 | -3.3965 |
1.5032 | 1.5851 | 9200 | 1.6078 | -3.9320 | -4.3202 | 0.6273 | 0.3882 | -2.1601 | -1.9660 | -3.3925 | -3.3961 |
1.5816 | 1.6023 | 9300 | 1.6080 | -3.9339 | -4.3227 | 0.6273 | 0.3887 | -2.1613 | -1.9670 | -3.3853 | -3.3889 |
1.5464 | 1.6196 | 9400 | 1.6076 | -3.9388 | -4.3286 | 0.6266 | 0.3898 | -2.1643 | -1.9694 | -3.3809 | -3.3845 |
1.4955 | 1.6368 | 9500 | 1.6068 | -3.9471 | -4.3382 | 0.6271 | 0.3912 | -2.1691 | -1.9735 | -3.3932 | -3.3967 |
1.5395 | 1.6540 | 9600 | 1.6069 | -3.9479 | -4.3397 | 0.6259 | 0.3918 | -2.1699 | -1.9739 | -3.3839 | -3.3875 |
1.4387 | 1.6713 | 9700 | 1.6068 | -3.9502 | -4.3421 | 0.6276 | 0.3919 | -2.1711 | -1.9751 | -3.3783 | -3.3818 |
1.3438 | 1.6885 | 9800 | 1.6067 | -3.9572 | -4.3502 | 0.6276 | 0.3929 | -2.1751 | -1.9786 | -3.3939 | -3.3974 |
1.3561 | 1.7057 | 9900 | 1.6061 | -3.9663 | -4.3606 | 0.6285 | 0.3943 | -2.1803 | -1.9832 | -3.3849 | -3.3884 |
1.3892 | 1.7229 | 10000 | 1.6060 | -3.9747 | -4.3696 | 0.6257 | 0.3949 | -2.1848 | -1.9874 | -3.3726 | -3.3761 |
1.5131 | 1.7402 | 10100 | 1.6058 | -3.9802 | -4.3758 | 0.6266 | 0.3955 | -2.1879 | -1.9901 | -3.3776 | -3.3811 |
1.5061 | 1.7574 | 10200 | 1.6050 | -3.9996 | -4.3969 | 0.6266 | 0.3974 | -2.1985 | -1.9998 | -3.3678 | -3.3713 |
1.6132 | 1.7746 | 10300 | 1.6050 | -4.0049 | -4.4028 | 0.6231 | 0.3979 | -2.2014 | -2.0024 | -3.3743 | -3.3778 |
1.3357 | 1.7919 | 10400 | 1.6050 | -4.0040 | -4.4026 | 0.6213 | 0.3986 | -2.2013 | -2.0020 | -3.3710 | -3.3744 |
1.4868 | 1.8091 | 10500 | 1.6045 | -4.0107 | -4.4100 | 0.6187 | 0.3993 | -2.2050 | -2.0054 | -3.3805 | -3.3839 |
1.5879 | 1.8263 | 10600 | 1.6052 | -4.0153 | -4.4144 | 0.6197 | 0.3991 | -2.2072 | -2.0077 | -3.3635 | -3.3670 |
1.4603 | 1.8436 | 10700 | 1.6047 | -4.0231 | -4.4232 | 0.6220 | 0.4001 | -2.2116 | -2.0115 | -3.3620 | -3.3655 |
1.3798 | 1.8608 | 10800 | 1.6042 | -4.0306 | -4.4320 | 0.6227 | 0.4013 | -2.2160 | -2.0153 | -3.3691 | -3.3725 |
1.4895 | 1.8780 | 10900 | 1.6039 | -4.0340 | -4.4358 | 0.6208 | 0.4018 | -2.2179 | -2.0170 | -3.3651 | -3.3685 |
1.6103 | 1.8952 | 11000 | 1.6041 | -4.0366 | -4.4389 | 0.6220 | 0.4022 | -2.2194 | -2.0183 | -3.3628 | -3.3663 |
1.5105 | 1.9125 | 11100 | 1.6033 | -4.0506 | -4.4549 | 0.6220 | 0.4044 | -2.2275 | -2.0253 | -3.3548 | -3.3583 |
1.3955 | 1.9297 | 11200 | 1.6034 | -4.0569 | -4.4612 | 0.6213 | 0.4043 | -2.2306 | -2.0284 | -3.3499 | -3.3534 |
1.6675 | 1.9469 | 11300 | 1.6030 | -4.0634 | -4.4689 | 0.6248 | 0.4055 | -2.2345 | -2.0317 | -3.3599 | -3.3633 |
1.467 | 1.9642 | 11400 | 1.6030 | -4.0655 | -4.4710 | 0.6220 | 0.4055 | -2.2355 | -2.0327 | -3.3408 | -3.3444 |
1.6141 | 1.9814 | 11500 | 1.6028 | -4.0747 | -4.4814 | 0.6245 | 0.4067 | -2.2407 | -2.0373 | -3.3533 | -3.3568 |
1.4188 | 1.9986 | 11600 | 1.6029 | -4.0793 | -4.4862 | 0.6241 | 0.4070 | -2.2431 | -2.0396 | -3.3551 | -3.3585 |
1.3363 | 2.0159 | 11700 | 1.6028 | -4.0903 | -4.4982 | 0.6204 | 0.4079 | -2.2491 | -2.0452 | -3.3525 | -3.3559 |
1.5703 | 2.0331 | 11800 | 1.6029 | -4.0960 | -4.5040 | 0.625 | 0.4080 | -2.2520 | -2.0480 | -3.3542 | -3.3576 |
1.4007 | 2.0503 | 11900 | 1.6025 | -4.1070 | -4.5164 | 0.6264 | 0.4094 | -2.2582 | -2.0535 | -3.3537 | -3.3571 |
1.3923 | 2.0675 | 12000 | 1.6020 | -4.1075 | -4.5177 | 0.6220 | 0.4102 | -2.2588 | -2.0538 | -3.3445 | -3.3480 |
1.606 | 2.0848 | 12100 | 1.6018 | -4.1121 | -4.5235 | 0.6236 | 0.4113 | -2.2617 | -2.0561 | -3.3442 | -3.3476 |
1.5084 | 2.1020 | 12200 | 1.6017 | -4.1195 | -4.5307 | 0.6257 | 0.4112 | -2.2654 | -2.0597 | -3.3465 | -3.3499 |
1.4099 | 2.1192 | 12300 | 1.6014 | -4.1198 | -4.5311 | 0.6229 | 0.4112 | -2.2655 | -2.0599 | -3.3430 | -3.3464 |
1.5056 | 2.1365 | 12400 | 1.6009 | -4.1230 | -4.5360 | 0.6213 | 0.4129 | -2.2680 | -2.0615 | -3.3393 | -3.3427 |
1.3618 | 2.1537 | 12500 | 1.6016 | -4.1280 | -4.5395 | 0.6227 | 0.4115 | -2.2698 | -2.0640 | -3.3424 | -3.3459 |
1.3944 | 2.1709 | 12600 | 1.6015 | -4.1305 | -4.5426 | 0.6248 | 0.4121 | -2.2713 | -2.0652 | -3.3480 | -3.3514 |
1.5202 | 2.1881 | 12700 | 1.6014 | -4.1351 | -4.5482 | 0.6213 | 0.4131 | -2.2741 | -2.0675 | -3.3383 | -3.3418 |
1.5605 | 2.2054 | 12800 | 1.6009 | -4.1366 | -4.5507 | 0.6234 | 0.4141 | -2.2754 | -2.0683 | -3.3370 | -3.3404 |
1.3645 | 2.2226 | 12900 | 1.6009 | -4.1383 | -4.5525 | 0.6224 | 0.4142 | -2.2762 | -2.0691 | -3.3402 | -3.3436 |
1.5051 | 2.2398 | 13000 | 1.6006 | -4.1434 | -4.5586 | 0.6229 | 0.4151 | -2.2793 | -2.0717 | -3.3364 | -3.3398 |
1.4171 | 2.2571 | 13100 | 1.6011 | -4.1443 | -4.5592 | 0.6224 | 0.4149 | -2.2796 | -2.0721 | -3.3394 | -3.3428 |
1.4166 | 2.2743 | 13200 | 1.6005 | -4.1497 | -4.5654 | 0.6227 | 0.4158 | -2.2827 | -2.0748 | -3.3398 | -3.3432 |
1.5389 | 2.2915 | 13300 | 1.6007 | -4.1508 | -4.5665 | 0.6234 | 0.4157 | -2.2832 | -2.0754 | -3.3449 | -3.3483 |
1.4618 | 2.3088 | 13400 | 1.6007 | -4.1553 | -4.5710 | 0.6227 | 0.4157 | -2.2855 | -2.0776 | -3.3437 | -3.3471 |
1.3821 | 2.3260 | 13500 | 1.6001 | -4.1574 | -4.5743 | 0.6229 | 0.4170 | -2.2872 | -2.0787 | -3.3213 | -3.3248 |
1.4958 | 2.3432 | 13600 | 1.5997 | -4.1605 | -4.5782 | 0.6241 | 0.4177 | -2.2891 | -2.0802 | -3.3339 | -3.3374 |
1.5225 | 2.3604 | 13700 | 1.6000 | -4.1639 | -4.5813 | 0.6227 | 0.4174 | -2.2906 | -2.0820 | -3.3315 | -3.3349 |
1.5279 | 2.3777 | 13800 | 1.5999 | -4.1666 | -4.5843 | 0.6234 | 0.4177 | -2.2921 | -2.0833 | -3.3375 | -3.3409 |
1.5492 | 2.3949 | 13900 | 1.5997 | -4.1676 | -4.5857 | 0.6227 | 0.4182 | -2.2929 | -2.0838 | -3.3367 | -3.3401 |
1.4219 | 2.4121 | 14000 | 1.5998 | -4.1724 | -4.5908 | 0.6231 | 0.4184 | -2.2954 | -2.0862 | -3.3231 | -3.3265 |
1.4625 | 2.4294 | 14100 | 1.5994 | -4.1764 | -4.5952 | 0.6238 | 0.4188 | -2.2976 | -2.0882 | -3.3154 | -3.3189 |
1.3039 | 2.4466 | 14200 | 1.5993 | -4.1746 | -4.5941 | 0.6231 | 0.4195 | -2.2971 | -2.0873 | -3.3285 | -3.3319 |
1.4333 | 2.4638 | 14300 | 1.5993 | -4.1779 | -4.5973 | 0.6238 | 0.4194 | -2.2987 | -2.0889 | -3.3319 | -3.3353 |
1.4677 | 2.4810 | 14400 | 1.5992 | -4.1805 | -4.6002 | 0.6229 | 0.4197 | -2.3001 | -2.0902 | -3.3219 | -3.3253 |
1.3125 | 2.4983 | 14500 | 1.5994 | -4.1824 | -4.6024 | 0.6229 | 0.4200 | -2.3012 | -2.0912 | -3.3350 | -3.3384 |
1.4611 | 2.5155 | 14600 | 1.5989 | -4.1839 | -4.6043 | 0.6248 | 0.4204 | -2.3021 | -2.0920 | -3.3344 | -3.3378 |
1.4287 | 2.5327 | 14700 | 1.5989 | -4.1868 | -4.6073 | 0.6231 | 0.4205 | -2.3037 | -2.0934 | -3.3421 | -3.3455 |
1.5098 | 2.5500 | 14800 | 1.5989 | -4.1855 | -4.6061 | 0.6234 | 0.4206 | -2.3031 | -2.0928 | -3.3370 | -3.3403 |
1.3432 | 2.5672 | 14900 | 1.5995 | -4.1878 | -4.6080 | 0.6231 | 0.4202 | -2.3040 | -2.0939 | -3.3237 | -3.3271 |
1.6495 | 2.5844 | 15000 | 1.5992 | -4.1893 | -4.6094 | 0.6231 | 0.4201 | -2.3047 | -2.0947 | -3.3315 | -3.3349 |
1.4971 | 2.6017 | 15100 | 1.5992 | -4.1890 | -4.6095 | 0.6234 | 0.4205 | -2.3048 | -2.0945 | -3.3235 | -3.3270 |
1.3488 | 2.6189 | 15200 | 1.5990 | -4.1909 | -4.6118 | 0.6231 | 0.4210 | -2.3059 | -2.0954 | -3.3239 | -3.3273 |
1.3814 | 2.6361 | 15300 | 1.5994 | -4.1911 | -4.6115 | 0.6229 | 0.4204 | -2.3058 | -2.0955 | -3.3206 | -3.3240 |
1.4437 | 2.6533 | 15400 | 1.5993 | -4.1905 | -4.6109 | 0.6222 | 0.4204 | -2.3054 | -2.0952 | -3.3217 | -3.3252 |
1.5573 | 2.6706 | 15500 | 1.5995 | -4.1915 | -4.6116 | 0.6222 | 0.4201 | -2.3058 | -2.0958 | -3.3258 | -3.3293 |
1.4515 | 2.6878 | 15600 | 1.5986 | -4.1902 | -4.6120 | 0.6229 | 0.4219 | -2.3060 | -2.0951 | -3.3170 | -3.3205 |
1.3256 | 2.7050 | 15700 | 1.5993 | -4.1914 | -4.6118 | 0.6227 | 0.4204 | -2.3059 | -2.0957 | -3.3388 | -3.3421 |
1.4458 | 2.7223 | 15800 | 1.6001 | -4.1918 | -4.6113 | 0.6220 | 0.4195 | -2.3057 | -2.0959 | -3.3286 | -3.3321 |
1.3734 | 2.7395 | 15900 | 1.5991 | -4.1906 | -4.6111 | 0.6227 | 0.4206 | -2.3056 | -2.0953 | -3.3224 | -3.3258 |
1.4477 | 2.7567 | 16000 | 1.5998 | -4.1924 | -4.6122 | 0.6224 | 0.4198 | -2.3061 | -2.0962 | -3.3408 | -3.3441 |
1.401 | 2.7739 | 16100 | 1.5992 | -4.1917 | -4.6125 | 0.6234 | 0.4208 | -2.3063 | -2.0959 | -3.3096 | -3.3131 |
1.422 | 2.7912 | 16200 | 1.5998 | -4.1927 | -4.6123 | 0.6236 | 0.4196 | -2.3062 | -2.0964 | -3.3248 | -3.3282 |
1.4691 | 2.8084 | 16300 | 1.5994 | -4.1918 | -4.6125 | 0.6236 | 0.4207 | -2.3062 | -2.0959 | -3.3187 | -3.3222 |
1.4821 | 2.8256 | 16400 | 1.5993 | -4.1923 | -4.6129 | 0.6241 | 0.4206 | -2.3064 | -2.0962 | -3.3167 | -3.3202 |
1.539 | 2.8429 | 16500 | 1.6001 | -4.1929 | -4.6126 | 0.6234 | 0.4197 | -2.3063 | -2.0964 | -3.3192 | -3.3227 |
1.7983 | 2.8601 | 16600 | 1.5994 | -4.1926 | -4.6132 | 0.6224 | 0.4206 | -2.3066 | -2.0963 | -3.3258 | -3.3293 |
1.4889 | 2.8773 | 16700 | 1.5994 | -4.1917 | -4.6125 | 0.6229 | 0.4208 | -2.3062 | -2.0958 | -3.3144 | -3.3179 |
1.5191 | 2.8946 | 16800 | 1.5994 | -4.1924 | -4.6128 | 0.6222 | 0.4204 | -2.3064 | -2.0962 | -3.3194 | -3.3229 |
1.6401 | 2.9118 | 16900 | 1.5999 | -4.1929 | -4.6129 | 0.6224 | 0.4199 | -2.3064 | -2.0965 | -3.3256 | -3.3291 |
1.5593 | 2.9290 | 17000 | 1.5989 | -4.1926 | -4.6138 | 0.6227 | 0.4212 | -2.3069 | -2.0963 | -3.3279 | -3.3313 |
1.5395 | 2.9462 | 17100 | 1.5989 | -4.1923 | -4.6135 | 0.6234 | 0.4212 | -2.3068 | -2.0962 | -3.3291 | -3.3325 |
1.7984 | 2.9635 | 17200 | 1.5992 | -4.1921 | -4.6128 | 0.6227 | 0.4207 | -2.3064 | -2.0960 | -3.3195 | -3.3230 |
1.6222 | 2.9807 | 17300 | 1.5992 | -4.1931 | -4.6141 | 0.6238 | 0.4210 | -2.3070 | -2.0965 | -3.3339 | -3.3372 |
1.4575 | 2.9979 | 17400 | 1.5991 | -4.1926 | -4.6137 | 0.6231 | 0.4211 | -2.3069 | -2.0963 | -3.3338 | -3.3372 |
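
To work with the table programmatically, a small parser along the following lines may help. It is a sketch over four rows copied verbatim from the table (steps 100, 5800, 15600, and 17400); `parse_rows` is a hypothetical helper, and the column names are taken from the table header.

```python
COLUMNS = [
    "Training Loss", "Epoch", "Step", "Validation Loss",
    "Rewards/chosen", "Rewards/rejected", "Rewards/accuracies", "Rewards/margins",
    "Logps/rejected", "Logps/chosen", "Logits/rejected", "Logits/chosen",
]

# Four rows copied from the results table above.
SAMPLE_ROWS = """
1.6609 | 0.0172 | 100 | 1.6731 | -2.9616 | -3.2164 | 0.5960 | 0.2548 | -1.6082 | -1.4808 | -3.6346 | -3.6387 |
1.4625 | 0.9993 | 5800 | 1.6247 | -3.5213 | -3.8599 | 0.6241 | 0.3386 | -1.9300 | -1.7606 | -3.4941 | -3.4979 |
1.4515 | 2.6878 | 15600 | 1.5986 | -4.1902 | -4.6120 | 0.6229 | 0.4219 | -2.3060 | -2.0951 | -3.3170 | -3.3205 |
1.4575 | 2.9979 | 17400 | 1.5991 | -4.1926 | -4.6137 | 0.6231 | 0.4211 | -2.3069 | -2.0963 | -3.3338 | -3.3372 |
"""


def parse_rows(text):
    """Turn pipe-delimited rows into a list of dicts keyed by column name."""
    rows = []
    for line in text.strip().splitlines():
        cells = [c.strip() for c in line.split("|") if c.strip()]
        row = {name: float(value) for name, value in zip(COLUMNS, cells)}
        row["Step"] = int(row["Step"])
        rows.append(row)
    return rows


rows = parse_rows(SAMPLE_ROWS)

# Checkpoint with the lowest validation loss among the sampled rows.
best = min(rows, key=lambda r: r["Validation Loss"])
print("best step:", best["Step"], "validation loss:", best["Validation Loss"])

# Rough steps-per-epoch estimate from the last logged row.
steps_per_epoch = rows[-1]["Step"] / rows[-1]["Epoch"]
print("approx. steps per epoch:", round(steps_per_epoch))
```

Among these rows, step 15600 has the lowest validation loss (1.5986), which also appears to be the minimum over the full table; the final checkpoint sits slightly higher at 1.5991. The steps-per-epoch estimate of roughly 5,804, multiplied by the total batch size of 16, suggests on the order of 93,000 training pairs per epoch, though the card does not state the dataset size.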
Framework versions
- Transformers 4.41.2
- Pytorch 2.1.2
- Datasets 2.20.0
- Tokenizers 0.19.1