---
library_name: transformers
license: other
base_model: trl-lib/qwen1.5-0.5b-sft
tags:
- alignment-handbook
- trl
- simpo
- generated_from_trainer
datasets:
- yakazimir/ultrafeedback_binarized
model-index:
- name: qwen_uncCPO_entropy
  results: []
---

# qwen_uncCPO_entropy

This model is a fine-tuned version of [trl-lib/qwen1.5-0.5b-sft](https://huggingface.co/trl-lib/qwen1.5-0.5b-sft) on the yakazimir/ultrafeedback_binarized dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0000
- Rewards/chosen: -46.3149
- Rewards/rejected: -47.3422
- Rewards/accuracies: 0.5616
- Rewards/margins: 1.0272
- Logps/rejected: -47.3422
- Logps/chosen: -46.3149
- Logits/rejected: 7.3215
- Logits/chosen: 7.6457

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 16
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3.0

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.0 | 0.2141 | 400 | 0.0001 | -31.9619 | -33.7423 | 0.5660 | 1.7804 | -33.7423 | -31.9619 | 4.6072 | 4.6195 |
| 0.0 | 0.4282 | 800 | 0.0000 | -39.5193 | -40.9236 | 0.5593 | 1.4042 | -40.9236 | -39.5193 | 6.2657 | 6.4289 |
| 0.0008 | 0.6422 | 1200 | 0.0000 | -39.2251 | -40.6025 | 0.5542 | 1.3774 | -40.6025 | -39.2251 | 6.1312 | 6.2908 |
| 0.0 | 0.8563 | 1600 | 0.0000 | -41.1464 | -42.5420 | 0.5638 | 1.3956 | -42.5420 | -41.1464 | 6.3830 | 6.5549 |
| 0.0 | 1.0704 | 2000 | 0.0000 | -43.4369 | -44.6769 | 0.5734 | 1.2400 | -44.6769 | -43.4369 | 6.8661 | 7.0992 |
| 0.0 | 1.2845 | 2400 | 0.0000 | -43.9619 | -45.1746 | 0.5697 | 1.2127 | -45.1746 | -43.9619 | 6.9058 | 7.1560 |
| 0.0 | 1.4986 | 2800 | 0.0000 | -44.1897 | -45.3701 | 0.5645 | 1.1803 | -45.3701 | -44.1897 | 6.8977 | 7.1567 |
| 0.0 | 1.7127 | 3200 | 0.0000 | -44.9141 | -46.0263 | 0.5660 | 1.1122 | -46.0263 | -44.9141 | 7.0833 | 7.3687 |
| 0.0 | 1.9267 | 3600 | 0.0000 | -45.5997 | -46.6466 | 0.5645 | 1.0470 | -46.6466 | -45.5997 | 7.1427 | 7.4593 |
| 0.0 | 2.1408 | 4000 | 0.0000 | -45.8198 | -46.8818 | 0.5601 | 1.0620 | -46.8818 | -45.8198 | 7.2832 | 7.5923 |
| 0.0 | 2.3549 | 4400 | 0.0000 | -45.8900 | -46.9389 | 0.5653 | 1.0489 | -46.9389 | -45.8900 | 7.2655 | 7.5788 |
| 0.0 | 2.5690 | 4800 | 0.0000 | -45.9866 | -47.0244 | 0.5623 | 1.0378 | -47.0244 | -45.9866 | 7.2594 | 7.5758 |
| 0.0 | 2.7831 | 5200 | 0.0000 | -45.8574 | -46.9081 | 0.5623 | 1.0507 | -46.9081 | -45.8574 | 7.2536 | 7.5634 |
| 0.0 | 2.9972 | 5600 | 0.0000 | -46.3149 | -47.3422 | 0.5616 | 1.0272 | -47.3422 | -46.3149 | 7.3215 | 7.6457 |

### Framework versions

- Transformers 4.44.2
- Pytorch 2.2.2+cu121
- Datasets 2.18.0
- Tokenizers 0.19.1
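
### Metric arithmetic

Some numbers in this card are related by simple arithmetic, which can help when reading it. The hyperparameter list reports `total_train_batch_size: 32`, which equals `train_batch_size` (2) times `gradient_accumulation_steps` (16); the card lists no device count, so the sketch below treats `num_devices=1` as an assumption. Likewise, Rewards/margins should equal Rewards/chosen minus Rewards/rejected up to rounding, because a difference of means is the mean of the per-example differences. The Python below is a minimal sketch that checks both relations against a few rows copied from the training-results table; the helper names are hypothetical and not part of any library.

```python
"""Sanity-check arithmetic relating columns of this model card (sketch)."""

# (step, Rewards/chosen, Rewards/rejected, Rewards/margins), copied from
# selected rows of the training-results table.
EVAL_ROWS = [
    (400, -31.9619, -33.7423, 1.7804),
    (1200, -39.2251, -40.6025, 1.3774),
    (2000, -43.4369, -44.6769, 1.2400),
    (5600, -46.3149, -47.3422, 1.0272),
]


def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Examples per optimizer update.

    num_devices=1 is an assumption: the card does not list a device count.
    """
    return per_device * accumulation_steps * num_devices


def reward_margin(chosen: float, rejected: float) -> float:
    """Mean margin as mean chosen reward minus mean rejected reward (linearity of the mean)."""
    return chosen - rejected


def inconsistent_steps(rows, tol: float = 5e-4) -> list:
    """Return steps whose reported margin differs from chosen - rejected by more than rounding."""
    return [step for step, c, r, m in rows if abs(reward_margin(c, r) - m) > tol]


if __name__ == "__main__":
    print(effective_batch_size(2, 16))     # matches total_train_batch_size
    print(inconsistent_steps(EVAL_ROWS))   # an empty list means every checked row is consistent
```

The tolerance of `5e-4` absorbs the four-decimal rounding in the table; for example, the final row gives 1.0273 from the chosen and rejected columns against a reported margin of 1.0272.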