---
license: llama3
base_model: tsavage68/Summary_L3_1000steps_1e7rate_SFT2
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: Summary_L3_1000steps_1e8rate_03beta_CSFTDPO
  results: []
---

# Summary_L3_1000steps_1e8rate_03beta_CSFTDPO

This model is a fine-tuned version of [tsavage68/Summary_L3_1000steps_1e7rate_SFT2](https://huggingface.co/tsavage68/Summary_L3_1000steps_1e7rate_SFT2) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6919
- Rewards/chosen: -0.0023
- Rewards/rejected: -0.0059
- Rewards/accuracies: 0.0650
- Rewards/margins: 0.0036
- Logps/rejected: -15.2835
- Logps/chosen: -9.3904
- Logits/rejected: -1.0962
- Logits/chosen: -1.0977

## Model description

This checkpoint is a DPO (Direct Preference Optimization) fine-tune of the SFT summarization model [tsavage68/Summary_L3_1000steps_1e7rate_SFT2](https://huggingface.co/tsavage68/Summary_L3_1000steps_1e7rate_SFT2), trained for 1000 steps; judging by the model name, a DPO beta of 0.3 was used. Further details have not been published.

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged TRL sketch follows the list):
- learning_rate: 1e-08
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
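The original training script is not published with this card, so the following is only a hedged sketch of how these hyperparameters could map onto a TRL `DPOTrainer` setup. The toy preference dataset and the beta of 0.3 (inferred from `03beta` in the model name) are assumptions, and the exact `DPOConfig`/`DPOTrainer` signature depends on the TRL version installed:

```python
# Hypothetical reconstruction of the training setup implied by this card.
# The real preference dataset is not published ("unknown dataset"), so a toy
# stand-in is used; beta=0.3 is inferred from "03beta" in the model name.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/Summary_L3_1000steps_1e7rate_SFT2"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Toy stand-in: TRL's DPOTrainer expects prompt / chosen / rejected columns.
train_dataset = Dataset.from_dict({
    "prompt": ["Summarize: The quick brown fox jumps over the lazy dog."],
    "chosen": ["A fox jumps over a dog."],
    "rejected": ["The dog jumps over the fox."],
})

args = DPOConfig(
    output_dir="Summary_L3_1000steps_1e8rate_03beta_CSFTDPO",
    learning_rate=1e-8,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,   # total train batch size 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    beta=0.3,                        # assumption, taken from the model name
)

trainer = DPOTrainer(
    model=model,   # the frozen reference model is created internally when omitted
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```

For reading the metrics below: in TRL's DPO implementation, Rewards/chosen and Rewards/rejected are the beta-scaled log-probability ratios of the policy against the reference model on the chosen and rejected completions, Rewards/margins is their difference, and Rewards/accuracies is the fraction of pairs where the chosen reward exceeds the rejected one.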
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6866 | 0.2004 | 50 | 0.6914 | -0.0024 | -0.0068 | 0.0750 | 0.0044 | -15.2865 | -9.3909 | -1.0958 | -1.0972 |
| 0.6966 | 0.4008 | 100 | 0.6896 | 0.0031 | -0.0051 | 0.0850 | 0.0082 | -15.2806 | -9.3724 | -1.0965 | -1.0979 |
| 0.6924 | 0.6012 | 150 | 0.6911 | -0.0000 | -0.0053 | 0.0850 | 0.0053 | -15.2813 | -9.3828 | -1.0957 | -1.0972 |
| 0.6908 | 0.8016 | 200 | 0.6901 | 0.0009 | -0.0058 | 0.0900 | 0.0066 | -15.2830 | -9.3799 | -1.0957 | -1.0971 |
| 0.6922 | 1.0020 | 250 | 0.6889 | 0.0008 | -0.0086 | 0.0950 | 0.0094 | -15.2923 | -9.3800 | -1.0959 | -1.0974 |
| 0.6944 | 1.2024 | 300 | 0.6906 | -0.0011 | -0.0069 | 0.0900 | 0.0058 | -15.2869 | -9.3865 | -1.0957 | -1.0971 |
| 0.6919 | 1.4028 | 350 | 0.6878 | 0.0019 | -0.0099 | 0.0900 | 0.0117 | -15.2966 | -9.3766 | -1.0961 | -1.0975 |
| 0.6937 | 1.6032 | 400 | 0.6879 | 0.0049 | -0.0067 | 0.0900 | 0.0116 | -15.2860 | -9.3664 | -1.0963 | -1.0977 |
| 0.6927 | 1.8036 | 450 | 0.6903 | 0.0001 | -0.0065 | 0.0850 | 0.0066 | -15.2854 | -9.3824 | -1.0962 | -1.0977 |
| 0.6917 | 2.0040 | 500 | 0.6922 | -0.0002 | -0.0030 | 0.0700 | 0.0028 | -15.2739 | -9.3835 | -1.0959 | -1.0973 |
| 0.6983 | 2.2044 | 550 | 0.6911 | -0.0014 | -0.0068 | 0.0750 | 0.0053 | -15.2863 | -9.3875 | -1.0960 | -1.0974 |
| 0.6901 | 2.4048 | 600 | 0.6902 | 0.0002 | -0.0065 | 0.0900 | 0.0067 | -15.2854 | -9.3820 | -1.0967 | -1.0982 |
| 0.6859 | 2.6052 | 650 | 0.6890 | 0.0027 | -0.0066 | 0.0950 | 0.0093 | -15.2858 | -9.3738 | -1.0964 | -1.0978 |
| 0.694 | 2.8056 | 700 | 0.6910 | 0.0002 | -0.0048 | 0.0850 | 0.0050 | -15.2799 | -9.3823 | -1.0963 | -1.0978 |
| 0.6909 | 3.0060 | 750 | 0.6936 | -0.0027 | -0.0025 | 0.0600 | -0.0002 | -15.2720 | -9.3918 | -1.0964 | -1.0978 |
| 0.6909 | 3.2064 | 800 | 0.6912 | -0.0017 | -0.0065 | 0.0650 | 0.0049 | -15.2855 | -9.3883 | -1.0963 | -1.0977 |
| 0.6929 | 3.4068 | 850 | 0.6914 | -0.0008 | -0.0054 | 0.0800 | 0.0047 | -15.2819 | -9.3853 | -1.0962 | -1.0976 |
| 0.6938 | 3.6072 | 900 | 0.6919 | -0.0023 | -0.0059 | 0.0650 | 0.0036 | -15.2835 | -9.3904 | -1.0962 | -1.0977 |
| 0.69 | 3.8076 | 950 | 0.6919 | -0.0023 | -0.0059 | 0.0650 | 0.0036 | -15.2835 | -9.3904 | -1.0962 | -1.0977 |
| 0.6968 | 4.0080 | 1000 | 0.6919 | -0.0023 | -0.0059 | 0.0650 | 0.0036 | -15.2835 | -9.3904 | -1.0962 | -1.0977 |

### Framework versions

- Transformers 4.41.2
- Pytorch 2.0.0+cu117
- Datasets 2.20.0
- Tokenizers 0.19.1
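Finally, a minimal sketch of loading this checkpoint for summarization inference with `transformers`; the prompt format here is an assumption, since the training data and any chat template for this checkpoint are not documented:

```python
# Minimal inference sketch. The prompt format is an assumption; the card does
# not document the training data or a chat template for this checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "tsavage68/Summary_L3_1000steps_1e8rate_03beta_CSFTDPO"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Summarize the following text:\n\nThe quick brown fox jumps over the lazy dog.\n\nSummary:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```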