---
library_name: transformers
license: apache-2.0
base_model: tsavage68/Na_M2_1000steps_1e7_SFT
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: Na_M2_1000steps_1e7rate_05beta_cSFTDPO
  results: []
---

# Na_M2_1000steps_1e7rate_05beta_cSFTDPO

This model is a fine-tuned version of [tsavage68/Na_M2_1000steps_1e7_SFT](https://huggingface.co/tsavage68/Na_M2_1000steps_1e7_SFT) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0000
- Rewards/chosen: 3.4353
- Rewards/rejected: -12.0460
- Rewards/accuracies: 1.0
- Rewards/margins: 15.4813
- Logps/rejected: -104.0153
- Logps/chosen: -41.2618
- Logits/rejected: -2.5171
- Logits/chosen: -2.5312

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of the corresponding trainer setup follows the list):
- learning_rate: 1e-07
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.0           | 0.2667 | 50   | 0.0000          | 2.4333         | -8.7946          | 1.0                | 11.2279         | -97.5125       | -43.2658     | -2.5259         | -2.5391       |
| 0.0           | 0.5333 | 100  | 0.0000          | 2.7977         | -9.9936          | 1.0                | 12.7913         | -99.9105       | -42.5369     | -2.5223         | -2.5359       |
| 0.0           | 0.8    | 150  | 0.0000          | 2.9419         | -10.6551         | 1.0                | 13.5970         | -101.2335      | -42.2486     | -2.5210         | -2.5347       |
| 0.0           | 1.0667 | 200  | 0.0000          | 3.0397         | -10.9989         | 1.0                | 14.0386         | -101.9212      | -42.0530     | -2.5209         | -2.5347       |
| 0.0           | 1.3333 | 250  | 0.0000          | 3.1479         | -11.2365         | 1.0                | 14.3844         | -102.3963      | -41.8365     | -2.5209         | -2.5348       |
| 0.0           | 1.6    | 300  | 0.0000          | 3.1788         | -11.4604         | 1.0                | 14.6393         | -102.8442      | -41.7747     | -2.5197         | -2.5337       |
| 0.0           | 1.8667 | 350  | 0.0000          | 3.2803         | -11.6306         | 1.0                | 14.9109         | -103.1846      | -41.5718     | -2.5199         | -2.5339       |
| 0.0           | 2.1333 | 400  | 0.0000          | 3.3009         | -11.7868         | 1.0                | 15.0878         | -103.4970      | -41.5305     | -2.5189         | -2.5328       |
| 0.0           | 2.4    | 450  | 0.0000          | 3.3596         | -11.8664         | 1.0                | 15.2260         | -103.6562      | -41.4132     | -2.5179         | -2.5319       |
| 0.0           | 2.6667 | 500  | 0.0000          | 3.3481         | -11.9338         | 1.0                | 15.2818         | -103.7909      | -41.4363     | -2.5176         | -2.5316       |
| 0.0           | 2.9333 | 550  | 0.0000          | 3.3954         | -11.9591         | 1.0                | 15.3545         | -103.8415      | -41.3415     | -2.5186         | -2.5326       |
| 0.0           | 3.2    | 600  | 0.0000          | 3.4233         | -12.0436         | 1.0                | 15.4669         | -104.0106      | -41.2858     | -2.5181         | -2.5321       |
| 0.0           | 3.4667 | 650  | 0.0000          | 3.4170         | -12.0535         | 1.0                | 15.4704         | -104.0303      | -41.2985     | -2.5183         | -2.5323       |
| 0.0           | 3.7333 | 700  | 0.0000          | 3.3924         | -12.0736         | 1.0                | 15.4660         | -104.0705      | -41.3476     | -2.5178         | -2.5318       |
| 0.0           | 4.0    | 750  | 0.0000          | 3.4428         | -12.0566         | 1.0                | 15.4994         | -104.0365      | -41.2468     | -2.5180         | -2.5321       |
| 0.0           | 4.2667 | 800  | 0.0000          | 3.4331         | -12.0469         | 1.0                | 15.4800         | -104.0172      | -41.2661     | -2.5173         | -2.5314       |
| 0.0           | 4.5333 | 850  | 0.0000          | 3.4177         | -12.0794         | 1.0                | 15.4970         | -104.0821      | -41.2971     | -2.5172         | -2.5312       |
| 0.0           | 4.8    | 900  | 0.0000          | 3.4353         | -12.0460         | 1.0                | 15.4813         | -104.0153      | -41.2618     | -2.5171         | -2.5312       |
| 0.0           | 5.0667 | 950  | 0.0000          | 3.4353         | -12.0460         | 1.0                | 15.4813         | -104.0153      | -41.2618     | -2.5171         | -2.5312       |
| 0.0           | 5.3333 | 1000 | 0.0000          | 3.4353         | -12.0460         | 1.0                | 15.4813         | -104.0153      | -41.2618     | -2.5171         | -2.5312       |

### Framework versions

- Transformers 4.44.2
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1