---
library_name: transformers
license: apache-2.0
base_model: tsavage68/Na_M2_1000steps_1e7_SFT
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: Na_M2_1000steps_1e7rate_05beta_cSFTDPO
  results: []
---
# Na_M2_1000steps_1e7rate_05beta_cSFTDPO
This model is a fine-tuned version of [tsavage68/Na_M2_1000steps_1e7_SFT](https://huggingface.co/tsavage68/Na_M2_1000steps_1e7_SFT) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0000
- Rewards/chosen: 3.4353
- Rewards/rejected: -12.0460
- Rewards/accuracies: 1.0
- Rewards/margins: 15.4813
- Logps/rejected: -104.0153
- Logps/chosen: -41.2618
- Logits/rejected: -2.5171
- Logits/chosen: -2.5312
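
The snippet below is a minimal loading-and-generation sketch, assuming the checkpoint is a standard causal LM served through `transformers` (which the `library_name` metadata and the DPO setup imply); the prompt is a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/Na_M2_1000steps_1e7rate_05beta_cSFTDPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # assumption: half precision fits the deployment target
    device_map="auto",          # requires `accelerate`; drop for plain CPU loading
)

prompt = "Hello, how are you?"  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```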
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (see the TRL sketch after this list):
- learning_rate: 1e-07
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
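
The following is a hedged sketch of how these hyperparameters could be wired into TRL's `DPOTrainer`, not the author's actual training script. The `beta=0.5` value is inferred from the `05beta` suffix in the model name rather than stated in the card, and `your_preference_dataset` is a placeholder since the dataset is unknown:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/Na_M2_1000steps_1e7_SFT"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Hyperparameters copied from the card; beta is inferred from the model name.
args = DPOConfig(
    output_dir="Na_M2_1000steps_1e7rate_05beta_cSFTDPO",
    beta=0.5,                       # assumption: "05beta" in the model name
    learning_rate=1e-7,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

# Placeholder: a preference dataset with "prompt", "chosen", and "rejected" columns.
train_dataset = load_dataset("your_preference_dataset", split="train")

trainer = DPOTrainer(
    model=model,
    ref_model=None,       # TRL clones the model as the frozen reference
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # newer TRL releases take processing_class= instead
)
trainer.train()
```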
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 0.0 | 0.2667 | 50 | 0.0000 | 2.4333 | -8.7946 | 1.0 | 11.2279 | -97.5125 | -43.2658 | -2.5259 | -2.5391 |
| 0.0 | 0.5333 | 100 | 0.0000 | 2.7977 | -9.9936 | 1.0 | 12.7913 | -99.9105 | -42.5369 | -2.5223 | -2.5359 |
| 0.0 | 0.8 | 150 | 0.0000 | 2.9419 | -10.6551 | 1.0 | 13.5970 | -101.2335 | -42.2486 | -2.5210 | -2.5347 |
| 0.0 | 1.0667 | 200 | 0.0000 | 3.0397 | -10.9989 | 1.0 | 14.0386 | -101.9212 | -42.0530 | -2.5209 | -2.5347 |
| 0.0 | 1.3333 | 250 | 0.0000 | 3.1479 | -11.2365 | 1.0 | 14.3844 | -102.3963 | -41.8365 | -2.5209 | -2.5348 |
| 0.0 | 1.6 | 300 | 0.0000 | 3.1788 | -11.4604 | 1.0 | 14.6393 | -102.8442 | -41.7747 | -2.5197 | -2.5337 |
| 0.0 | 1.8667 | 350 | 0.0000 | 3.2803 | -11.6306 | 1.0 | 14.9109 | -103.1846 | -41.5718 | -2.5199 | -2.5339 |
| 0.0 | 2.1333 | 400 | 0.0000 | 3.3009 | -11.7868 | 1.0 | 15.0878 | -103.4970 | -41.5305 | -2.5189 | -2.5328 |
| 0.0 | 2.4 | 450 | 0.0000 | 3.3596 | -11.8664 | 1.0 | 15.2260 | -103.6562 | -41.4132 | -2.5179 | -2.5319 |
| 0.0 | 2.6667 | 500 | 0.0000 | 3.3481 | -11.9338 | 1.0 | 15.2818 | -103.7909 | -41.4363 | -2.5176 | -2.5316 |
| 0.0 | 2.9333 | 550 | 0.0000 | 3.3954 | -11.9591 | 1.0 | 15.3545 | -103.8415 | -41.3415 | -2.5186 | -2.5326 |
| 0.0 | 3.2 | 600 | 0.0000 | 3.4233 | -12.0436 | 1.0 | 15.4669 | -104.0106 | -41.2858 | -2.5181 | -2.5321 |
| 0.0 | 3.4667 | 650 | 0.0000 | 3.4170 | -12.0535 | 1.0 | 15.4704 | -104.0303 | -41.2985 | -2.5183 | -2.5323 |
| 0.0 | 3.7333 | 700 | 0.0000 | 3.3924 | -12.0736 | 1.0 | 15.4660 | -104.0705 | -41.3476 | -2.5178 | -2.5318 |
| 0.0 | 4.0 | 750 | 0.0000 | 3.4428 | -12.0566 | 1.0 | 15.4994 | -104.0365 | -41.2468 | -2.5180 | -2.5321 |
| 0.0 | 4.2667 | 800 | 0.0000 | 3.4331 | -12.0469 | 1.0 | 15.4800 | -104.0172 | -41.2661 | -2.5173 | -2.5314 |
| 0.0 | 4.5333 | 850 | 0.0000 | 3.4177 | -12.0794 | 1.0 | 15.4970 | -104.0821 | -41.2971 | -2.5172 | -2.5312 |
| 0.0 | 4.8 | 900 | 0.0000 | 3.4353 | -12.0460 | 1.0 | 15.4813 | -104.0153 | -41.2618 | -2.5171 | -2.5312 |
| 0.0 | 5.0667 | 950 | 0.0000 | 3.4353 | -12.0460 | 1.0 | 15.4813 | -104.0153 | -41.2618 | -2.5171 | -2.5312 |
| 0.0 | 5.3333 | 1000 | 0.0000 | 3.4353 | -12.0460 | 1.0 | 15.4813 | -104.0153 | -41.2618 | -2.5171 | -2.5312 |
### Framework versions
- Transformers 4.44.2
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1