---
library_name: transformers
license: apache-2.0
base_model: tsavage68/Na_M2_1000steps_1e7_SFT
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: Na_M2_1000steps_1e7rate_05beta_cSFTDPO
    results: []
---

# Na_M2_1000steps_1e7rate_05beta_cSFTDPO

This model is a fine-tuned version of [tsavage68/Na_M2_1000steps_1e7_SFT](https://huggingface.co/tsavage68/Na_M2_1000steps_1e7_SFT) on an unknown dataset. It achieves the following results on the evaluation set (see the note after the list for how these DPO metrics relate to each other):

- Loss: 0.0000
- Rewards/chosen: 3.4353
- Rewards/rejected: -12.0460
- Rewards/accuracies: 1.0
- Rewards/margins: 15.4813
- Logps/rejected: -104.0153
- Logps/chosen: -41.2618
- Logits/rejected: -2.5171
- Logits/chosen: -2.5312
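For context, these metrics follow the standard DPO formulation: the implicit reward of a response and the pairwise loss are

$$
r_\theta(x, y) = \beta \left[ \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right],
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\big( r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}}) \big)
$$

so Rewards/margins is the chosen-minus-rejected reward gap (3.4353 − (−12.0460) = 15.4813), and a margin that large drives the loss toward zero (−log σ(15.48) ≈ 2e-7, which rounds to the 0.0000 shown) and the pairwise accuracy to 1.0. The β used is not recorded in this card; the `05beta` suffix in the model name suggests β = 0.5, but that is an inference, not a logged value.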

## Model description

More information needed

## Intended uses & limitations

More information needed
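
Pending more detail, the following is a minimal, untested loading-and-generation sketch. The repo id is assumed from the model name, and the tokenizer is assumed to ship a chat template inherited from the SFT base:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/Na_M2_1000steps_1e7rate_05beta_cSFTDPO"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build a single-turn prompt via the chat template (assumed to be present).
messages = [{"role": "user", "content": "Hello, what can you help me with?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```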

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch reproducing them with `trl` follows the list):

- learning_rate: 1e-07
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
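
A minimal `trl` sketch consistent with these settings, not the exact training script. Assumptions: β = 0.5 (inferred from the `05beta` model-name suffix), a hypothetical preference dataset with `prompt`/`chosen`/`rejected` columns, and trainer keyword names as of trl ~0.9–0.10 (newer releases use `processing_class` instead of `tokenizer`):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/Na_M2_1000steps_1e7_SFT"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Hypothetical preference data with prompt/chosen/rejected columns.
train_dataset = load_dataset("json", data_files="preferences.json")["train"]

config = DPOConfig(
    output_dir="Na_M2_1000steps_1e7rate_05beta_cSFTDPO",
    beta=0.5,                       # assumed from the "05beta" suffix
    learning_rate=1e-7,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # effective train batch size 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model=model,                    # reference model is cloned internally by default
    args=config,
    train_dataset=train_dataset,
    tokenizer=tokenizer,            # `processing_class=` in newer trl releases
)
trainer.train()
```

Adam's betas=(0.9, 0.999) and epsilon=1e-08 are the defaults, so they are not set explicitly here.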

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.0           | 0.2667 | 50   | 0.0000          | 2.4333         | -8.7946          | 1.0                | 11.2279         | -97.5125       | -43.2658     | -2.5259         | -2.5391       |
| 0.0           | 0.5333 | 100  | 0.0000          | 2.7977         | -9.9936          | 1.0                | 12.7913         | -99.9105       | -42.5369     | -2.5223         | -2.5359       |
| 0.0           | 0.8    | 150  | 0.0000          | 2.9419         | -10.6551         | 1.0                | 13.5970         | -101.2335      | -42.2486     | -2.5210         | -2.5347       |
| 0.0           | 1.0667 | 200  | 0.0000          | 3.0397         | -10.9989         | 1.0                | 14.0386         | -101.9212      | -42.0530     | -2.5209         | -2.5347       |
| 0.0           | 1.3333 | 250  | 0.0000          | 3.1479         | -11.2365         | 1.0                | 14.3844         | -102.3963      | -41.8365     | -2.5209         | -2.5348       |
| 0.0           | 1.6    | 300  | 0.0000          | 3.1788         | -11.4604         | 1.0                | 14.6393         | -102.8442      | -41.7747     | -2.5197         | -2.5337       |
| 0.0           | 1.8667 | 350  | 0.0000          | 3.2803         | -11.6306         | 1.0                | 14.9109         | -103.1846      | -41.5718     | -2.5199         | -2.5339       |
| 0.0           | 2.1333 | 400  | 0.0000          | 3.3009         | -11.7868         | 1.0                | 15.0878         | -103.4970      | -41.5305     | -2.5189         | -2.5328       |
| 0.0           | 2.4    | 450  | 0.0000          | 3.3596         | -11.8664         | 1.0                | 15.2260         | -103.6562      | -41.4132     | -2.5179         | -2.5319       |
| 0.0           | 2.6667 | 500  | 0.0000          | 3.3481         | -11.9338         | 1.0                | 15.2818         | -103.7909      | -41.4363     | -2.5176         | -2.5316       |
| 0.0           | 2.9333 | 550  | 0.0000          | 3.3954         | -11.9591         | 1.0                | 15.3545         | -103.8415      | -41.3415     | -2.5186         | -2.5326       |
| 0.0           | 3.2    | 600  | 0.0000          | 3.4233         | -12.0436         | 1.0                | 15.4669         | -104.0106      | -41.2858     | -2.5181         | -2.5321       |
| 0.0           | 3.4667 | 650  | 0.0000          | 3.4170         | -12.0535         | 1.0                | 15.4704         | -104.0303      | -41.2985     | -2.5183         | -2.5323       |
| 0.0           | 3.7333 | 700  | 0.0000          | 3.3924         | -12.0736         | 1.0                | 15.4660         | -104.0705      | -41.3476     | -2.5178         | -2.5318       |
| 0.0           | 4.0    | 750  | 0.0000          | 3.4428         | -12.0566         | 1.0                | 15.4994         | -104.0365      | -41.2468     | -2.5180         | -2.5321       |
| 0.0           | 4.2667 | 800  | 0.0000          | 3.4331         | -12.0469         | 1.0                | 15.4800         | -104.0172      | -41.2661     | -2.5173         | -2.5314       |
| 0.0           | 4.5333 | 850  | 0.0000          | 3.4177         | -12.0794         | 1.0                | 15.4970         | -104.0821      | -41.2971     | -2.5172         | -2.5312       |
| 0.0           | 4.8    | 900  | 0.0000          | 3.4353         | -12.0460         | 1.0                | 15.4813         | -104.0153      | -41.2618     | -2.5171         | -2.5312       |
| 0.0           | 5.0667 | 950  | 0.0000          | 3.4353         | -12.0460         | 1.0                | 15.4813         | -104.0153      | -41.2618     | -2.5171         | -2.5312       |
| 0.0           | 5.3333 | 1000 | 0.0000          | 3.4353         | -12.0460         | 1.0                | 15.4813         | -104.0153      | -41.2618     | -2.5171         | -2.5312       |

### Framework versions

- Transformers 4.44.2
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1