metadata
library_name: transformers
license: apache-2.0
base_model: tsavage68/Na_M2_1000steps_1e7_SFT
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: Na_M2_1000steps_1e8rate_05beta_cSFTDPO
    results: []

Na_M2_1000steps_1e8rate_05beta_cSFTDPO

This model is a fine-tuned version of tsavage68/Na_M2_1000steps_1e7_SFT, trained with DPO using TRL (per the model tags). The training dataset is not specified. The final checkpoint (step 1000) achieves the following results on the evaluation set:

  • Loss: 0.3285
  • Rewards/chosen: 0.2983
  • Rewards/rejected: -0.6864
  • Rewards/accuracies: 1.0
  • Rewards/margins: 0.9847
  • Logps/rejected: -81.2962
  • Logps/chosen: -47.5358
  • Logits/rejected: -2.5349
  • Logits/chosen: -2.5474
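
The reward figures above are related by simple arithmetic, which the following sketch checks. It assumes the default sigmoid DPO loss, where the per-example loss is -log(sigmoid(margin)); the card does not state which loss variant was used, so treat the second half as an illustration rather than a description of the actual training run.

```python
import math

# Final evaluation metrics as reported in this card.
rewards_chosen = 0.2983
rewards_rejected = -0.6864
reported_margin = 0.9847
reported_loss = 0.3285

# Rewards/margins is the chosen reward minus the rejected reward.
margin = rewards_chosen - rewards_rejected
assert abs(margin - reported_margin) < 1e-4

# Assumption: sigmoid DPO loss, -log(sigmoid(margin)), averaged over examples.
# That function is convex, so evaluating it at the mean margin gives a
# lower bound on the mean loss, not the mean loss itself.
loss_at_mean_margin = math.log1p(math.exp(-margin))

print(f"margin = {margin:.4f}")
print(f"loss at mean margin = {loss_at_mean_margin:.4f}")
print(f"reported mean loss  = {reported_loss:.4f}")
```

Under that assumption the reported loss should sit at or slightly above the value computed from the mean margin, which is what the numbers show.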

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-08
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
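
As a rough illustration of how these settings combine, the sketch below derives the effective batch size and the learning rate at a given optimizer step. It assumes a single device and the usual linear-warmup, half-cosine-decay shape for a `cosine` scheduler; `cosine_lr` is a hypothetical helper written for this example, not a function from the training code.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 2
gradient_accumulation_steps = 2
peak_lr = 1e-8
warmup_steps = 100
training_steps = 1000

# Assumption: one device, so the effective batch is
# per-device batch size x gradient accumulation steps.
total_train_batch_size = train_batch_size * gradient_accumulation_steps


def cosine_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, training_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", total_train_batch_size)
for step in (0, 50, 100, 550, 1000):
    print(step, cosine_lr(step))
```

With these values the rate peaks at 1e-08 at step 100, falls to about half of that at step 550, and reaches zero at step 1000.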

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6915 | 0.2667 | 50 | 0.6994 | 0.0060 | 0.0118 | 0.5400 | -0.0058 | -79.8998 | -48.1204 | -2.5353 | -2.5479 |
| 0.6635 | 0.5333 | 100 | 0.6459 | 0.0371 | -0.0697 | 0.7100 | 0.1068 | -80.0629 | -48.0583 | -2.5354 | -2.5480 |
| 0.5585 | 0.8 | 150 | 0.5484 | 0.1041 | -0.2242 | 0.9400 | 0.3283 | -80.3718 | -47.9241 | -2.5344 | -2.5470 |
| 0.5041 | 1.0667 | 200 | 0.4568 | 0.1548 | -0.4106 | 1.0 | 0.5654 | -80.7446 | -47.8228 | -2.5349 | -2.5475 |
| 0.4012 | 1.3333 | 250 | 0.3983 | 0.2253 | -0.5152 | 1.0 | 0.7405 | -80.9538 | -47.6818 | -2.5354 | -2.5479 |
| 0.3304 | 1.6 | 300 | 0.3692 | 0.2306 | -0.6109 | 1.0 | 0.8415 | -81.1452 | -47.6712 | -2.5346 | -2.5472 |
| 0.3396 | 1.8667 | 350 | 0.3524 | 0.2373 | -0.6582 | 1.0 | 0.8955 | -81.2397 | -47.6578 | -2.5349 | -2.5474 |
| 0.3311 | 2.1333 | 400 | 0.3304 | 0.2656 | -0.7177 | 1.0 | 0.9834 | -81.3589 | -47.6011 | -2.5350 | -2.5475 |
| 0.3099 | 2.4 | 450 | 0.3378 | 0.2807 | -0.6665 | 1.0 | 0.9472 | -81.2563 | -47.5710 | -2.5361 | -2.5486 |
| 0.3384 | 2.6667 | 500 | 0.3271 | 0.2743 | -0.7151 | 1.0 | 0.9894 | -81.3535 | -47.5838 | -2.5349 | -2.5474 |
| 0.3381 | 2.9333 | 550 | 0.3284 | 0.2854 | -0.7005 | 1.0 | 0.9859 | -81.3243 | -47.5616 | -2.5347 | -2.5472 |
| 0.3328 | 3.2 | 600 | 0.3217 | 0.2963 | -0.7183 | 1.0 | 1.0146 | -81.3600 | -47.5398 | -2.5349 | -2.5474 |
| 0.3162 | 3.4667 | 650 | 0.3252 | 0.3046 | -0.6916 | 1.0 | 0.9962 | -81.3066 | -47.5232 | -2.5358 | -2.5483 |
| 0.2907 | 3.7333 | 700 | 0.3331 | 0.3002 | -0.6711 | 1.0 | 0.9713 | -81.2656 | -47.5319 | -2.5350 | -2.5475 |
| 0.3052 | 4.0 | 750 | 0.3279 | 0.2998 | -0.6877 | 1.0 | 0.9875 | -81.2988 | -47.5328 | -2.5350 | -2.5474 |
| 0.3264 | 4.2667 | 800 | 0.3285 | 0.2983 | -0.6864 | 1.0 | 0.9847 | -81.2962 | -47.5358 | -2.5349 | -2.5474 |
| 0.3196 | 4.5333 | 850 | 0.3285 | 0.2983 | -0.6864 | 1.0 | 0.9847 | -81.2962 | -47.5358 | -2.5349 | -2.5474 |
| 0.2962 | 4.8 | 900 | 0.3285 | 0.2983 | -0.6864 | 1.0 | 0.9847 | -81.2962 | -47.5358 | -2.5349 | -2.5474 |
| 0.3115 | 5.0667 | 950 | 0.3285 | 0.2983 | -0.6864 | 1.0 | 0.9847 | -81.2962 | -47.5358 | -2.5349 | -2.5474 |
| 0.3285 | 5.3333 | 1000 | 0.3285 | 0.2983 | -0.6864 | 1.0 | 0.9847 | -81.2962 | -47.5358 | -2.5349 | -2.5474 |
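
The Epoch and Step columns can be used to estimate the size of the training set, which the card does not report. The sketch below is a back-of-the-envelope calculation under the assumption of a single device and an effective batch of 4 examples per optimizer step; the resulting example count is an inference from the logged values, not a documented figure.

```python
# One (step, epoch) pair copied from the training results table.
step, epoch = 750, 4.0
total_train_batch_size = 4  # from the hyperparameter list

# Optimizer steps needed to pass over the training data once.
steps_per_epoch = step / epoch

# Assumption: every step consumes a full effective batch on one device.
approx_train_examples = steps_per_epoch * total_train_batch_size

# Epochs covered by the full 1000-step run.
epochs_at_end = 1000 / steps_per_epoch

print(f"steps per epoch       ~ {steps_per_epoch}")
print(f"training examples     ~ {approx_train_examples:.0f}")
print(f"epochs after 1000 steps ~ {epochs_at_end:.4f}")
```

The computed epoch count at step 1000 agrees with the 5.3333 logged in the last row, so the run made a little over five passes over what appears to be roughly 750 training examples.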

Framework versions

  • Transformers 4.44.2
  • PyTorch 2.4.0+cu121
  • Datasets 2.21.0
  • Tokenizers 0.19.1