---
library_name: transformers
license: apache-2.0
base_model: tsavage68/Na_M2_1000steps_1e7_SFT
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: Na_M2_1000steps_1e8rate_01beta_cSFTDPO
    results: []
---

# Na_M2_1000steps_1e8rate_01beta_cSFTDPO

This model is a fine-tuned version of [tsavage68/Na_M2_1000steps_1e7_SFT](https://huggingface.co/tsavage68/Na_M2_1000steps_1e7_SFT), trained with DPO on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.6023
- Rewards/chosen: 0.0529
- Rewards/rejected: -0.1392
- Rewards/accuracies: 1.0
- Rewards/margins: 0.1921
- Logps/rejected: -81.3154
- Logps/chosen: -47.6033
- Logits/rejected: -2.5345
- Logits/chosen: -2.5471
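
The card does not document a prompt format or chat template, so the snippet below is only a minimal loading-and-generation sketch using the standard `transformers` causal-LM API; the prompt string and dtype are placeholders.

```python
# Minimal usage sketch; assumes the standard transformers causal-LM API.
# The prompt is a placeholder -- the card does not document a prompt format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/Na_M2_1000steps_1e8rate_01beta_cSFTDPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption; pick the dtype your hardware supports
    device_map="auto",
)

inputs = tokenizer("Your prompt here", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```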

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-08
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
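
The training script itself is not published. As a rough reconstruction, the sketch below maps the hyperparameters above onto TRL's `DPOConfig`/`DPOTrainer`. Note the assumptions: `beta=0.1` is inferred from `01beta` in the model name, and the preference dataset is a placeholder, since the card lists the dataset as unknown.

```python
# Hypothetical reconstruction; the actual training script is not published.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/Na_M2_1000steps_1e7_SFT"
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen DPO reference
tokenizer = AutoTokenizer.from_pretrained(base)

args = DPOConfig(
    output_dir="Na_M2_1000steps_1e8rate_01beta_cSFTDPO",
    beta=0.1,                        # assumption, inferred from "01beta" in the name
    learning_rate=1e-8,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # total train batch size 4
    max_steps=1000,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    seed=42,
)

# Placeholder dataset; DPO expects "prompt", "chosen", and "rejected" columns.
train_dataset = load_dataset("json", data_files="preferences.jsonl")["train"]

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```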

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6929        | 0.2667 | 50   | 0.6931          | -0.0010        | -0.0014          | 0.5700             | 0.0003          | -79.9371       | -48.1427     | -2.5355         | -2.5481       |
| 0.6881        | 0.5333 | 100  | 0.6832          | 0.0062         | -0.0142          | 0.6900             | 0.0204          | -80.0656       | -48.0704     | -2.5357         | -2.5482       |
| 0.6652        | 0.8    | 150  | 0.6568          | 0.0223         | -0.0526          | 0.9500             | 0.0748          | -80.4490       | -47.9098     | -2.5356         | -2.5482       |
| 0.6475        | 1.0667 | 200  | 0.6389          | 0.0327         | -0.0794          | 1.0                | 0.1121          | -80.7177       | -47.8054     | -2.5355         | -2.5481       |
| 0.6224        | 1.3333 | 250  | 0.6217          | 0.0389         | -0.1104          | 1.0                | 0.1492          | -81.0270       | -47.7436     | -2.5352         | -2.5477       |
| 0.6068        | 1.6    | 300  | 0.6115          | 0.0553         | -0.1167          | 1.0                | 0.1720          | -81.0905       | -47.5798     | -2.5353         | -2.5478       |
| 0.6018        | 1.8667 | 350  | 0.6041          | 0.0523         | -0.1359          | 1.0                | 0.1882          | -81.2823       | -47.6092     | -2.5345         | -2.5471       |
| 0.5976        | 2.1333 | 400  | 0.6021          | 0.0543         | -0.1384          | 1.0                | 0.1927          | -81.3072       | -47.5892     | -2.5349         | -2.5474       |
| 0.5952        | 2.4    | 450  | 0.5993          | 0.0581         | -0.1408          | 1.0                | 0.1990          | -81.3318       | -47.5512     | -2.5343         | -2.5468       |
| 0.6013        | 2.6667 | 500  | 0.6022          | 0.0541         | -0.1384          | 1.0                | 0.1925          | -81.3071       | -47.5913     | -2.5347         | -2.5472       |
| 0.5981        | 2.9333 | 550  | 0.6027          | 0.0571         | -0.1340          | 1.0                | 0.1911          | -81.2633       | -47.5610     | -2.5348         | -2.5473       |
| 0.6006        | 3.2    | 600  | 0.6009          | 0.0589         | -0.1365          | 1.0                | 0.1954          | -81.2883       | -47.5433     | -2.5347         | -2.5473       |
| 0.5961        | 3.4667 | 650  | 0.6036          | 0.0539         | -0.1354          | 1.0                | 0.1893          | -81.2771       | -47.5931     | -2.5350         | -2.5476       |
| 0.5896        | 3.7333 | 700  | 0.6024          | 0.0550         | -0.1368          | 1.0                | 0.1918          | -81.2913       | -47.5819     | -2.5345         | -2.5471       |
| 0.593         | 4.0    | 750  | 0.6023          | 0.0529         | -0.1392          | 1.0                | 0.1921          | -81.3154       | -47.6033     | -2.5345         | -2.5471       |
| 0.603         | 4.2667 | 800  | 0.6023          | 0.0529         | -0.1392          | 1.0                | 0.1921          | -81.3154       | -47.6033     | -2.5345         | -2.5471       |
| 0.5989        | 4.5333 | 850  | 0.6023          | 0.0529         | -0.1392          | 1.0                | 0.1921          | -81.3154       | -47.6033     | -2.5345         | -2.5471       |
| 0.5879        | 4.8    | 900  | 0.6023          | 0.0529         | -0.1392          | 1.0                | 0.1921          | -81.3154       | -47.6033     | -2.5345         | -2.5471       |
| 0.5949        | 5.0667 | 950  | 0.6023          | 0.0529         | -0.1392          | 1.0                | 0.1921          | -81.3154       | -47.6033     | -2.5345         | -2.5471       |
| 0.5974        | 5.3333 | 1000 | 0.6023          | 0.0529         | -0.1392          | 1.0                | 0.1921          | -81.3154       | -47.6033     | -2.5345         | -2.5471       |
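
For reading the table: in TRL's DPO implementation, the reward columns are β times the policy-vs-reference log-probability ratio, and Rewards/margins is the mean gap between chosen and rejected rewards. The underlying objective (Rafailov et al., 2023) is

$$
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)
$$

Assuming β = 0.1 (inferred from the model name), the final margin of 0.1921 corresponds to the chosen completions gaining about 1.9 nats of log-likelihood over the rejected ones, relative to the reference model.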

### Framework versions

- Transformers 4.44.2
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1