---
library_name: transformers
license: apache-2.0
base_model: tsavage68/Na_M2_1000steps_1e7_SFT
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: Na_M2_1000steps_1e8rate_03beta_cSFTDPO
    results: []
---

Na_M2_1000steps_1e8rate_03beta_cSFTDPO

This model is a DPO fine-tuned version of tsavage68/Na_M2_1000steps_1e7_SFT, trained with TRL on an unspecified preference dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4450
  • Rewards/chosen: 0.1680
  • Rewards/rejected: -0.4255
  • Rewards/accuracies: 1.0
  • Rewards/margins: 0.5934
  • Logps/rejected: -81.3416
  • Logps/chosen: -47.5724
  • Logits/rejected: -2.5355
  • Logits/chosen: -2.5481
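
For reference, below is a minimal loading sketch using the Transformers library. It assumes the model is a causal LM whose tokenizer ships a chat template; the card does not document the expected prompt format, so the message content is a placeholder.

```python
# Minimal inference sketch (assumption: causal LM with a chat template).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/Na_M2_1000steps_1e8rate_03beta_cSFTDPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Placeholder prompt; the card does not specify an intended prompt format.
messages = [{"role": "user", "content": "Hello!"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```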

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-08
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
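
These settings map onto TRL's DPOConfig/DPOTrainer roughly as in the sketch below. This is a hedged reconstruction, not the author's training script: beta=0.3 is an assumption inferred from "03beta" in the model name, the dataset is a placeholder, and the `tokenizer=` argument assumes a 2024-era TRL release.

```python
# Hedged sketch of a DPO run with the hyperparameters listed above.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "tsavage68/Na_M2_1000steps_1e7_SFT"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Placeholder preference pairs; the actual training data is not documented.
train_dataset = Dataset.from_dict({
    "prompt": ["Example question?"],
    "chosen": ["Preferred answer."],
    "rejected": ["Dispreferred answer."],
})
eval_dataset = train_dataset

args = DPOConfig(
    output_dir="Na_M2_1000steps_1e8rate_03beta_cSFTDPO",
    learning_rate=1e-8,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    beta=0.3,  # assumption: inferred from "03beta" in the model name
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with None, TRL clones the policy as the frozen reference
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```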

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6955        | 0.2667 | 50   | 0.6882          | 0.0099         | -0.0031          | 0.5600             | 0.0130          | -79.9338       | -48.0995     | -2.5354         | -2.5481       |
| 0.6761        | 0.5333 | 100  | 0.6730          | 0.0130         | -0.0315          | 0.6600             | 0.0445          | -80.0283       | -48.0889     | -2.5363         | -2.5489       |
| 0.6154        | 0.8    | 150  | 0.5971          | 0.0672         | -0.1393          | 0.9800             | 0.2065          | -80.3878       | -47.9083     | -2.5367         | -2.5493       |
| 0.5735        | 1.0667 | 200  | 0.5430          | 0.1029         | -0.2302          | 1.0                | 0.3331          | -80.6906       | -47.7893     | -2.5352         | -2.5478       |
| 0.5047        | 1.3333 | 250  | 0.5020          | 0.1363         | -0.3030          | 1.0                | 0.4393          | -80.9334       | -47.6779     | -2.5353         | -2.5478       |
| 0.4525        | 1.6    | 300  | 0.4751          | 0.1411         | -0.3685          | 1.0                | 0.5096          | -81.1517       | -47.6622     | -2.5350         | -2.5476       |
| 0.451         | 1.8667 | 350  | 0.4572          | 0.1576         | -0.3988          | 1.0                | 0.5564          | -81.2528       | -47.6072     | -2.5350         | -2.5475       |
| 0.4434        | 2.1333 | 400  | 0.4501          | 0.1391         | -0.4387          | 1.0                | 0.5778          | -81.3857       | -47.6686     | -2.5351         | -2.5477       |
| 0.4313        | 2.4    | 450  | 0.4454          | 0.1528         | -0.4370          | 1.0                | 0.5899          | -81.3802       | -47.6230     | -2.5343         | -2.5469       |
| 0.4546        | 2.6667 | 500  | 0.4513          | 0.1462         | -0.4293          | 1.0                | 0.5755          | -81.3544       | -47.6450     | -2.5345         | -2.5471       |
| 0.4526        | 2.9333 | 550  | 0.4424          | 0.1917         | -0.4110          | 1.0                | 0.6027          | -81.2934       | -47.4934     | -2.5352         | -2.5476       |
| 0.4426        | 3.2    | 600  | 0.4437          | 0.1805         | -0.4175          | 1.0                | 0.5980          | -81.3150       | -47.5307     | -2.5361         | -2.5486       |
| 0.4452        | 3.4667 | 650  | 0.4403          | 0.1651         | -0.4392          | 1.0                | 0.6043          | -81.3875       | -47.5821     | -2.5347         | -2.5473       |
| 0.418         | 3.7333 | 700  | 0.4450          | 0.1668         | -0.4237          | 1.0                | 0.5905          | -81.3358       | -47.5764     | -2.5348         | -2.5474       |
| 0.4281        | 4.0    | 750  | 0.4450          | 0.1680         | -0.4255          | 1.0                | 0.5934          | -81.3416       | -47.5724     | -2.5355         | -2.5481       |
| 0.4503        | 4.2667 | 800  | 0.4450          | 0.1680         | -0.4255          | 1.0                | 0.5934          | -81.3416       | -47.5724     | -2.5355         | -2.5481       |
| 0.4372        | 4.5333 | 850  | 0.4450          | 0.1680         | -0.4255          | 1.0                | 0.5934          | -81.3416       | -47.5724     | -2.5355         | -2.5481       |
| 0.4135        | 4.8    | 900  | 0.4450          | 0.1680         | -0.4255          | 1.0                | 0.5934          | -81.3416       | -47.5724     | -2.5355         | -2.5481       |
| 0.4316        | 5.0667 | 950  | 0.4450          | 0.1680         | -0.4255          | 1.0                | 0.5934          | -81.3416       | -47.5724     | -2.5355         | -2.5481       |
| 0.4438        | 5.3333 | 1000 | 0.4450          | 0.1680         | -0.4255          | 1.0                | 0.5934          | -81.3416       | -47.5724     | -2.5355         | -2.5481       |
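
For context on the reward columns: in standard DPO (Rafailov et al., 2023), which TRL implements, the implicit reward of a completion is beta times the log-probability ratio between the policy and the frozen SFT reference, and the loss pushes the chosen reward above the rejected one. A sketch of the definitions:

```latex
r_\theta(x, y) = \beta \,\log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\mathcal{L}_{\mathrm{DPO}}(x, y_w, y_l) = -\log \sigma\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big)
```

Under these definitions, Rewards/margins is Rewards/chosen minus Rewards/rejected (e.g. 0.1680 - (-0.4255) ≈ 0.5934 in the final row); beta = 0.3 here is an assumption inferred from "03beta" in the model name.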

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.4.0+cu121
  • Datasets 2.21.0
  • Tokenizers 0.19.1