license: llama3
base_model: tsavage68/Summary_L3_1000steps_1e7rate_SFT2
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: Summary_L3_1000steps_1e6rate_01beta_CSFTDPO
    results: []

Summary_L3_1000steps_1e6rate_01beta_CSFTDPO

This model is a DPO fine-tune, trained with TRL, of tsavage68/Summary_L3_1000steps_1e7rate_SFT2. The training dataset is not specified. It achieves the following results on the evaluation set:

  • Loss: 0.5961
  • Rewards/chosen: -0.0885
  • Rewards/rejected: -2.0984
  • Rewards/accuracies: 0.1400
  • Rewards/margins: 2.0099
  • Logps/rejected: -36.2478
  • Logps/chosen: -10.2675
  • Logits/rejected: -1.2445
  • Logits/chosen: -1.2412
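
As a sanity check on how these metrics relate, the sketch below recomputes the reward margin from the chosen and rejected rewards, then evaluates the sigmoid DPO loss, -log(sigmoid(margin)), on a hypothetical split of the evaluation pairs. Two things here are assumptions, not facts from this card: that the sigmoid loss variant was used, and that the evaluation set splits into 86% of pairs with zero margin and 14% with one shared positive margin (a split chosen only to match the reported accuracy).

```python
import math

# Values reported on the evaluation set above.
rewards_chosen = -0.0885
rewards_rejected = -2.0984
accuracy = 0.1400
mean_margin = 2.0099


def dpo_sigmoid_loss(margin: float) -> float:
    """Per-pair sigmoid DPO loss, -log(sigmoid(margin)), computed stably."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# The margin is the chosen reward minus the rejected reward.
recomputed_margin = rewards_chosen - rewards_rejected

# ASSUMPTION (hypothetical split, not stated in the card):
# a fraction `accuracy` of pairs is separated with one shared margin,
# and every other pair is an exact tie (margin 0).
frac_separated = accuracy
frac_tied = 1.0 - accuracy
margin_separated = mean_margin / frac_separated

hypothetical_loss = (
    frac_tied * dpo_sigmoid_loss(0.0)
    + frac_separated * dpo_sigmoid_loss(margin_separated)
)

print(round(recomputed_margin, 4))   # 2.0099
print(round(hypothetical_loss, 4))   # 0.5961
```

Under this assumed split the loss comes out at about 0.5961, the same as the reported value. That is consistent with most evaluation pairs being ties, but it does not prove it; the per-example rewards would be needed to confirm.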

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
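
To make the schedule concrete, here is a minimal sketch of the effective batch size and of the learning rate at a given optimizer step. It assumes a single device (`NUM_DEVICES = 1`) and the common form of a cosine schedule with warmup: a linear ramp from zero over the warmup steps, then a half-cosine decay to zero. The function names are illustrative and not taken from the training code.

```python
import math

# Hyperparameters listed above.
LEARNING_RATE = 1e-06
WARMUP_STEPS = 100
TRAINING_STEPS = 1000
TRAIN_BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 4
NUM_DEVICES = 1  # ASSUMPTION: the card does not state the device count.


def effective_batch_size() -> int:
    """Examples consumed per optimizer step."""
    return TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step (linear warmup, cosine decay)."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size())  # 4, matching total_train_batch_size
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at(step))
```

With these settings the peak rate of 1e-06 is reached at step 100, and the rate has fallen to half of that by step 550, the midpoint of the decay phase.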

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.571 | 0.2004 | 50 | 0.5986 | 0.0271 | -0.6059 | 0.1400 | 0.6329 | -21.3224 | -9.1122 | -1.1153 | -1.1163 |
| 0.6585 | 0.4008 | 100 | 0.5962 | 0.0177 | -1.2883 | 0.1400 | 1.3060 | -28.1472 | -9.2058 | -1.1739 | -1.1725 |
| 0.6238 | 0.6012 | 150 | 0.5961 | -0.0262 | -1.7529 | 0.1400 | 1.7267 | -32.7924 | -9.6448 | -1.2119 | -1.2094 |
| 0.6065 | 0.8016 | 200 | 0.5961 | -0.0848 | -2.0675 | 0.1400 | 1.9828 | -35.9388 | -10.2303 | -1.2396 | -1.2364 |
| 0.6238 | 1.0020 | 250 | 0.5961 | -0.0864 | -2.0702 | 0.1400 | 1.9839 | -35.9662 | -10.2464 | -1.2401 | -1.2369 |
| 0.6238 | 1.2024 | 300 | 0.5961 | -0.0864 | -2.0688 | 0.1400 | 1.9824 | -35.9522 | -10.2471 | -1.2396 | -1.2364 |
| 0.6238 | 1.4028 | 350 | 0.5961 | -0.0866 | -2.0730 | 0.1400 | 1.9864 | -35.9935 | -10.2485 | -1.2409 | -1.2378 |
| 0.5718 | 1.6032 | 400 | 0.5961 | -0.0880 | -2.0816 | 0.1400 | 1.9937 | -36.0800 | -10.2625 | -1.2420 | -1.2388 |
| 0.5892 | 1.8036 | 450 | 0.5961 | -0.0869 | -2.0872 | 0.1400 | 2.0004 | -36.1360 | -10.2514 | -1.2428 | -1.2396 |
| 0.5718 | 2.0040 | 500 | 0.5961 | -0.0873 | -2.0879 | 0.1400 | 2.0006 | -36.1431 | -10.2557 | -1.2431 | -1.2399 |
| 0.5718 | 2.2044 | 550 | 0.5961 | -0.0872 | -2.0916 | 0.1400 | 2.0044 | -36.1798 | -10.2553 | -1.2434 | -1.2402 |
| 0.5545 | 2.4048 | 600 | 0.5961 | -0.0893 | -2.0984 | 0.1400 | 2.0091 | -36.2481 | -10.2761 | -1.2448 | -1.2416 |
| 0.5199 | 2.6052 | 650 | 0.5961 | -0.0881 | -2.0960 | 0.1400 | 2.0078 | -36.2235 | -10.2642 | -1.2437 | -1.2405 |
| 0.6238 | 2.8056 | 700 | 0.5961 | -0.0891 | -2.1004 | 0.1400 | 2.0113 | -36.2677 | -10.2740 | -1.2450 | -1.2417 |
| 0.6065 | 3.0060 | 750 | 0.5961 | -0.0879 | -2.0983 | 0.1400 | 2.0104 | -36.2469 | -10.2615 | -1.2456 | -1.2423 |
| 0.6412 | 3.2064 | 800 | 0.5961 | -0.0900 | -2.1003 | 0.1400 | 2.0103 | -36.2667 | -10.2828 | -1.2448 | -1.2416 |
| 0.6585 | 3.4068 | 850 | 0.5961 | -0.0875 | -2.0997 | 0.1400 | 2.0122 | -36.2604 | -10.2578 | -1.2456 | -1.2424 |
| 0.6238 | 3.6072 | 900 | 0.5961 | -0.0879 | -2.0992 | 0.1400 | 2.0114 | -36.2559 | -10.2613 | -1.2445 | -1.2413 |
| 0.5372 | 3.8076 | 950 | 0.5961 | -0.0884 | -2.0981 | 0.1400 | 2.0097 | -36.2444 | -10.2669 | -1.2444 | -1.2412 |
| 0.6238 | 4.0080 | 1000 | 0.5961 | -0.0885 | -2.0984 | 0.1400 | 2.0099 | -36.2478 | -10.2675 | -1.2445 | -1.2412 |
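
The Epoch and Step columns also allow a rough estimate of the training-set size, which the card does not state. The sketch below assumes that the reported epoch is the number of micro-batches seen divided by the micro-batches per epoch, and that training ran on a single device with the listed per-device batch size of 1. The helper name is hypothetical.

```python
def micro_batches_per_epoch(step: int, epoch: float, grad_accum_steps: int = 4) -> float:
    """Estimate micro-batches per epoch from one (Step, Epoch) row.

    ASSUMPTION: each optimizer step consumes `grad_accum_steps` micro-batches
    and `epoch` is micro-batches seen divided by micro-batches per epoch.
    """
    return step * grad_accum_steps / epoch


# First and last rows of the training results.
first = micro_batches_per_epoch(50, 0.2004)
last = micro_batches_per_epoch(1000, 4.0080)

# ASSUMPTION: single device, per-device batch size 1 (train_batch_size above),
# so one micro-batch corresponds to one training example.
per_device_batch_size = 1
approx_examples = round(last) * per_device_batch_size

print(round(first), round(last))  # 998 998
print(approx_examples)            # 998
```

Both rows point to roughly 998 training examples, so the 1000 optimizer steps correspond to about four passes over the data. With more than one device the estimate would scale accordingly.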

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.0.0+cu117
  • Datasets 2.20.0
  • Tokenizers 0.19.1