---
library_name: transformers
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
  - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
datasets:
  - data/zephyr_uf_rlced_conifer_ref
model-index:
  - name: zephyr-7b-uf-rlced-conifer-group-dpo-2e-alr-0.01-1e
    results: []
---

zephyr-7b-uf-rlced-conifer-group-dpo-2e-alr-0.01-1e

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the data/zephyr_uf_rlced_conifer_ref dataset. It achieves the following results on the evaluation set (see the note after the list for how the reward metrics are defined):

  • Loss: 0.2572
  • Rewards/chosen: -2.2030
  • Rewards/rejected: -5.8511
  • Rewards/accuracies: 0.8675
  • Rewards/margins: 3.6481
  • Logps/rejected: -988.8447
  • Logps/chosen: -612.7692
  • Logits/rejected: 2.2087
  • Logits/chosen: 0.2455
  • Excess Loss: 0.0532
  • Alpha 0 Uf: 0.6287
  • Alpha 1 Rlced Conifer: 0.3713
  • Rewards/chosen 1 Rlced Conifer: -2.2869
  • Rewards/rejected 1 Rlced Conifer: -6.6795
  • Rewards/accuracies 1 Rlced Conifer: 0.9030
  • Rewards/margins 1 Rlced Conifer: 4.3926
  • Logps/rejected 1 Rlced Conifer: -1115.4857
  • Logps/chosen 1 Rlced Conifer: -652.2682
  • Logits/rejected 1 Rlced Conifer: 2.0086
  • Logits/chosen 1 Rlced Conifer: -0.0625
  • Task Loss 1 Rlced Conifer: 0.1962
  • Task Excess Loss 1 Rlced Conifer: 0.0645
  • Rewards/chosen 0 Uf: -1.8688
  • Rewards/rejected 0 Uf: -2.8942
  • Rewards/accuracies 0 Uf: 0.7397
  • Rewards/margins 0 Uf: 1.0254
  • Logps/rejected 0 Uf: -531.0295
  • Logps/chosen 0 Uf: -476.1427
  • Logits/rejected 0 Uf: 3.1191
  • Logits/chosen 0 Uf: 1.2438
  • Task Loss 0 Uf: 0.5240
  • Task Excess Loss 0 Uf: 0.0664
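
The Rewards/* metrics above follow TRL's standard DPO conventions (an assumption; the card does not define them). Each response y to a prompt x is scored by the implicit reward

$$
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right),
$$

so Rewards/margins is the mean of r(x, y_chosen) − r(x, y_rejected) over evaluation pairs, and Rewards/accuracies is the fraction of pairs whose margin is positive. The Alpha entries are per-subset group weights over the two data subsets (suffixes 0 Uf and 1 Rlced Conifer) and sum to one: 0.6287 + 0.3713 = 1.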

Model description

More information needed

Intended uses & limitations

More information needed
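
This section was left unfilled by the authors. As a starting point, here is a minimal chat-style generation sketch; the repo id NicholasCorrado/zephyr-7b-uf-rlced-conifer-group-dpo-2e-alr-0.01-1e is an assumption, and the chat template is inherited from the Zephyr SFT base model.

```python
# Minimal usage sketch (not from the authors); the repo id below is assumed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NicholasCorrado/zephyr-7b-uf-rlced-conifer-group-dpo-2e-alr-0.01-1e"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain DPO in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```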

Training and evaluation data

More information needed
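
Only the local dataset path data/zephyr_uf_rlced_conifer_ref is given. For orientation, here is a hypothetical record in the prompt/chosen/rejected schema that TRL's DPO training expects (field names follow the TRL convention; the content is invented):

```python
# Hypothetical preference pair; the actual contents of
# data/zephyr_uf_rlced_conifer_ref are not documented in this card.
example = {
    "prompt": "How do I reverse a list in Python?",
    "chosen": "Call my_list.reverse() to reverse it in place, or use my_list[::-1] to get a reversed copy.",
    "rejected": "Python lists cannot be reversed.",
}
```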

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto a TRL DPO run follows the list):

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 256
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
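
As noted above, here is a hedged sketch of how these settings map onto TRL's DPOTrainer. It is not the authors' script: the bf16 flag and dataset loading are assumptions, and the group-DPO weighting visible in the Alpha metrics is not part of stock DPOTrainer.

```python
# A hedged sketch, not the authors' training script.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "alignment-handbook/zephyr-7b-sft-full"
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)

# 8 devices x per-device batch 8 x 4 accumulation steps = effective batch 256,
# matching total_train_batch_size above.
args = DPOConfig(
    output_dir="zephyr-7b-uf-rlced-conifer-group-dpo-2e-alr-0.01-1e",
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,  # assumption: typical for Zephyr-style DPO runs
)

# Assumes the local path holds prompt/chosen/rejected pairs (see the record
# sketch under "Training and evaluation data").
dataset = load_dataset("data/zephyr_uf_rlced_conifer_ref")

trainer = DPOTrainer(
    model,
    ref_model,
    args=args,
    train_dataset=dataset["train"],
    tokenizer=tokenizer,  # named processing_class in newer TRL releases
)
trainer.train()
```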

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Excess Loss | Alpha 0 Uf | Alpha 1 Rlced Conifer | Rewards/chosen 1 Rlced Conifer | Rewards/rejected 1 Rlced Conifer | Rewards/accuracies 1 Rlced Conifer | Rewards/margins 1 Rlced Conifer | Logps/rejected 1 Rlced Conifer | Logps/chosen 1 Rlced Conifer | Logits/rejected 1 Rlced Conifer | Logits/chosen 1 Rlced Conifer | Task Loss 1 Rlced Conifer | Task Excess Loss 1 Rlced Conifer | Rewards/chosen 0 Uf | Rewards/rejected 0 Uf | Rewards/accuracies 0 Uf | Rewards/margins 0 Uf | Logps/rejected 0 Uf | Logps/chosen 0 Uf | Logits/rejected 0 Uf | Logits/chosen 0 Uf | Task Loss 0 Uf | Task Excess Loss 0 Uf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.1859 | 0.2498 | 180 | 0.2923 | -1.9944 | -4.7058 | 0.8524 | 2.7115 | -874.3204 | -591.9084 | 0.9002 | -0.1249 | 0.0816 | 0.4921 | 0.5079 | -2.0854 | -5.3481 | 0.8866 | 3.2627 | -982.3445 | -632.1208 | 0.7201 | -0.3404 | 0.2278 | 0.0919 | -1.6415 | -2.4124 | 0.7158 | 0.7709 | -482.8499 | -453.4123 | 1.6757 | 0.5476 | 0.5480 | 0.1058 |
| 0.1646 | 0.4997 | 360 | 0.2654 | -2.3703 | -5.8263 | 0.8637 | 3.4560 | -986.3652 | -629.4960 | 1.5662 | -0.2281 | 0.0630 | 0.5888 | 0.4112 | -2.4491 | -6.6305 | 0.8988 | 4.1814 | -1110.5859 | -668.4894 | 1.4570 | -0.4928 | 0.2047 | 0.0719 | -2.0521 | -2.9610 | 0.7379 | 0.9089 | -537.7054 | -494.4703 | 2.1435 | 0.6282 | 0.5444 | 0.0878 |
| 0.162 | 0.7495 | 540 | 0.2603 | -2.0719 | -5.7198 | 0.8637 | 3.6479 | -975.7140 | -599.6583 | 1.8052 | -0.3472 | 0.0563 | 0.6201 | 0.3799 | -2.1783 | -6.5775 | 0.9020 | 4.3992 | -1105.2861 | -641.4061 | 1.6728 | -0.6324 | 0.1991 | 0.0667 | -1.6637 | -2.6673 | 0.7294 | 1.0036 | -508.3393 | -455.6315 | 2.4641 | 0.5657 | 0.5322 | 0.0717 |
| 0.1476 | 0.9993 | 720 | 0.2572 | -2.2030 | -5.8511 | 0.8675 | 3.6481 | -988.8447 | -612.7692 | 2.2087 | 0.2455 | 0.0532 | 0.6287 | 0.3713 | -2.2869 | -6.6795 | 0.9030 | 4.3926 | -1115.4857 | -652.2682 | 2.0086 | -0.0625 | 0.1962 | 0.0645 | -1.8688 | -2.8942 | 0.7397 | 1.0254 | -531.0295 | -476.1427 | 3.1191 | 1.2438 | 0.5240 | 0.0664 |

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.2.0a0+81ea7a4
  • Datasets 2.21.0
  • Tokenizers 0.19.1