---
library_name: transformers
license: other
base_model: trl-lib/qwen1.5-0.5b-sft
tags:
  - alignment-handbook
  - trl
  - simpo
  - generated_from_trainer
datasets:
  - yakazimir/ultrafeedback_binarized
model-index:
  - name: qwen_uncCPO_entropy_0_01
    results: []
---

# qwen_uncCPO_entropy_0_01

This model is a fine-tuned version of [trl-lib/qwen1.5-0.5b-sft](https://huggingface.co/trl-lib/qwen1.5-0.5b-sft) on the [yakazimir/ultrafeedback_binarized](https://huggingface.co/datasets/yakazimir/ultrafeedback_binarized) dataset. It achieves the following results on the evaluation set:

- Loss: 0.0500
- Sft Loss: 3.9220
- Rewards/chosen: -4.3252
- Rewards/rejected: -5.1044
- Rewards/accuracies: 0.6892
- Rewards/margins: 0.7793
- Logps/rejected: -5.1044
- Logps/chosen: -4.3252
- Logits/rejected: 0.1444
- Logits/chosen: 0.0509
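
As a quick-start aid (not part of the auto-generated card): the sketch below loads the checkpoint with the standard `transformers` API, assuming it is hosted under the repo id `yakazimir/qwen_uncCPO_entropy_0_01` and, like its Qwen1.5 SFT base, ships a chat template.

```python
# Minimal usage sketch (assumption: the hub repo id matches the model name above).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yakazimir/qwen_uncCPO_entropy_0_01"  # assumed hub path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Format the prompt with the checkpoint's chat template before generating.
messages = [{"role": "user", "content": "Briefly explain preference optimization."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```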

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (an illustrative trainer setup follows the list):

- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 16
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3.0
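
As an illustration only: the sketch below sets up a comparable run with `trl`'s `CPOTrainer`, where SimPO is exposed as `loss_type="simpo"`. The "uncCPO + entropy" objective in this model's name is a custom loss that stock `trl` does not ship, and the dataset split and column names below are assumptions, so treat this as a nearest off-the-shelf approximation rather than the actual training script.

```python
# Hedged reproduction sketch using trl's built-in CPO/SimPO trainer.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import CPOConfig, CPOTrainer

base = "trl-lib/qwen1.5-0.5b-sft"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Assumption: the dataset exposes "train"/"test" splits with the
# prompt/chosen/rejected columns that CPOTrainer expects.
dataset = load_dataset("yakazimir/ultrafeedback_binarized")

# Mirrors the hyperparameters listed above; other fields keep trl defaults.
args = CPOConfig(
    output_dir="qwen_uncCPO_entropy_0_01",
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=16,  # 2 x 16 x world_size = 32 effective
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    loss_type="simpo",  # stand-in for the custom uncCPO/entropy objective
)

trainer = CPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```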

### Training results

| Training Loss | Epoch | Step | Validation Loss | Sft Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.0563 | 0.2141 | 400 | 0.0573 | 4.8352 | -5.7454 | -6.0246 | 0.5445 | 0.2792 | -6.0246 | -5.7454 | 0.6512 | 0.5372 |
| 0.0533 | 0.4282 | 800 | 0.0524 | 4.2340 | -4.6954 | -5.0777 | 0.6157 | 0.3823 | -5.0777 | -4.6954 | 0.2939 | 0.1644 |
| 0.0533 | 0.6422 | 1200 | 0.0518 | 4.1504 | -4.5198 | -5.0186 | 0.6484 | 0.4989 | -5.0186 | -4.5198 | 0.4014 | 0.2684 |
| 0.0508 | 0.8563 | 1600 | 0.0512 | 4.0690 | -4.5220 | -5.0081 | 0.6491 | 0.4862 | -5.0081 | -4.5220 | 0.2498 | 0.1344 |
| 0.0529 | 1.0704 | 2000 | 0.0508 | 3.9195 | -4.3917 | -4.9646 | 0.6521 | 0.5729 | -4.9646 | -4.3917 | 0.3268 | 0.2181 |
| 0.0522 | 1.2845 | 2400 | 0.0504 | 4.1797 | -4.6133 | -5.2771 | 0.6647 | 0.6638 | -5.2771 | -4.6133 | 0.2727 | 0.1622 |
| 0.0515 | 1.4986 | 2800 | 0.0504 | 4.0933 | -4.4442 | -5.0786 | 0.6825 | 0.6344 | -5.0786 | -4.4442 | 0.2050 | 0.0984 |
| 0.0526 | 1.7127 | 3200 | 0.0503 | 4.0886 | -4.4943 | -5.1537 | 0.6751 | 0.6594 | -5.1537 | -4.4943 | 0.2002 | 0.0920 |
| 0.0533 | 1.9267 | 3600 | 0.0501 | 3.9857 | -4.3809 | -5.1003 | 0.6825 | 0.7195 | -5.1003 | -4.3809 | 0.1348 | 0.0421 |
| 0.0493 | 2.1408 | 4000 | 0.0500 | 3.9751 | -4.3954 | -5.1537 | 0.6840 | 0.7583 | -5.1537 | -4.3954 | 0.3029 | 0.1980 |
| 0.0522 | 2.3549 | 4400 | 0.0500 | 3.9820 | -4.4013 | -5.1632 | 0.6869 | 0.7619 | -5.1632 | -4.4013 | 0.2139 | 0.1131 |
| 0.0513 | 2.5690 | 4800 | 0.0500 | 3.9732 | -4.3709 | -5.1160 | 0.6944 | 0.7451 | -5.1160 | -4.3709 | 0.1787 | 0.0785 |
| 0.0498 | 2.7831 | 5200 | 0.0500 | 3.9372 | -4.3318 | -5.0969 | 0.6892 | 0.7651 | -5.0969 | -4.3318 | 0.2138 | 0.1134 |
| 0.0496 | 2.9972 | 5600 | 0.0500 | 3.9220 | -4.3252 | -5.1044 | 0.6892 | 0.7793 | -5.1044 | -4.3252 | 0.1444 | 0.0509 |

## Framework versions

- Transformers 4.44.2
- PyTorch 2.2.2+cu121
- Datasets 2.18.0
- Tokenizers 0.19.1