---
library_name: transformers
license: other
base_model: trl-lib/qwen1.5-0.5b-sft
tags:
  - alignment-handbook
  - trl
  - simpo
  - generated_from_trainer
datasets:
  - yakazimir/ultrafeedback_binarized
model-index:
  - name: qwen_unl_entropy_0_0
    results: []
---

# qwen_unl_entropy_0_0

This model is a fine-tuned version of [trl-lib/qwen1.5-0.5b-sft](https://huggingface.co/trl-lib/qwen1.5-0.5b-sft) on the [yakazimir/ultrafeedback_binarized](https://huggingface.co/datasets/yakazimir/ultrafeedback_binarized) dataset. It achieves the following results on the evaluation set:

- Loss: 1.6479
- Rewards/chosen: -1.3032
- Rewards/rejected: -1.4993
- Rewards/accuracies: 0.5712
- Rewards/margins: 0.1961
- Logps/rejected: -1.4993
- Logps/chosen: -1.3032
- Logits/rejected: 0.1464
- Logits/chosen: 0.0748
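
The card does not yet include usage instructions, so here is a minimal inference sketch using the standard `transformers` chat API. The repo id `yakazimir/qwen_unl_entropy_0_0` is inferred from the card title, and the prompt and generation settings are illustrative assumptions, not part of the original card:

```python
# Minimal sketch: load the model and run one chat-style generation.
# Assumes the tokenizer ships a chat template (typical for Qwen1.5 SFT models).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yakazimir/qwen_unl_entropy_0_0"  # assumption: inferred from card title
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "Explain preference optimization in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```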

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
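
In the meantime, the preference dataset named above can be inspected directly with the `datasets` library. This is a sketch only; the split names and the usual prompt/chosen/rejected column layout of UltraFeedback-style binarized datasets are assumptions, not confirmed by the card:

```python
# Minimal sketch: download and inspect the training dataset.
from datasets import load_dataset

ds = load_dataset("yakazimir/ultrafeedback_binarized")
print(ds)  # prints available splits and row counts

# Peek at the first example of the first split; for UltraFeedback-style
# binarized data the keys typically include prompt, chosen, and rejected
# (an assumption here, not verified against this repo).
first_split = list(ds.keys())[0]
print(ds[first_split][0].keys())
```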

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 16
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3.0
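
As a rough guide, the hyperparameters above map onto a `transformers.TrainingArguments` configuration as sketched below. The actual run used a TRL / alignment-handbook trainer (per the card's tags), so this is an approximation of that configuration, not the original training script:

```python
# Sketch: the listed hyperparameters expressed as TrainingArguments.
# Effective batch size: 2 per device x 16 accumulation steps = 32,
# matching the reported total_train_batch_size.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen_unl_entropy_0_0",
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=16,
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)
```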

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 1.6555        | 0.2141 | 400  | 1.6941          | -1.3383        | -1.4640          | 0.5556             | 0.1257          | -1.4640        | -1.3383      | 0.4030          | 0.3137        |
| 1.6693        | 0.4282 | 800  | 1.6719          | -1.3149        | -1.4532          | 0.5579             | 0.1383          | -1.4532        | -1.3149      | 0.3441          | 0.2642        |
| 1.6204        | 0.6422 | 1200 | 1.6640          | -1.3085        | -1.4525          | 0.5556             | 0.1440          | -1.4525        | -1.3085      | 0.3559          | 0.2746        |
| 1.6569        | 0.8563 | 1600 | 1.6598          | -1.3094        | -1.4585          | 0.5593             | 0.1491          | -1.4585        | -1.3094      | 0.2618          | 0.1878        |
| 1.7111        | 1.0704 | 2000 | 1.6548          | -1.3002        | -1.4570          | 0.5653             | 0.1568          | -1.4570        | -1.3002      | 0.2290          | 0.1561        |
| 1.6123        | 1.2845 | 2400 | 1.6522          | -1.3029        | -1.4741          | 0.5675             | 0.1711          | -1.4741        | -1.3029      | 0.2729          | 0.1950        |
| 1.6687        | 1.4986 | 2800 | 1.6488          | -1.3000        | -1.4737          | 0.5697             | 0.1738          | -1.4737        | -1.3000      | 0.1754          | 0.1051        |
| 1.6012        | 1.7127 | 3200 | 1.6494          | -1.3010        | -1.4718          | 0.5675             | 0.1708          | -1.4718        | -1.3010      | 0.1848          | 0.1133        |
| 1.5646        | 1.9267 | 3600 | 1.6479          | -1.2987        | -1.4776          | 0.5682             | 0.1789          | -1.4776        | -1.2987      | 0.1466          | 0.0770        |
| 1.5351        | 2.1408 | 4000 | 1.6470          | -1.3020        | -1.4960          | 0.5697             | 0.1940          | -1.4960        | -1.3020      | 0.1418          | 0.0714        |
| 1.5309        | 2.3549 | 4400 | 1.6467          | -1.3051        | -1.5042          | 0.5727             | 0.1991          | -1.5042        | -1.3051      | 0.1132          | 0.0439        |
| 1.5444        | 2.5690 | 4800 | 1.6473          | -1.3034        | -1.5014          | 0.5720             | 0.1979          | -1.5014        | -1.3034      | 0.1403          | 0.0690        |
| 1.5671        | 2.7831 | 5200 | 1.6474          | -1.3030        | -1.4996          | 0.5705             | 0.1966          | -1.4996        | -1.3030      | 0.2002          | 0.1244        |
| 1.5485        | 2.9972 | 5600 | 1.6479          | -1.3031        | -1.4993          | 0.5712             | 0.1961          | -1.4993        | -1.3031      | 0.1464          | 0.0748        |

### Framework versions

- Transformers 4.44.2
- Pytorch 2.2.2+cu121
- Datasets 2.18.0
- Tokenizers 0.19.1