---
library_name: transformers
license: other
base_model: trl-lib/qwen1.5-0.5b-sft
tags:
- alignment-handbook
- trl
- simpo
- generated_from_trainer
datasets:
- yakazimir/ultrafeedback_binarized
model-index:
- name: qwen_cpo_entropy_0_3
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# qwen_cpo_entropy_0_3

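This model is a fine-tuned version of [trl-lib/qwen1.5-0.5b-sft](https://huggingface.co/trl-lib/qwen1.5-0.5b-sft) on the [yakazimir/ultrafeedback_binarized](https://huggingface.co/datasets/yakazimir/ultrafeedback_binarized) dataset.
It achieves the following results on the evaluation set (these match the last logged evaluation, step 5600, in the training results table):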
- Loss: 1.0416
- Sft Loss: 1.4031
- Rewards/chosen: -1.3990
- Rewards/rejected: -1.8440
- Rewards/accuracies: 0.6157
- Rewards/margins: 0.4450
- Logps/rejected: -1.8440
- Logps/chosen: -1.3990
- Logits/rejected: 0.2187
- Logits/chosen: 0.1269
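
The reward metrics above are related by simple arithmetic, which the following minimal sketch checks. It assumes the margin is the average of `chosen - rejected` over evaluation pairs, so that the difference of the two reported means reproduces it up to rounding; the card itself does not state this definition. In the reported numbers, `Rewards/chosen` and `Logps/chosen` coincide (as do the rejected values), which would be consistent with a reward defined as an unscaled log-probability, though that is an inference and not something the card confirms.

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = -1.3990
rewards_rejected = -1.8440
reported_margin = 0.4450

# Assumption: the margin is the mean of (chosen - rejected) per pair, so the
# difference of the two reported means should match it up to rounding.
margin = rewards_chosen - rewards_rejected
print(f"margin = {margin:.4f}")

# Tolerance of 1e-3 allows for rounding in the four-decimal reported values.
assert abs(margin - reported_margin) < 1e-3
```

A `Rewards/accuracies` value of 0.6157 then reads as the fraction of evaluation pairs in which the chosen response received the higher reward.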

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 16
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3.0
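
To make the batch-size and schedule settings above concrete, here is a rough sketch of the arithmetic. It assumes the usual convention that the number of samples per optimizer update is per-device batch size × gradient accumulation steps × number of devices, and it uses a linear warmup followed by a half-cosine decay to zero, which is how cosine schedules with warmup are commonly implemented. The total step count (`ASSUMED_TOTAL_STEPS`) is an estimate derived from the results table (step 5600 at epoch 2.9972), not a value stated in this card, and the helper names are hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-06
WARMUP_RATIO = 0.1
PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = 16

# Assumption: estimated from the results table (5600 steps ~ 2.9972 epochs),
# not a value reported by the trainer.
ASSUMED_TOTAL_STEPS = 5605


def samples_per_update(num_devices: int) -> int:
    """Samples consumed per optimizer step under the usual convention."""
    return PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * num_devices


def lr_at(step: int, total_steps: int = ASSUMED_TOTAL_STEPS) -> float:
    """Linear warmup, then half-cosine decay to zero (illustrative only)."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("samples per update (1 device):", samples_per_update(1))
    for step in (0, 280, 561, 2800, 5605):
        print(step, f"{lr_at(step):.3e}")
```

Note that the listed `total_train_batch_size` of 32 equals 2 × 16; the card does not report the number of devices, so the device count in the sketch is left as a parameter.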

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Sft Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 1.09          | 0.2141 | 400  | 1.1010          | 1.3681   | -1.3477        | -1.4855          | 0.5586             | 0.1378          | -1.4855        | -1.3477      | 0.3207          | 0.2350        |
| 1.0764        | 0.4282 | 800  | 1.0739          | 1.3759   | -1.3603        | -1.5873          | 0.5823             | 0.2270          | -1.5873        | -1.3603      | 0.3806          | 0.2884        |
| 1.077         | 0.6422 | 1200 | 1.0591          | 1.3822   | -1.3685        | -1.6704          | 0.5935             | 0.3019          | -1.6704        | -1.3685      | 0.3589          | 0.2649        |
| 1.0489        | 0.8563 | 1600 | 1.0555          | 1.3767   | -1.3518        | -1.6477          | 0.5905             | 0.2959          | -1.6477        | -1.3518      | 0.4297          | 0.3293        |
| 1.1366        | 1.0704 | 2000 | 1.0496          | 1.3798   | -1.3555        | -1.7040          | 0.5987             | 0.3484          | -1.7040        | -1.3555      | 0.3416          | 0.2453        |
| 1.0133        | 1.2845 | 2400 | 1.0461          | 1.3864   | -1.3639        | -1.7321          | 0.6053             | 0.3682          | -1.7321        | -1.3639      | 0.3701          | 0.2708        |
| 1.1144        | 1.4986 | 2800 | 1.0443          | 1.3887   | -1.3652        | -1.7447          | 0.6105             | 0.3794          | -1.7447        | -1.3652      | 0.2150          | 0.1278        |
| 1.0196        | 1.7127 | 3200 | 1.0449          | 1.3841   | -1.3615        | -1.7338          | 0.6142             | 0.3723          | -1.7338        | -1.3615      | 0.1872          | 0.1007        |
| 1.0023        | 1.9267 | 3600 | 1.0405          | 1.3927   | -1.3767        | -1.7830          | 0.6120             | 0.4063          | -1.7830        | -1.3767      | 0.2211          | 0.1322        |
| 0.9654        | 2.1408 | 4000 | 1.0418          | 1.3967   | -1.3910        | -1.8183          | 0.6180             | 0.4273          | -1.8183        | -1.3910      | 0.2405          | 0.1482        |
| 0.9676        | 2.3549 | 4400 | 1.0418          | 1.4054   | -1.4061        | -1.8540          | 0.6231             | 0.4479          | -1.8540        | -1.4061      | 0.2064          | 0.1158        |
| 0.9789        | 2.5690 | 4800 | 1.0420          | 1.4009   | -1.3974        | -1.8380          | 0.6142             | 0.4406          | -1.8380        | -1.3974      | 0.1887          | 0.0996        |
| 1.0003        | 2.7831 | 5200 | 1.0413          | 1.4027   | -1.3986        | -1.8438          | 0.6187             | 0.4452          | -1.8438        | -1.3986      | 0.2046          | 0.1137        |
| 0.9909        | 2.9972 | 5600 | 1.0416          | 1.4031   | -1.3990        | -1.8440          | 0.6157             | 0.4450          | -1.8440        | -1.3990      | 0.2187          | 0.1269        |
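
As a quick way to read the table, the sketch below picks out the evaluation steps with the lowest validation loss and the highest reward accuracy. The numbers are copied from the table above; the variable names are illustrative.

```python
# (step, validation_loss, rewards_accuracy) copied from the table above.
EVAL_LOG = [
    (400, 1.1010, 0.5586),
    (800, 1.0739, 0.5823),
    (1200, 1.0591, 0.5935),
    (1600, 1.0555, 0.5905),
    (2000, 1.0496, 0.5987),
    (2400, 1.0461, 0.6053),
    (2800, 1.0443, 0.6105),
    (3200, 1.0449, 0.6142),
    (3600, 1.0405, 0.6120),
    (4000, 1.0418, 0.6180),
    (4400, 1.0418, 0.6231),
    (4800, 1.0420, 0.6142),
    (5200, 1.0413, 0.6187),
    (5600, 1.0416, 0.6157),
]

# Row with the lowest validation loss and row with the highest accuracy.
best_loss = min(EVAL_LOG, key=lambda row: row[1])
best_acc = max(EVAL_LOG, key=lambda row: row[2])

print(f"lowest validation loss: {best_loss[1]:.4f} at step {best_loss[0]}")
print(f"highest reward accuracy: {best_acc[2]:.4f} at step {best_acc[0]}")
```

In this log the validation loss stays between 1.0405 and 1.0420 from step 3600 onward, while the SFT loss rises from 1.3681 at step 400 to 1.4031 at step 5600.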


### Framework versions

- Transformers 4.44.2
- Pytorch 2.2.2+cu121
- Datasets 2.18.0
- Tokenizers 0.19.1
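
For anyone trying to reproduce the environment, a small sketch like the following can report where the local installation differs from the versions listed above. It assumes the standard PyPI distribution names (`torch` for PyTorch) and that the installed PyTorch build reports its local version suffix (`+cu121`); the `find_mismatches` helper is hypothetical, not part of any library.

```python
from importlib import metadata

# Versions listed above; "torch" is the PyPI distribution name for PyTorch.
PINNED = {
    "transformers": "4.44.2",
    "torch": "2.2.2+cu121",
    "datasets": "2.18.0",
    "tokenizers": "0.19.1",
}


def find_mismatches(pinned=PINNED):
    """Return {package: installed_version_or_None} for packages that differ."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    for name, installed in find_mismatches().items():
        print(f"{name}: expected {PINNED[name]}, found {installed}")
```

An exact string comparison is deliberately strict; nearby versions may well work, but the card only documents the ones listed.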