qwen_fUNL_entropy_0_01

This model is a fine-tuned version of trl-lib/qwen1.5-0.5b-sft on the yakazimir/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

Loss: 0.0504
SFT Loss: 4.0281
Rewards/chosen: -4.4231
Rewards/rejected: -5.1418
Rewards/accuracies: 0.6862
Rewards/margins: 0.7187
Logps/rejected: -5.1418
Logps/chosen: -4.4231
Logits/rejected: -0.2955
Logits/chosen: -0.3687
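
The reward figures above are tied together by simple arithmetic: the margin is the chosen reward minus the rejected reward. The short check below is only a sketch that uses the values listed in this card; the variable names are illustrative and not taken from the training code. It also records that, in this card, the reported rewards and log-probabilities are numerically identical.

```python
import math

# Final evaluation metrics, copied from the list above.
rewards_chosen = -4.4231
rewards_rejected = -5.1418
reported_margin = 0.7187
logps_chosen = -4.4231
logps_rejected = -5.1418

# The margin is the gap between the chosen and the rejected reward.
margin = rewards_chosen - rewards_rejected
margin_matches = math.isclose(margin, reported_margin, abs_tol=1e-4)

# In this card the rewards coincide with the reported log-probabilities.
rewards_equal_logps = (
    rewards_chosen == logps_chosen and rewards_rejected == logps_rejected
)

print(f"margin = {margin:.4f}, matches card: {margin_matches}")
print(f"rewards equal logps: {rewards_equal_logps}")
```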

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-06
train_batch_size: 2
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
gradient_accumulation_steps: 16
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3.0
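
As a rough illustration of how these settings interact, the sketch below computes the effective batch size and a cosine learning-rate schedule with linear warmup. It is written in plain Python and is not the training code itself. Two things are assumptions: the total step count (about 5600, read off the last row of the results table) and the process count (the card says multi-GPU but lists no device count, and the reported total of 32 equals 2 × 16, so the default here is 1). The function names are hypothetical.

```python
import math

# Values listed in the hyperparameters above.
LEARNING_RATE = 1e-06
TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 16
WARMUP_RATIO = 0.1

# Assumption: approximate number of optimizer steps for the whole run.
TOTAL_STEPS = 5600


def effective_batch_size(per_step, accum_steps, num_processes=1):
    """Examples contributing to one optimizer update.

    num_processes=1 is an assumption; the card does not list a device count.
    """
    return per_step * accum_steps * num_processes


def cosine_lr(step, total_steps=TOTAL_STEPS, peak_lr=LEARNING_RATE,
              warmup_ratio=WARMUP_RATIO):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print("effective batch size:",
          effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    for step in (0, 280, 560, 3080, 5600):
        print(f"step {step:>4}: lr = {cosine_lr(step):.3e}")
```

Under these assumptions the learning rate rises linearly over the first 10% of steps, peaks at 1e-06, and decays to zero by the final step.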

Training results

Training Loss	Epoch	Step	Validation Loss	SFT Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.0548	0.2141	400	0.0557	4.8295	-5.3467	-5.4723	0.5326	0.1256	-5.4723	-5.3467	0.1095	-0.0277
0.0537	0.4282	800	0.0529	4.1330	-4.6614	-4.9903	0.6024	0.3289	-4.9903	-4.6614	0.2188	0.0763
0.0545	0.6422	1200	0.0523	4.2856	-4.6580	-5.0486	0.6350	0.3906	-5.0486	-4.6580	0.0914	-0.0257
0.0518	0.8563	1600	0.0519	4.0636	-4.5007	-4.9176	0.6313	0.4169	-4.9176	-4.5007	0.0782	-0.0290
0.0537	1.0704	2000	0.0517	3.9662	-4.4270	-4.8924	0.6469	0.4654	-4.8924	-4.4270	-0.1550	-0.2400
0.0533	1.2845	2400	0.0514	4.4069	-4.8229	-5.4257	0.6632	0.6028	-5.4257	-4.8229	-0.1556	-0.2460
0.0522	1.4986	2800	0.0511	4.2244	-4.5446	-5.1374	0.6803	0.5928	-5.1374	-4.5446	-0.2984	-0.3849
0.0530	1.7127	3200	0.0508	4.1193	-4.4960	-5.1073	0.6691	0.6113	-5.1073	-4.4960	-0.2032	-0.2947
0.0538	1.9267	3600	0.0505	4.0434	-4.4193	-5.0638	0.6847	0.6445	-5.0638	-4.4193	-0.2476	-0.3292
0.0504	2.1408	4000	0.0505	4.0585	-4.4646	-5.1658	0.6840	0.7011	-5.1658	-4.4646	-0.2103	-0.2919
0.0530	2.3549	4400	0.0505	4.0905	-4.4767	-5.1722	0.6840	0.6956	-5.1722	-4.4767	-0.2850	-0.3632
0.0525	2.5690	4800	0.0504	4.0700	-4.4483	-5.1426	0.6832	0.6943	-5.1426	-4.4483	-0.1890	-0.2741
0.0509	2.7831	5200	0.0504	4.0135	-4.3932	-5.0993	0.6855	0.7061	-5.0993	-4.3932	-0.1516	-0.2376
0.0504	2.9972	5600	0.0504	4.0281	-4.4231	-5.1418	0.6862	0.7187	-5.1418	-4.4231	-0.2955	-0.3687
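
The table can be summarized programmatically. The sketch below copies a subset of its columns (step, epoch, validation loss, reward accuracy, reward margin) and picks out a few checkpoints of interest. The `EvalRow` type and variable names are hypothetical; the numbers come from the table above.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    step: int
    epoch: float
    val_loss: float
    accuracy: float
    margin: float


# Columns copied from the training results table.
ROWS = [
    EvalRow(400, 0.2141, 0.0557, 0.5326, 0.1256),
    EvalRow(800, 0.4282, 0.0529, 0.6024, 0.3289),
    EvalRow(1200, 0.6422, 0.0523, 0.6350, 0.3906),
    EvalRow(1600, 0.8563, 0.0519, 0.6313, 0.4169),
    EvalRow(2000, 1.0704, 0.0517, 0.6469, 0.4654),
    EvalRow(2400, 1.2845, 0.0514, 0.6632, 0.6028),
    EvalRow(2800, 1.4986, 0.0511, 0.6803, 0.5928),
    EvalRow(3200, 1.7127, 0.0508, 0.6691, 0.6113),
    EvalRow(3600, 1.9267, 0.0505, 0.6847, 0.6445),
    EvalRow(4000, 2.1408, 0.0505, 0.6840, 0.7011),
    EvalRow(4400, 2.3549, 0.0505, 0.6840, 0.6956),
    EvalRow(4800, 2.5690, 0.0504, 0.6832, 0.6943),
    EvalRow(5200, 2.7831, 0.0504, 0.6855, 0.7061),
    EvalRow(5600, 2.9972, 0.0504, 0.6862, 0.7187),
]

# Checkpoint with the highest reward accuracy.
best_accuracy = max(ROWS, key=lambda r: r.accuracy)

# First checkpoint that reaches the lowest validation loss (min keeps the first tie).
first_min_loss = min(ROWS, key=lambda r: r.val_loss)

# Rough optimizer steps per epoch, estimated from the last row.
steps_per_epoch = ROWS[-1].step / ROWS[-1].epoch

print(f"best accuracy {best_accuracy.accuracy} at step {best_accuracy.step}")
print(f"lowest validation loss {first_min_loss.val_loss} first at step {first_min_loss.step}")
print(f"about {steps_per_epoch:.0f} steps per epoch")
```

In this table the validation loss flattens at 0.0504 from step 4800 onward, while accuracy and margin keep improving slightly through the final evaluation.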

Framework versions

Transformers 4.44.2
PyTorch 2.2.2+cu121
Datasets 2.18.0
Tokenizers 0.19.1
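
To compare a local environment against the versions listed above, a small standard-library check along these lines may help. It is a sketch only: the PyPI package names (for example `torch` for PyTorch) are assumptions about how the libraries were installed, and `compare_versions` is a hypothetical helper.

```python
from importlib import metadata

# Versions listed above, keyed by assumed PyPI package name.
EXPECTED_VERSIONS = {
    "transformers": "4.44.2",
    "torch": "2.2.2+cu121",
    "datasets": "2.18.0",
    "tokenizers": "0.19.1",
}


def compare_versions(expected=EXPECTED_VERSIONS):
    """Return {package: (expected, installed_or_None, exact_match)}."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for package, (wanted, installed, ok) in compare_versions().items():
        status = "ok" if ok else "differs"
        print(f"{package}: expected {wanted}, installed {installed} ({status})")
```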
