qwen_l21_entropy_0_01

This model is a fine-tuned version of trl-lib/qwen1.5-0.5b-sft on the yakazimir/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

Loss: 0.6901
Sft Loss: 2.1331
Rewards/chosen: -2.1707
Rewards/rejected: -3.2270
Rewards/accuracies: 0.6914
Rewards/margins: 1.0563
Logps/rejected: -3.2270
Logps/chosen: -2.1707
Logits/rejected: 0.2151
Logits/chosen: 0.1185
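The card does not say which trainer logged these metrics. In TRL-style preference trainers, Rewards/margins is usually the mean per-pair difference between the chosen and rejected rewards, and Rewards/accuracies is the fraction of pairs where the chosen reward is higher. The sketch below assumes those definitions. `preference_metrics` is a hypothetical helper, not part of any library. The last lines only check that the reported final numbers are internally consistent: -2.1707 - (-3.2270) = 1.0563.

```python
from statistics import mean


def preference_metrics(chosen_rewards, rejected_rewards):
    """Summarise per-example rewards the way TRL-style preference trainers
    typically log them (an assumption; the card does not confirm this)."""
    pairs = list(zip(chosen_rewards, rejected_rewards))
    return {
        "rewards/chosen": mean(chosen_rewards),
        "rewards/rejected": mean(rejected_rewards),
        # Fraction of pairs where the chosen response gets the higher reward.
        "rewards/accuracies": mean(1.0 if c > r else 0.0 for c, r in pairs),
        # Mean of per-pair differences, which equals the difference of the means.
        "rewards/margins": mean(c - r for c, r in pairs),
    }


# Consistency check on the reported final evaluation numbers.
reported_margin = -2.1707 - (-3.2270)
print(round(reported_margin, 4))  # 1.0563
```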

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-06
train_batch_size: 2
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
gradient_accumulation_steps: 16
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3.0
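The hyperparameters above can be reproduced with a short arithmetic sketch. It makes two assumptions. First, the total batch size is the per-device batch size times the gradient-accumulation steps times the number of devices. The card does not state the device count. The default `num_devices=1` is used only because 2 × 16 then reproduces the listed 32. Second, the "cosine" scheduler with a warmup ratio of 0.1 is assumed to mean linear warmup over the first 10% of steps (rounded up), followed by half-cosine decay to zero. Both helper names are hypothetical.

```python
import math

# Values taken from the hyperparameter list.
TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 16
PEAK_LR = 1e-6
WARMUP_RATIO = 0.1


def effective_batch_size(per_device=TRAIN_BATCH_SIZE, accum=GRAD_ACCUM_STEPS, num_devices=1):
    """Number of examples contributing to one optimizer step."""
    return per_device * accum * num_devices


def lr_at_step(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Assumed schedule: linear warmup, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(effective_batch_size())  # 32
```

With a hypothetical run of 1000 optimizer steps, the learning rate climbs from 0 to 1e-6 over the first 100 steps and then decays back to 0 by step 1000.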

Training results

Training Loss	Epoch	Step	Validation Loss	Sft Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.7149	0.2141	400	0.7232	2.1337	-3.3125	-3.5682	0.5200	0.2557	-3.5682	-3.3125	0.5534	0.4407
0.7105	0.4282	800	0.7055	2.1066	-2.2353	-2.7243	0.6447	0.4890	-2.7243	-2.2353	0.3870	0.2857
0.7071	0.6422	1200	0.6988	2.0445	-2.1363	-2.7640	0.6691	0.6278	-2.7640	-2.1363	0.6763	0.5552
0.6909	0.8563	1600	0.6951	2.2316	-2.3067	-3.0785	0.6825	0.7718	-3.0785	-2.3067	0.0414	-0.0345
0.6992	1.0704	2000	0.6927	2.0672	-2.1384	-2.9634	0.6766	0.8250	-2.9634	-2.1384	0.1253	0.0374
0.6894	1.2845	2400	0.6908	2.1132	-2.1527	-3.0987	0.6810	0.9460	-3.0987	-2.1527	0.3470	0.2424
0.6881	1.4986	2800	0.6908	2.1384	-2.2307	-3.1888	0.6862	0.9581	-3.1888	-2.2307	0.5238	0.4064
0.6998	1.7127	3200	0.6900	2.1093	-2.1719	-3.1258	0.6936	0.9539	-3.1258	-2.1719	0.2688	0.1694
0.6837	1.9267	3600	0.6898	2.1422	-2.2075	-3.2094	0.6966	1.0019	-3.2094	-2.2075	0.3036	0.1996
0.6446	2.1408	4000	0.6902	2.1614	-2.1867	-3.2140	0.6855	1.0273	-3.2140	-2.1867	0.2205	0.1222
0.6694	2.3549	4400	0.6887	2.1145	-2.1590	-3.1865	0.6921	1.0275	-3.1865	-2.1590	0.2474	0.1483
0.6722	2.5690	4800	0.6902	2.1289	-2.1610	-3.2026	0.6907	1.0415	-3.2026	-2.1610	0.2232	0.1258
0.6701	2.7831	5200	0.6904	2.1329	-2.1699	-3.2263	0.6929	1.0564	-3.2263	-2.1699	0.2407	0.1420
0.659	2.9972	5600	0.6901	2.1331	-2.1707	-3.2271	0.6914	1.0563	-3.2271	-2.1707	0.2151	0.1185
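The lowest validation loss in the table occurs at step 4400 (0.6887), not at the final step (0.6901). The margins column also agrees with chosen minus rejected rewards, up to the fourth-decimal rounding in the table. The sketch below checks both claims. It copies a subset of columns from the table into plain tuples. `best_checkpoint` and `margins_consistent` are hypothetical helper names.

```python
# (step, validation loss, rewards/chosen, rewards/rejected, rewards/margins),
# copied from the training-results table.
ROWS = [
    (400, 0.7232, -3.3125, -3.5682, 0.2557),
    (800, 0.7055, -2.2353, -2.7243, 0.4890),
    (1200, 0.6988, -2.1363, -2.7640, 0.6278),
    (1600, 0.6951, -2.3067, -3.0785, 0.7718),
    (2000, 0.6927, -2.1384, -2.9634, 0.8250),
    (2400, 0.6908, -2.1527, -3.0987, 0.9460),
    (2800, 0.6908, -2.2307, -3.1888, 0.9581),
    (3200, 0.6900, -2.1719, -3.1258, 0.9539),
    (3600, 0.6898, -2.2075, -3.2094, 1.0019),
    (4000, 0.6902, -2.1867, -3.2140, 1.0273),
    (4400, 0.6887, -2.1590, -3.1865, 1.0275),
    (4800, 0.6902, -2.1610, -3.2026, 1.0415),
    (5200, 0.6904, -2.1699, -3.2263, 1.0564),
    (5600, 0.6901, -2.1707, -3.2271, 1.0563),
]


def best_checkpoint(rows):
    """Return (step, validation loss) for the row with the lowest validation loss."""
    step, val_loss, *_ = min(rows, key=lambda r: r[1])
    return step, val_loss


def margins_consistent(rows, tol=1e-3):
    """Check that margins equal chosen minus rejected rewards, up to rounding."""
    return all(abs((c - r) - m) <= tol for _, _, c, r, m in rows)


print(best_checkpoint(ROWS))     # (4400, 0.6887)
print(margins_consistent(ROWS))  # True
```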

Framework versions

Transformers 4.44.2
Pytorch 2.2.2+cu121
Datasets 2.18.0
Tokenizers 0.19.1
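When trying to reproduce these results, a quick first step is to compare the installed packages against the versions listed above. The sketch below uses only the standard-library `importlib.metadata`. It assumes that "Pytorch" refers to the pip distribution named `torch`. Comparison is by exact string, so a build without the `+cu121` local tag is reported as a mismatch. `version_mismatches` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions". Mapping "Pytorch" to the
# distribution name "torch" is an assumption.
PINNED = {
    "transformers": "4.44.2",
    "torch": "2.2.2+cu121",
    "datasets": "2.18.0",
    "tokenizers": "0.19.1",
}


def version_mismatches(pinned=PINNED):
    """Return {package: (expected, installed or None)} for every exact-string mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


for name, (expected, installed) in version_mismatches().items():
    print(f"{name}: expected {expected}, found {installed}")
```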

Repository: yakazimir/qwen_l21_entropy_0_01