
Visualize in Weights & Biases

qwen2.5-0.5b-expo-L2EXPO-EXPERIMENT-50-5e6

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set:

  • Loss: 223.1421
  • Logps: -81.8519
  • Logits: -0.6524
  • Objective: 224.3911
  • Dpo Loss: 114.2648
  • Regularize: 224.3911
  • Ranking Simple: 0.5083
  • Ranking Idealized: 0.5093
  • Ranking Idealized Expo: 0.5093

Model description

More information needed

Intended uses & limitations

More information needed
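As a minimal usage sketch, the checkpoint should load with the standard `transformers` causal-LM classes, since it is a 0.5B Qwen2.5 fine-tune stored in safetensors; the prompt below is purely illustrative and not from the training data.

```python
# Minimal inference sketch (assumes the checkpoint loads with the standard
# Qwen2.5 causal-LM classes; the prompt is purely illustrative).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-L2EXPO-EXPERIMENT-50-5e6"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Write a one-sentence news summary:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```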

Training and evaluation data

More information needed
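The card only names the training data (hZzy/train_pairwise). A minimal sketch for inspecting it with the `datasets` library follows; the split and column names are assumptions (pairwise preference datasets typically carry prompt/chosen/rejected fields), so check the dataset card for the actual schema.

```python
# Sketch for inspecting the training data (split and column names are
# assumptions; consult the hZzy/train_pairwise dataset card for the schema).
from datasets import load_dataset

ds = load_dataset("hZzy/train_pairwise", split="train")
print(ds)      # number of rows and column names
print(ds[0])   # first example, e.g. prompt / chosen / rejected fields
```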

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a rough configuration sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 6
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 288
  • total_eval_batch_size: 24
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
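For reference, these values map onto a `transformers.TrainingArguments` configuration roughly as sketched below. The output directory is a placeholder, and the actual training script (which also computes the DPO/regularization objective reported above) is not reproduced here. The effective train batch size is 4 per device × 6 GPUs × 12 accumulation steps = 288.

```python
# Rough mapping of the listed hyperparameters onto TrainingArguments
# (output_dir is a placeholder; the actual expo/DPO training loop is not shown).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L2EXPO-EXPERIMENT-50-5e6",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,   # with 6 GPUs: 4 * 6 * 12 = 288 effective batch
    num_train_epochs=5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```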

Training results

| Training Loss | Epoch  | Step | Validation Loss | Logps    | Logits  | Objective | Dpo Loss | Regularize | Ranking Simple | Ranking Idealized | Ranking Idealized Expo |
|:-------------:|:------:|:----:|:---------------:|:--------:|:-------:|:---------:|:--------:|:----------:|:--------------:|:-----------------:|:----------------------:|
| 72.7249       | 0.2834 | 50   | 49.7663         | -92.8163 | -1.3016 | 48.7266   | 25.8845  | 48.7266    | 0.5083         | 0.5093            | 0.5093                 |
| 152.2211      | 0.5668 | 100  | 146.6511        | -80.6726 | -1.2458 | 149.0543  | 74.5413  | 149.0543   | 0.5124         | 0.5093            | 0.5093                 |
| 149.0411      | 0.8503 | 150  | 179.0229        | -81.4258 | -0.9511 | 179.4755  | 89.5257  | 179.4755   | 0.5124         | 0.5093            | 0.5093                 |
| 135.6758      | 1.1337 | 200  | 190.7774        | -83.1371 | -0.8760 | 195.4946  | 98.7297  | 195.4946   | 0.5083         | 0.5093            | 0.5093                 |
| 122.9397      | 1.4171 | 250  | 204.8156        | -81.1880 | -0.8410 | 206.5414  | 104.7900 | 206.5414   | 0.4990         | 0.5093            | 0.5093                 |
| 109.8686      | 1.7005 | 300  | 216.4334        | -82.2344 | -0.6658 | 216.9471  | 109.1882 | 216.9471   | 0.5083         | 0.5093            | 0.5093                 |
| 97.6956       | 1.9839 | 350  | 218.2887        | -81.0804 | -0.6323 | 217.4291  | 109.8084 | 217.4291   | 0.5072         | 0.5093            | 0.5093                 |
| 86.0309       | 2.2674 | 400  | 221.7113        | -83.6082 | -0.5904 | 225.3389  | 115.8749 | 225.3389   | 0.5052         | 0.5093            | 0.5093                 |
| 78.4362       | 2.5508 | 450  | 221.3732        | -82.0743 | -0.6173 | 224.4839  | 116.2117 | 224.4839   | 0.5114         | 0.5093            | 0.5093                 |
| 65.179        | 2.8342 | 500  | 223.8012        | -82.3425 | -0.6892 | 227.1755  | 114.9871 | 227.1755   | 0.5083         | 0.5093            | 0.5093                 |
| 52.3116       | 3.1176 | 550  | 223.6770        | -81.8433 | -0.6290 | 226.7591  | 114.9252 | 226.7591   | 0.5103         | 0.5093            | 0.5093                 |
| 45.9426       | 3.4010 | 600  | 222.4720        | -81.3168 | -0.6183 | 223.1873  | 113.6331 | 223.1873   | 0.5072         | 0.5093            | 0.5093                 |
| 37.3789       | 3.6845 | 650  | 223.4119        | -81.7013 | -0.6355 | 225.2157  | 114.6103 | 225.2157   | 0.5072         | 0.5093            | 0.5093                 |
| 32.7043       | 3.9679 | 700  | 223.5499        | -81.8343 | -0.6585 | 224.4542  | 114.2602 | 224.4542   | 0.5062         | 0.5093            | 0.5093                 |
| 22.8627       | 4.2513 | 750  | 223.7742        | -81.7547 | -0.6564 | 224.6748  | 114.4499 | 224.6748   | 0.5072         | 0.5093            | 0.5093                 |
| 19.3618       | 4.5347 | 800  | 223.2886        | -81.8898 | -0.6540 | 224.4371  | 114.3485 | 224.4371   | 0.5083         | 0.5093            | 0.5093                 |
| 18.3796       | 4.8181 | 850  | 223.1902        | -81.8524 | -0.6522 | 224.4282  | 114.2867 | 224.4282   | 0.5083         | 0.5093            | 0.5093                 |

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1
