qwen2.5-0.5b-expo-L2EXPO-EXPERIMENT-0.5-5e6

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set:

Loss: 2.2556
Logps: -80.0690
Logits: -0.6172
Objective: 2.2419
Dpo Loss: 1.3282
Regularize: 2.2419
Ranking Simple: 0.5134
Ranking Idealized: 0.5248
Ranking Idealized Expo: 0.5093
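
As a reference point for reading the Dpo Loss figure, here is a minimal sketch of the standard DPO loss for a single preference pair. It is an assumption that the reported metric follows this form. The function name dpo_loss, the beta value, and the log-probabilities in the usage comment are illustrative and are not taken from the training code.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair of summed log-probs.

    beta=0.1 is an illustrative default, not a value documented by this card.
    """
    # Margin between the policy-vs-reference log-ratios of the two responses.
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)), written as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# Hypothetical inputs: the policy matches the reference exactly, so the margin is 0.
zero_margin_loss = dpo_loss(-80.0, -85.0, -80.0, -85.0)
```

Under this form a zero margin gives ln 2 (about 0.693), which is the value to compare a reported per-pair average against.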

Model description

More information needed

Intended uses & limitations

More information needed
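
A minimal sketch of loading the checkpoint for text generation, assuming it is published on the Hugging Face Hub as hZzy/qwen2.5-0.5b-expo-L2EXPO-EXPERIMENT-0.5-5e6 and follows the standard causal language model layout of its Qwen2.5 base. The helper name generate_text and the max_new_tokens default are illustrative choices, not settings documented by this card, and the expected prompt format is not specified here.

```python
MODEL_ID = "hZzy/qwen2.5-0.5b-expo-L2EXPO-EXPERIMENT-0.5-5e6"

def generate_text(prompt, max_new_tokens=64):
    """Load the tokenizer and model, then return the decoded generation."""
    # Imported lazily so that defining this helper does not require
    # transformers or trigger a download.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling generate_text with a prompt string downloads the weights on first use. Because the function reloads the model on every call, a longer script would load the model once and reuse it.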

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-06
train_batch_size: 4
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 6
gradient_accumulation_steps: 12
total_train_batch_size: 288
total_eval_batch_size: 24
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 5
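
The derived batch sizes in the list above follow from the per-device values, and the schedule settings imply a warmup-then-cosine learning-rate curve. The sketch below reproduces that arithmetic. The schedule function assumes linear warmup followed by cosine decay to zero, and its name lr_at and its total_steps argument are hypothetical, since the total number of optimizer steps is not listed among the hyperparameters.

```python
import math

# Values copied from the hyperparameter list.
train_batch_size = 4
eval_batch_size = 4
num_devices = 6
gradient_accumulation_steps = 12
learning_rate = 5e-6
warmup_ratio = 0.1

# Effective batch sizes: per-device size x devices (x accumulation for training).
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

def lr_at(step, total_steps):
    """Learning rate at a given optimizer step (assumed schedule shape)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 up to the peak learning rate.
        return learning_rate * step / max(1, warmup_steps)
    # Cosine decay from the peak learning rate down to 0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))

print(total_train_batch_size)  # 288
print(total_eval_batch_size)   # 24
```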

Training results

Training Loss	Epoch	Step	Validation Loss	Logps	Logits	Objective	Dpo Loss	Regularize	Ranking Simple	Ranking Idealized	Ranking Idealized Expo
0.8081	0.2834	50	0.6652	-91.9364	-1.3308	0.6722	0.7266	0.6722	0.5124	0.5248	0.5093
1.4482	0.5668	100	1.4160	-83.3251	-1.0880	1.3662	0.9745	1.3662	0.5093	0.5248	0.5093
1.5063	0.8503	150	1.8403	-79.4245	-0.9764	1.8307	1.1388	1.8307	0.5155	0.5248	0.5093
1.3427	1.1337	200	1.9411	-78.0898	-0.8446	1.9042	1.1943	1.9042	0.5124	0.5248	0.5093
1.2385	1.4171	250	2.1004	-81.0783	-0.8252	2.0780	1.2812	2.0780	0.5072	0.5248	0.5093
1.1013	1.7005	300	2.1954	-78.5161	-0.6190	2.2003	1.3091	2.2003	0.5124	0.5248	0.5093
0.9795	1.9839	350	2.2001	-78.2914	-0.6908	2.1850	1.2866	2.1850	0.5093	0.5248	0.5093
0.8853	2.2674	400	2.2679	-78.5732	-0.6216	2.2619	1.3223	2.2619	0.5134	0.5248	0.5093
0.7605	2.5508	450	2.2655	-78.2840	-0.6826	2.2744	1.3572	2.2744	0.5145	0.5248	0.5093
0.6709	2.8342	500	2.2688	-79.7185	-0.6486	2.2578	1.3375	2.2578	0.5186	0.5248	0.5093
0.5302	3.1176	550	2.2598	-80.1419	-0.6267	2.2430	1.3210	2.2430	0.5196	0.5248	0.5093
0.4552	3.4010	600	2.2547	-79.9582	-0.6007	2.2379	1.3298	2.2379	0.5124	0.5248	0.5093
0.3981	3.6845	650	2.2549	-80.1880	-0.5995	2.2397	1.3238	2.2397	0.5155	0.5248	0.5093
0.3178	3.9679	700	2.2616	-80.4560	-0.6215	2.2539	1.3332	2.2539	0.5134	0.5248	0.5093
0.2213	4.2513	750	2.2620	-80.1501	-0.6154	2.2499	1.3297	2.2499	0.5134	0.5248	0.5093
0.2032	4.5347	800	2.2583	-80.1241	-0.6175	2.2455	1.3295	2.2455	0.5134	0.5248	0.5093
0.1935	4.8181	850	2.2561	-80.0661	-0.6169	2.2424	1.3284	2.2424	0.5134	0.5248	0.5093
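
The Epoch and Step columns make it possible to estimate how many optimizer steps one epoch took, and from that the approximate length of the full run. This is a rough sketch: the estimate assumes steps are spread evenly across epochs, and the variable names are illustrative.

```python
# (epoch, step) pairs copied from the first and last rows of the table.
first_epoch, first_step = 0.2834, 50
last_epoch, last_step = 4.8181, 850
num_epochs = 5  # from the hyperparameter list

# Both rows should give nearly the same steps-per-epoch figure.
steps_per_epoch_first = first_step / first_epoch
steps_per_epoch_last = last_step / last_epoch

# Approximate number of optimizer steps in the whole run.
estimated_total_steps = round(steps_per_epoch_last * num_epochs)

print(estimated_total_steps)  # 882
```

With about 176 steps per epoch, evaluation every 50 steps corresponds to roughly 3.5 evaluations per epoch, which matches the spacing of the Epoch column.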

Framework versions

Transformers 4.42.0
Pytorch 2.3.0+cu121
Datasets 2.19.1
Tokenizers 0.19.1
