zephyr-dpop-qlora-gpt4-5e-7-epoch3

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the generation/GPT4 dataset. It achieves the following results on the evaluation set:

Loss: 1.5678
Positive Losses: 8.8531
Dpo Losses: 0.6658
Rewards/chosen: -0.0421
Rewards/rejected: -0.1106
Rewards/accuracies: 0.6151
Rewards/margins: 0.0684
Rewards/margins Max: 0.3178
Rewards/margins Min: -0.1570
Rewards/margins Std: 0.2114
Logps/rejected: -270.2393
Logps/chosen: -289.4331
Logits/rejected: -2.6601
Logits/chosen: -2.7033
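
The reward figures above can be cross-checked with a little arithmetic: the reported margin should equal the chosen reward minus the rejected reward. The sketch below also assumes (the card does not state this) that "Dpo Losses" is the per-example mean of -log(sigmoid(margin)), as in standard DPO. Under that assumption, Jensen's inequality implies the reported mean loss cannot fall below -log(sigmoid(.)) evaluated at the mean margin. The helper name `neg_log_sigmoid` is illustrative only.

```python
import math

# Final evaluation metrics copied from the list above.
rewards_chosen = -0.0421
rewards_rejected = -0.1106
reported_margin = 0.0684
reported_dpo_loss = 0.6658

# The margin is the gap between the chosen and rejected implicit rewards.
margin = rewards_chosen - rewards_rejected


def neg_log_sigmoid(x: float) -> float:
    """Return -log(sigmoid(x)), written as log(1 + exp(-x))."""
    return math.log1p(math.exp(-x))


# ASSUMPTION: "Dpo Losses" is the mean of -log(sigmoid(margin_i)) over examples.
# Because -log(sigmoid(.)) is convex, the value at the mean margin is a lower
# bound on that mean.
loss_at_mean_margin = neg_log_sigmoid(reported_margin)

print(f"chosen - rejected = {margin:.4f} (reported margin: {reported_margin})")
print(f"-log(sigmoid(mean margin)) = {loss_at_mean_margin:.4f} "
      f"(reported mean loss: {reported_dpo_loss})")
```

Any small gap between the recomputed and reported margin is consistent with the four-decimal rounding of the inputs.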

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-07
train_batch_size: 2
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 8
total_train_batch_size: 16
total_eval_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3
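
The batch-size totals and the learning-rate schedule listed above can be reproduced with a short sketch. It assumes no gradient accumulation (none is listed) and models the schedule as linear warmup followed by cosine decay to zero, which is the usual shape of a cosine scheduler with a warmup ratio. The function `cosine_lr` and the step count `example_total_steps` are hypothetical and used only for illustration; the card does not state the total number of optimizer steps.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 5e-07
train_batch_size = 2   # per device
eval_batch_size = 4    # per device
num_devices = 8
warmup_ratio = 0.1

# ASSUMPTION: no gradient accumulation, since the card lists none.
gradient_accumulation_steps = 1

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices


def cosine_lr(step: int, total_steps: int) -> float:
    """Linear warmup to the peak rate, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


# HYPOTHETICAL total step count, chosen only to illustrate the curve.
example_total_steps = 1000

print("total_train_batch_size:", total_train_batch_size)
print("total_eval_batch_size:", total_eval_batch_size)
for step in (0, 50, 100, 550, 1000):
    print(step, cosine_lr(step, example_total_steps))
```

With these inputs the totals match the listed values of 16 and 32, which supports the assumption that gradient accumulation was not used.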

Training results

Training Loss	Epoch	Step	Validation Loss	Positive Losses	Dpo Losses	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Rewards/margins Max	Rewards/margins Min	Rewards/margins Std	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6835	0.28	100	0.6965	0.0436	0.6917	0.0092	0.0061	0.5833	0.0030	0.0155	-0.0076	0.0103	-258.5689	-284.3059	-2.8089	-2.8541
0.6367	0.56	200	0.7633	0.6990	0.6863	0.0215	0.0070	0.5873	0.0145	0.0761	-0.0391	0.0511	-258.4836	-283.0695	-2.7779	-2.8224
0.5913	0.85	300	0.9198	2.2041	0.6810	0.0123	-0.0144	0.5714	0.0267	0.1358	-0.0683	0.0899	-260.6202	-283.9922	-2.7412	-2.7853
0.5502	1.13	400	1.0826	3.7846	0.6770	0.0010	-0.0361	0.5754	0.0370	0.1861	-0.0963	0.1243	-262.7899	-285.1261	-2.7113	-2.7545
0.5398	1.41	500	1.1571	4.6567	0.6734	0.0027	-0.0441	0.5833	0.0468	0.2338	-0.1166	0.1549	-263.5918	-284.9548	-2.6935	-2.7368
0.5293	1.69	600	1.2245	5.3740	0.6703	0.0016	-0.0536	0.5913	0.0552	0.2655	-0.1284	0.1752	-264.5410	-285.0616	-2.6767	-2.7201
0.5238	1.97	700	1.3783	6.9387	0.6683	-0.0190	-0.0800	0.6032	0.0610	0.2891	-0.1425	0.1922	-267.1869	-287.1237	-2.6726	-2.7154
0.4880	2.25	800	1.4896	8.0964	0.6670	-0.0328	-0.0978	0.6111	0.0650	0.3063	-0.1511	0.2037	-268.9666	-288.5044	-2.6644	-2.7076
0.5027	2.54	900	1.5575	8.7828	0.6661	-0.0416	-0.1091	0.6190	0.0675	0.3151	-0.1563	0.2099	-270.0926	-289.3809	-2.6629	-2.7059
0.4962	2.82	1000	1.5707	8.9081	0.6660	-0.0431	-0.1110	0.6151	0.0679	0.3167	-0.1568	0.2111	-270.2825	-289.5273	-2.6606	-2.7037
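
The Epoch and Step columns above allow a rough estimate of the run length, which the card does not state directly. The sketch below fits steps as proportional to epochs and, assuming the total train batch size of 16 from the hyperparameter list with no gradient accumulation, derives an approximate training-set size. All derived numbers are estimates, limited by the two-decimal rounding of the Epoch column.

```python
# (epoch, step) pairs copied from the table above.
checkpoints = [
    (0.28, 100), (0.56, 200), (0.85, 300), (1.13, 400), (1.41, 500),
    (1.69, 600), (1.97, 700), (2.25, 800), (2.54, 900), (2.82, 1000),
]

total_train_batch_size = 16  # from the hyperparameter list
num_epochs = 3               # from the hyperparameter list

# Least-squares slope through the origin: step ~= slope * epoch.
slope = sum(e * s for e, s in checkpoints) / sum(e * e for e, _ in checkpoints)
steps_per_epoch = round(slope)

# ASSUMPTION: one optimizer step consumes exactly one total batch.
approx_total_steps = steps_per_epoch * num_epochs
approx_train_examples = steps_per_epoch * total_train_batch_size

print("estimated steps per epoch:", steps_per_epoch)
print("estimated total steps:", approx_total_steps)
print("estimated training examples:", approx_train_examples)
```

The estimate comes out at around 355 steps per epoch, so the last logged checkpoint (step 1000, epoch 2.82) falls shortly before the end of the third epoch.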

Framework versions

PEFT 0.7.1
Transformers 4.39.0.dev0
Pytorch 2.1.2+cu121
Datasets 2.14.6
Tokenizers 0.15.2
