Llama2-7b-sft-chat-custom-template-dpo

This model is a fine-tuned version of elichen3051/llama2-7b-sft-chat-no-template, trained on the HuggingFaceH4/ultrafeedback_binarized, HuggingFaceH4/orca_dpo_pairs, and HuggingFaceH4/cai-conversation-harmless datasets. It achieves the following results on the evaluation set:

Loss: 0.4717
Rewards/chosen: -1.6807
Rewards/rejected: -3.1957
Rewards/accuracies: 0.6345
Rewards/margins: 1.5150
Logps/rejected: -519.5196
Logps/chosen: -379.2986
Logits/rejected: -2.7275
Logits/chosen: -2.7213
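
The reward metrics above are related by simple arithmetic: the margin is the chosen reward minus the rejected reward. The sketch below checks that relation on the reported numbers. It also includes a per-pair loss, assuming the standard sigmoid DPO objective; the card does not state which loss variant was used, so `pair_loss` is an illustrative helper rather than the training code.

```python
import math

def reward_margin(chosen: float, rejected: float) -> float:
    """Margin between the implicit rewards of the chosen and rejected responses."""
    return chosen - rejected

def pair_loss(margin: float) -> float:
    """-log(sigmoid(margin)) for one preference pair (assumed sigmoid DPO loss)."""
    return math.log1p(math.exp(-margin))

# Evaluation-set values reported in this card
final_chosen = -1.6807
final_rejected = -3.1957

margin = reward_margin(final_chosen, final_rejected)
print(f"margin = {margin:.4f}")                 # compare with Rewards/margins
print(f"loss at zero margin = {pair_loss(0.0):.4f}")  # ln(2), the starting point of training
```

Under that assumption, a model that cannot separate the two responses has a per-pair loss of ln 2 (about 0.693), which is close to the first logged validation loss. The reported loss is averaged over pairs, so it is not expected to equal `pair_loss` applied to the average margin.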

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-07
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 7
gradient_accumulation_steps: 8
total_train_batch_size: 448
total_eval_batch_size: 56
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.03
num_epochs: 2
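
The totals in the list above follow from the per-device values, and the learning rate follows a warmup-then-cosine shape. The sketch below reproduces both under stated assumptions: `HYPOTHETICAL_TOTAL_STEPS` is a placeholder (the card does not list the total number of optimizer steps), and rounding the warmup length up is an assumption about how the ratio is turned into a step count.

```python
import math

# Values copied from the hyperparameter list
train_batch_size = 8              # per device
eval_batch_size = 8               # per device
num_devices = 7
gradient_accumulation_steps = 8
peak_lr = 5e-07
warmup_ratio = 0.03

# Effective batch sizes
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

def lr_at(step: int, total_steps: int) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero (a sketch, not the trainer's code)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumption: rounded up
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

HYPOTHETICAL_TOTAL_STEPS = 400    # placeholder, not taken from the card

print(total_train_batch_size, total_eval_batch_size)
for step in (0, 6, 12, 200, 400):
    print(step, lr_at(step, HYPOTHETICAL_TOTAL_STEPS))
```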

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6727	0.2032	43	0.6714	-0.0530	-0.0999	0.5871	0.0470	-209.9431	-216.5270	-2.2167	-2.2006
0.6056	0.4064	86	0.6041	-0.5876	-0.8878	0.6023	0.3002	-288.7347	-269.9940	-3.0277	-3.0177
0.5730	0.6096	129	0.5451	-0.9286	-1.6015	0.6174	0.6729	-360.0960	-304.0913	-2.9301	-2.9238
0.5239	0.8128	172	0.5123	-1.2863	-2.2358	0.6288	0.9495	-423.5324	-339.8588	-2.9884	-2.9803
0.4668	1.0159	215	0.4945	-1.4994	-2.6377	0.6439	1.1383	-463.7195	-361.1752	-2.5910	-2.5843
0.4607	1.2191	258	0.4816	-1.5810	-2.8887	0.6402	1.3077	-488.8177	-369.3280	-2.8026	-2.7951
0.5068	1.4223	301	0.4764	-1.5805	-3.0061	0.6402	1.4256	-500.5590	-369.2790	-2.7586	-2.7513
0.4724	1.6255	344	0.4730	-1.6832	-3.1741	0.6383	1.4909	-517.3631	-379.5493	-2.6296	-2.6237
0.4836	1.8287	387	0.4718	-1.6795	-3.1900	0.6420	1.5105	-518.9514	-379.1832	-2.6434	-2.6374
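
As a rough consistency check on the table, the sketch below estimates the number of optimizer steps per epoch from the Step and Epoch columns and confirms that each margin matches the difference of the two reward columns up to rounding. The three rows are copied from the table; the derived steps-per-epoch figure is an estimate, not a value stated in the card.

```python
# (epoch, step, rewards_chosen, rewards_rejected, rewards_margins), copied from the table
rows = [
    (0.2032, 43, -0.0530, -0.0999, 0.0470),
    (1.0159, 215, -1.4994, -2.6377, 1.1383),
    (1.8287, 387, -1.6795, -3.1900, 1.5105),
]

# Steps per epoch implied by each row
steps_per_epoch = [step / epoch for epoch, step, *_ in rows]

# Absolute difference between the logged margin and chosen minus rejected
margin_gaps = [
    abs((chosen - rejected) - margin)
    for _, _, chosen, rejected, margin in rows
]

for estimate, gap in zip(steps_per_epoch, margin_gaps):
    print(f"steps/epoch ~ {estimate:.1f}, margin gap = {gap:.4f}")
```

Each row implies a little under 212 steps per epoch, and the margin gaps stay within the four-decimal rounding of the logged values.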

Framework versions

Transformers 4.42.0.dev0
Pytorch 2.3.1
Datasets 2.19.2
Tokenizers 0.19.1
