zephyr-gemma-2-9b-dpo-4k

This model is a fine-tuned version of models/zephyr-gemma-2-9b-sft-4k on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

Loss: 0.5439
Rewards/chosen: -0.9090
Rewards/rejected: -1.4174
Rewards/accuracies: 0.6720
Rewards/margins: 0.5084
Logps/rejected: -488.0301
Logps/chosen: -459.7270
Logits/rejected: -11.0859
Logits/chosen: -11.3601
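
The reported margin can be cross-checked against the two reward figures. The following is a minimal sketch, assuming the usual DPO-trainer convention that Rewards/margins is the mean of (chosen reward minus rejected reward); the variable names are illustrative only and do not come from the training code.

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = -0.9090
rewards_rejected = -1.4174
reported_margin = 0.5084

# Assumed convention: margin = chosen reward - rejected reward.
computed_margin = rewards_chosen - rewards_rejected

# Allow for rounding to four decimal places in the reported values.
margin_matches = abs(computed_margin - reported_margin) < 1e-4
print(f"computed margin: {computed_margin:.4f}, matches report: {margin_matches}")
```

A positive margin together with Rewards/accuracies of 0.6720 indicates that the chosen response receives the higher implicit reward in about two thirds of the evaluation pairs.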

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-07
train_batch_size: 1
eval_batch_size: 1
seed: 42
distributed_type: multi-GPU
num_devices: 8
gradient_accumulation_steps: 16
total_train_batch_size: 128
total_eval_batch_size: 8
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1
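
The two total batch sizes follow from the per-device sizes, the device count, and the gradient accumulation steps. A short sketch of that arithmetic, using only the values listed above (the variable names mirror the hyperparameter names):

```python
# Hyperparameters copied from the list above.
train_batch_size = 1             # per device
eval_batch_size = 1              # per device
num_devices = 8
gradient_accumulation_steps = 16

# Examples contributing to one optimizer step across all devices.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# Evaluation uses no gradient accumulation, so only the device count matters.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size)  # 128
print(total_eval_batch_size)   # 8
```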

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6273	0.2094	100	0.6256	-0.2308	-0.3911	0.6800	0.1603	-385.3994	-391.9051	-8.7457	-9.0867
0.5701	0.4187	200	0.5679	-0.7323	-1.1248	0.6800	0.3925	-458.7671	-442.0617	-10.9305	-11.1970
0.5398	0.6281	300	0.5491	-0.8992	-1.3693	0.6840	0.4700	-483.2173	-458.7530	-11.3540	-11.5217
0.5400	0.8375	400	0.5449	-0.9048	-1.4075	0.6760	0.5028	-487.0408	-459.3047	-11.1294	-11.3851
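
The Epoch and Step columns also give a rough estimate of the run length. The sketch below is an approximation derived only from the table and the total_train_batch_size of 128; it is not a figure reported by the training run, and it ignores any partial final batch.

```python
# (epoch, step) pairs copied from the table above.
checkpoints = [(0.2094, 100), (0.4187, 200), (0.6281, 300), (0.8375, 400)]
total_train_batch_size = 128  # from the hyperparameter list

# Each row gives an independent estimate of optimizer steps per epoch.
steps_per_epoch = [step / epoch for epoch, step in checkpoints]
mean_steps = sum(steps_per_epoch) / len(steps_per_epoch)

# Approximate number of training examples seen in one epoch.
approx_examples = mean_steps * total_train_batch_size

print(f"steps per epoch: about {mean_steps:.0f}")
print(f"training examples per epoch: about {approx_examples:,.0f}")
```

With num_epochs set to 1, the estimated steps per epoch is also the approximate total number of optimizer steps, so the last logged row (step 400) falls somewhat before the end of training.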

Framework versions

Transformers 4.45.0.dev0
Pytorch 2.4.0+cu121
Datasets 2.19.1
Tokenizers 0.19.1
