metadata

license: apache-2.0
base_model: mistralai/Mistral-7B-v0.1
tags:
  - trl
  - orpo
  - alignment-handbook
  - generated_from_trainer
model-index:
  - name: zephyr-7b-sft-full-orpo
    results: []

zephyr-7b-sft-full-orpo

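This model is a fine-tuned version of mistralai/Mistral-7B-v0.1. The training dataset is not recorded in this card; the tags indicate ORPO training with TRL. At the final evaluation (step 900, epoch 0.9439), it achieves the following results on the evaluation set: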

Loss: 0.5088
Rewards/chosen: -0.0404
Rewards/rejected: -0.0510
Rewards/accuracies: 0.6290
Rewards/margins: 0.0106
Logps/rejected: -1.0202
Logps/chosen: -0.8085
Logits/rejected: -2.5337
Logits/chosen: -2.5634
Nll Loss: 0.4741
Log Odds Ratio: -0.6379
Log Odds Chosen: 0.3305
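
As a quick consistency check on the figures above, the reported margin should equal the chosen reward minus the rejected reward. The sketch below uses only values listed in this card; the variable names are illustrative. The final lines compute the ratio of each reward to its log-probability. Reading that ratio as a scaling coefficient applied to the log-probabilities is an assumption, not something the card states.

```python
# Values copied from the evaluation results listed above.
rewards_chosen = -0.0404
rewards_rejected = -0.0510
reported_margin = 0.0106
logps_chosen = -0.8085
logps_rejected = -1.0202

# The margin is the gap between the chosen and rejected rewards.
margin = round(rewards_chosen - rewards_rejected, 4)
print("margin:", margin, "matches report:", margin == reported_margin)

# Assumption: rewards are log-probabilities multiplied by a constant.
# If so, both ratios should be roughly the same number.
scale_chosen = rewards_chosen / logps_chosen
scale_rejected = rewards_rejected / logps_rejected
print("implied scale (chosen):", round(scale_chosen, 4))
print("implied scale (rejected):", round(scale_rejected, 4))
```

Both ratios come out close to each other, which is consistent with a single scaling constant, but the card does not record that constant explicitly.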

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 2
total_train_batch_size: 64
total_eval_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: inverse_sqrt
lr_scheduler_warmup_steps: 100
num_epochs: 1
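
The totals above follow from the per-device values, and the sketch below reproduces that arithmetic. It also illustrates the general shape of an inverse-square-root schedule with linear warmup. The function `inverse_sqrt_lr` is a hypothetical helper written for this example; the exact formula the trainer applied is assumed here, not confirmed by the card.

```python
import math

# Hyperparameters copied from the list above.
train_batch_size = 8   # per device
eval_batch_size = 8    # per device
num_devices = 4
gradient_accumulation_steps = 2
learning_rate = 2e-05
warmup_steps = 100

# Effective batch sizes: training also multiplies by the accumulation steps.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices


def inverse_sqrt_lr(step, base_lr=learning_rate, warmup=warmup_steps):
    """Assumed schedule: linear warmup, then decay proportional to 1/sqrt(step)."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * math.sqrt(warmup / step)


print(total_train_batch_size, total_eval_batch_size)
for step in (50, 100, 400, 900):
    print(step, inverse_sqrt_lr(step))
```

Under this assumed shape, the learning rate peaks at the end of warmup (step 100) and has halved by step 400.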

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen	Nll Loss	Log Odds Ratio	Log Odds Chosen
0.5707	0.1049	100	1.1268	-0.0452	-0.0539	0.6369	0.0086	-1.0774	-0.9045	-2.5432	-2.5811	1.0893	-0.6413	0.2601
0.5663	0.2098	200	0.5741	-0.0440	-0.0534	0.6270	0.0094	-1.0676	-0.8799	-2.5377	-2.5597	0.5352	-0.6447	0.2863
0.5817	0.3146	300	0.5572	-0.0440	-0.0531	0.6190	0.0091	-1.0628	-0.8808	-2.4499	-2.4818	0.5207	-0.6503	0.2780
0.5724	0.4195	400	0.5416	-0.0426	-0.0515	0.6250	0.0089	-1.0293	-0.8510	-2.4026	-2.4376	0.5060	-0.6551	0.2819
0.5486	0.5244	500	0.5344	-0.0425	-0.0526	0.6151	0.0101	-1.0514	-0.8492	-2.4373	-2.4718	0.4990	-0.6439	0.3193
0.5156	0.6293	600	0.5242	-0.0417	-0.0514	0.6151	0.0098	-1.0285	-0.8333	-2.5551	-2.5811	0.4882	-0.6470	0.3056
0.5297	0.7341	700	0.5191	-0.0411	-0.0521	0.6310	0.0110	-1.0422	-0.8215	-2.4477	-2.4801	0.4838	-0.6351	0.3407
0.5184	0.8390	800	0.5138	-0.0409	-0.0532	0.6310	0.0123	-1.0647	-0.8179	-2.4575	-2.4922	0.4796	-0.6304	0.3783
0.5235	0.9439	900	0.5088	-0.0404	-0.0510	0.6290	0.0106	-1.0202	-0.8085	-2.5337	-2.5634	0.4741	-0.6379	0.3305
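
The Epoch and Step columns above allow a rough estimate of the number of optimizer steps per epoch and, combined with the total training batch size, the approximate size of the training set. This is a back-of-the-envelope sketch: the epoch values are rounded, and partial final batches are ignored, so the result is only approximate.

```python
# (step, epoch) pair copied from the last row of the table above.
last_step, last_epoch = 900, 0.9439
total_train_batch_size = 64  # from the training hyperparameters

# Steps needed to complete one full epoch, extrapolated from the last row.
steps_per_epoch = last_step / last_epoch

# Each optimizer step consumes one total batch of training examples.
approx_train_examples = steps_per_epoch * total_train_batch_size

print("steps per epoch (approx.):", round(steps_per_epoch))
print("training examples (approx.):", round(approx_train_examples))
```

The estimate suggests a training set on the order of tens of thousands of examples, though the card itself does not name the dataset.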

Framework versions

Transformers 4.41.0.dev0
PyTorch 2.3.0+cu121
Datasets 2.19.1
Tokenizers 0.19.1