# Llama-3.1-8B-Instruct-KTO-400
This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct on the bct_non_cot_kto_400 dataset. It achieves the following results on the evaluation set:
- Loss: 0.2541
- Rewards/chosen: 0.0309
- Logps/chosen: -16.8498
- Logits/chosen: -5221032.2286
- Rewards/rejected: -4.3105
- Logps/rejected: -62.7444
- Logits/rejected: -5284203.3778
- Rewards/margins: 4.3414
- Kl: 0.0
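
For reference, the reported reward margin is simply the gap between the chosen and rejected rewards, assuming the usual KTO/DPO-style convention of margin = chosen - rejected; the evaluation numbers above are consistent with this:

```python
# Sanity check: Rewards/margins ~= Rewards/chosen - Rewards/rejected,
# using the evaluation-set values reported above.
rewards_chosen = 0.0309
rewards_rejected = -4.3105
print(round(rewards_chosen - rewards_rejected, 4))  # 4.3414
```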
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10.0
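
As a rough illustration, the sketch below shows how these hyperparameters might be expressed with TRL's `KTOTrainer`. The card does not state which training framework was used, so the trainer choice, the dataset loading path, the column names, and the LoRA adapter settings are assumptions; only the hyperparameter values come from the list above.

```python
# A minimal sketch, assuming TRL's KTOTrainer and a LoRA (PEFT) adapter.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

base = "meta-llama/Meta-Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Hyperparameter values taken from the list above; with a single device,
# per-device batch 2 x gradient accumulation 8 gives the total batch of 16.
args = KTOConfig(
    output_dir="Llama-3.1-8B-Instruct-KTO-400",
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,
    num_train_epochs=10.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    seed=42,
)

# "bct_non_cot_kto_400" is the dataset named in the card; the file path and
# the KTO column names ("prompt", "completion", "label") are assumptions.
dataset = load_dataset("json", data_files="bct_non_cot_kto_400.json", split="train")

trainer = KTOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL releases use tokenizer= instead
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # assumed adapter config
)
trainer.train()
```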
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Logps/chosen | Logits/chosen | Rewards/rejected | Logps/rejected | Logits/rejected | Rewards/margins | Kl |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.4989 | 1.1111 | 50 | 0.4995 | 0.0293 | -16.8661 | -6615963.4286 | 0.0230 | -19.4096 | -7226443.3778 | 0.0062 | 3.9760 |
| 0.4593 | 2.2222 | 100 | 0.4670 | 0.3125 | -14.0338 | -6310909.2571 | 0.0149 | -19.4909 | -7175433.9556 | 0.2976 | 7.8719 |
| 0.3701 | 3.3333 | 150 | 0.3606 | 0.2798 | -14.3610 | -5641927.3143 | -0.9773 | -29.4130 | -6731665.7778 | 1.2571 | 0.0 |
| 0.281 | 4.4444 | 200 | 0.3004 | 0.1701 | -15.4577 | -5451904.9143 | -2.0389 | -40.0285 | -6268727.4667 | 2.2090 | 0.0 |
| 0.2051 | 5.5556 | 250 | 0.2740 | 0.1961 | -15.1974 | -5351382.8571 | -2.8411 | -48.0507 | -5877686.7556 | 3.0372 | 0.0 |
| 0.2724 | 6.6667 | 300 | 0.2628 | 0.1057 | -16.1019 | -5272125.2571 | -3.6711 | -56.3511 | -5524427.3778 | 3.7768 | 0.0 |
| 0.2237 | 7.7778 | 350 | 0.2569 | 0.0482 | -16.6771 | -5216298.0571 | -4.1487 | -61.1272 | -5306349.8667 | 4.1969 | 0.0 |
| 0.2291 | 8.8889 | 400 | 0.2548 | 0.0426 | -16.7332 | -5214656.9143 | -4.2796 | -62.4359 | -5268033.4222 | 4.3222 | 0.0 |
| 0.1677 | 10.0 | 450 | 0.2541 | 0.0309 | -16.8498 | -5221032.2286 | -4.3105 | -62.7444 | -5284203.3778 | 4.3414 | 0.0 |
### Framework versions
- PEFT 0.12.0
- Transformers 4.46.1
- Pytorch 2.5.1+cu124
- Datasets 3.1.0
- Tokenizers 0.20.3
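
The framework list above includes PEFT, so the repository presumably holds a LoRA-style adapter rather than full model weights. Below is a minimal inference sketch under that assumption, with the repository id taken from the model tree at the end of this card.

```python
# A minimal inference sketch, assuming the repo contains a PEFT adapter
# that is applied on top of the instruct base model.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter = "chchen/Llama-3.1-8B-Instruct-KTO-400"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter)

messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```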
## Model tree for chchen/Llama-3.1-8B-Instruct-KTO-400
- Base model: meta-llama/Llama-3.1-8B
- Finetuned: meta-llama/Llama-3.1-8B-Instruct