# Llama-3.1-8B-Instruct-KTO-400
This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct on the bct_non_cot_kto_400 dataset. It achieves the following results on the evaluation set:
- Loss: 0.2541
- Rewards/chosen: 0.0309
- Logps/chosen: -16.8498
- Logits/chosen: -5221032.2286
- Rewards/rejected: -4.3105
- Logps/rejected: -62.7444
- Logits/rejected: -5284203.3778
- Rewards/margins: 4.3414
- Kl: 0.0
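
For reference, the reported reward margin is simply the gap between the chosen and rejected rewards, assuming the usual KTO/DPO-style convention of margin = chosen - rejected; the evaluation numbers above are consistent with this:

```python
# Sanity check: Rewards/margins ~= Rewards/chosen - Rewards/rejected,
# using the evaluation-set values reported above.
rewards_chosen = 0.0309
rewards_rejected = -4.3105
print(round(rewards_chosen - rewards_rejected, 4))  # 4.3414
```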
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10.0
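
As a rough illustration, the sketch below shows how these hyperparameters might be expressed with TRL's `KTOTrainer`. The card does not state which training framework was used, so the trainer choice, the dataset loading path, the column names, and the LoRA adapter settings are assumptions; only the hyperparameter values come from the list above.

```python
# A minimal sketch, assuming TRL's KTOTrainer and a LoRA (PEFT) adapter.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

base = "meta-llama/Meta-Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Hyperparameter values taken from the list above; with a single device,
# per-device batch 2 x gradient accumulation 8 gives the total batch of 16.
args = KTOConfig(
    output_dir="Llama-3.1-8B-Instruct-KTO-400",
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,
    num_train_epochs=10.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    seed=42,
)

# "bct_non_cot_kto_400" is the dataset named in the card; the file path and
# the KTO column names ("prompt", "completion", "label") are assumptions.
dataset = load_dataset("json", data_files="bct_non_cot_kto_400.json", split="train")

trainer = KTOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL releases use tokenizer= instead
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # assumed adapter config
)
trainer.train()
```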
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Logps/chosen | Logits/chosen | Rewards/rejected | Logps/rejected | Logits/rejected | Rewards/margins | Kl |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.4989 | 1.1111 | 50 | 0.4995 | 0.0293 | -16.8661 | -6615963.4286 | 0.0230 | -19.4096 | -7226443.3778 | 0.0062 | 3.9760 |
| 0.4593 | 2.2222 | 100 | 0.4670 | 0.3125 | -14.0338 | -6310909.2571 | 0.0149 | -19.4909 | -7175433.9556 | 0.2976 | 7.8719 |
| 0.3701 | 3.3333 | 150 | 0.3606 | 0.2798 | -14.3610 | -5641927.3143 | -0.9773 | -29.4130 | -6731665.7778 | 1.2571 | 0.0 |
| 0.281 | 4.4444 | 200 | 0.3004 | 0.1701 | -15.4577 | -5451904.9143 | -2.0389 | -40.0285 | -6268727.4667 | 2.2090 | 0.0 |
| 0.2051 | 5.5556 | 250 | 0.2740 | 0.1961 | -15.1974 | -5351382.8571 | -2.8411 | -48.0507 | -5877686.7556 | 3.0372 | 0.0 |
| 0.2724 | 6.6667 | 300 | 0.2628 | 0.1057 | -16.1019 | -5272125.2571 | -3.6711 | -56.3511 | -5524427.3778 | 3.7768 | 0.0 |
| 0.2237 | 7.7778 | 350 | 0.2569 | 0.0482 | -16.6771 | -5216298.0571 | -4.1487 | -61.1272 | -5306349.8667 | 4.1969 | 0.0 |
| 0.2291 | 8.8889 | 400 | 0.2548 | 0.0426 | -16.7332 | -5214656.9143 | -4.2796 | -62.4359 | -5268033.4222 | 4.3222 | 0.0 |
| 0.1677 | 10.0 | 450 | 0.2541 | 0.0309 | -16.8498 | -5221032.2286 | -4.3105 | -62.7444 | -5284203.3778 | 4.3414 | 0.0 |
### Framework versions
- PEFT 0.12.0
- Transformers 4.46.1
- Pytorch 2.5.1+cu124
- Datasets 3.1.0
- Tokenizers 0.20.3
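
The framework list above includes PEFT, so the repository presumably holds a LoRA-style adapter rather than full model weights. Below is a minimal inference sketch under that assumption, with the repository id taken from the model tree at the end of this card.

```python
# A minimal inference sketch, assuming the repo contains a PEFT adapter
# that is applied on top of the instruct base model.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter = "chchen/Llama-3.1-8B-Instruct-KTO-400"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter)

messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```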
## Model tree for chchen/Llama-3.1-8B-Instruct-KTO-400
- Base model: meta-llama/Llama-3.1-8B
- Finetuned: meta-llama/Llama-3.1-8B-Instruct