gpt1B_DPO_model

This model is a fine-tuned version of AI-Sweden-Models/gpt-sw3-1.3b, trained with Direct Preference Optimization (DPO) on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0123
  • Rewards/chosen: 0.0352
  • Rewards/rejected: -5.6889
  • Rewards/accuracies: 1.0
  • Rewards/margins: 5.7242
  • Logps/rejected: -278.6341
  • Logps/chosen: -126.7145
  • Logits/rejected: -2.7863
  • Logits/chosen: -2.9985
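The reward metrics above follow the DPO convention: Rewards/margins is the gap between the chosen and rejected rewards, and the DPO sigmoid loss shrinks as that margin grows. A minimal sketch of the per-pair relationship, plugged with the reported eval averages (the reported Loss of 0.0123 is a dataset-level average, so the single-pair value below will not match it exactly):

```python
import math

def dpo_pair_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Per-pair DPO sigmoid loss: -log(sigmoid(chosen - rejected))."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Reported eval-set averages from the card
margin = 0.0352 - (-5.6889)          # ≈ 5.7241, matching Rewards/margins up to rounding
loss = dpo_pair_loss(0.0352, -5.6889)
print(round(margin, 4), round(loss, 4))
```

A margin near 5.7 already drives the per-pair loss close to zero, which is consistent with the very small validation losses in the table below.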

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3
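The card does not state the training code. The metric names (rewards/chosen, rewards/rejected, logps/…) match TRL's DPOTrainer logging, and the framework list below includes PEFT, so a plausible reconstruction of the setup with the listed hyperparameters is sketched here as a config fragment; `beta`, the LoRA settings, and the dataset are assumptions, not taken from the card:

```python
# Hedged sketch, not the author's actual script. Assumes TRL's DPOTrainer
# (circa trl for Transformers 4.38) with a PEFT/LoRA adapter; beta and the
# preference dataset are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig
from trl import DPOTrainer

model = AutoModelForCausalLM.from_pretrained("AI-Sweden-Models/gpt-sw3-1.3b")
tokenizer = AutoTokenizer.from_pretrained("AI-Sweden-Models/gpt-sw3-1.3b")

args = TrainingArguments(
    output_dir="gpt1B_DPO_model",
    learning_rate=5e-6,
    per_device_train_batch_size=1,   # train_batch_size
    per_device_eval_batch_size=1,    # eval_batch_size
    gradient_accumulation_steps=8,   # total_train_batch_size = 1 * 8 = 8
    num_train_epochs=3,
    lr_scheduler_type="linear",
    seed=42,
)

trainer = DPOTrainer(
    model,
    ref_model=None,                  # with a peft_config, TRL uses the frozen base model as reference
    args=args,
    beta=0.1,                        # assumed; the card does not report beta
    train_dataset=preference_dataset,  # prompt/chosen/rejected columns; dataset not specified in the card
    tokenizer=tokenizer,
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # LoRA hyperparameters not reported
)
trainer.train()
```

Note that total_train_batch_size is derived, not set directly: one example per device times 8 gradient-accumulation steps gives 8 examples per optimizer update.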

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.2383 | 0.2 | 50 | 0.2344 | 0.1296 | -1.3092 | 0.9967 | 1.4389 | -234.8370 | -125.7705 | -3.0903 | -3.2537 |
| 0.0573 | 0.4 | 100 | 0.0615 | 0.1058 | -3.2004 | 0.9967 | 3.3063 | -253.7490 | -126.0084 | -2.9086 | -3.0985 |
| 0.0262 | 0.6 | 150 | 0.0291 | -0.0050 | -4.5248 | 0.9967 | 4.5198 | -266.9924 | -127.1163 | -2.8221 | -3.0267 |
| 0.0191 | 0.79 | 200 | 0.0205 | 0.0107 | -4.9990 | 0.9967 | 5.0096 | -271.7344 | -126.9600 | -2.8042 | -3.0131 |
| 0.0106 | 0.99 | 250 | 0.0171 | -0.0051 | -5.3187 | 0.9967 | 5.3135 | -274.9313 | -127.1180 | -2.7884 | -3.0001 |
| 0.0129 | 1.19 | 300 | 0.0148 | 0.0024 | -5.4879 | 1.0 | 5.4902 | -276.6234 | -127.0432 | -2.7840 | -2.9962 |
| 0.0125 | 1.39 | 350 | 0.0137 | 0.0243 | -5.5389 | 1.0 | 5.5632 | -277.1337 | -126.8233 | -2.7873 | -2.9994 |
| 0.0079 | 1.59 | 400 | 0.0129 | 0.0313 | -5.5885 | 1.0 | 5.6198 | -277.6297 | -126.7539 | -2.7878 | -3.0000 |
| 0.0077 | 1.79 | 450 | 0.0126 | 0.0332 | -5.6246 | 1.0 | 5.6578 | -277.9906 | -126.7342 | -2.7878 | -2.9998 |
| 0.0073 | 1.99 | 500 | 0.0126 | 0.0322 | -5.6582 | 1.0 | 5.6905 | -278.3270 | -126.7444 | -2.7863 | -2.9985 |
| 0.0087 | 2.19 | 550 | 0.0123 | 0.0334 | -5.6819 | 1.0 | 5.7153 | -278.5634 | -126.7327 | -2.7862 | -2.9983 |
| 0.0111 | 2.38 | 600 | 0.0123 | 0.0324 | -5.6898 | 1.0 | 5.7222 | -278.6425 | -126.7427 | -2.7862 | -2.9984 |
| 0.0086 | 2.58 | 650 | 0.0122 | 0.0357 | -5.6877 | 1.0 | 5.7234 | -278.6218 | -126.7101 | -2.7863 | -2.9984 |
| 0.0067 | 2.78 | 700 | 0.0122 | 0.0352 | -5.6897 | 1.0 | 5.7249 | -278.6414 | -126.7143 | -2.7860 | -2.9981 |
| 0.0067 | 2.98 | 750 | 0.0123 | 0.0352 | -5.6889 | 1.0 | 5.7242 | -278.6341 | -126.7145 | -2.7863 | -2.9985 |
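The Logps columns are the policy model's summed log-probabilities of the chosen and rejected completions. In DPO, each reward is the β-scaled difference between policy and reference log-probabilities. A minimal sketch of that relationship; the card reports neither the reference log-probs nor β, so the reference values below are back-solved from the final eval row under an assumed β = 0.1, purely for illustration:

```python
def dpo_reward(logp_policy: float, logp_ref: float, beta: float = 0.1) -> float:
    """Implicit DPO reward: beta * (log pi(y|x) - log pi_ref(y|x))."""
    return beta * (logp_policy - logp_ref)

# Policy logps from the final eval row; reference logps are hypothetical,
# chosen so the example reproduces the reported rewards under beta = 0.1.
r_chosen = dpo_reward(logp_policy=-126.7145, logp_ref=-127.0665)
r_rejected = dpo_reward(logp_policy=-278.6341, logp_ref=-221.7451)
print(round(r_chosen, 4), round(r_rejected, 4))
```

Under these assumptions the chosen completion is slightly more likely under the policy than under the reference, while the rejected completion has been pushed far below its reference likelihood, which is the expected DPO training signature.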

Framework versions

  • PEFT 0.8.2
  • Transformers 4.38.1
  • Pytorch 2.2.0+cu118
  • Datasets 2.17.1
  • Tokenizers 0.15.2