# gpt1B_DPO_model

This model is a fine-tuned version of AI-Sweden-Models/gpt-sw3-1.3b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.0123
- Rewards/chosen: 0.0352
- Rewards/rejected: -5.6889
- Rewards/accuracies: 1.0
- Rewards/margins: 5.7242
- Logps/rejected: -278.6341
- Logps/chosen: -126.7145
- Logits/rejected: -2.7863
- Logits/chosen: -2.9985
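The reward metrics above are the implicit rewards that DPO defines: for each response, the reward is β times the difference between the policy's and the reference model's summed token log-probabilities (compare the Logps/* columns), and the margin is the chosen reward minus the rejected reward. A minimal sketch of how these quantities relate, assuming the standard DPO loss; the β value is an assumption (0.1 is a common default), since the card does not state it:

```python
import math

def dpo_stats(policy_logp_chosen, ref_logp_chosen,
              policy_logp_rejected, ref_logp_rejected, beta=0.1):
    """Compute the implicit DPO rewards, their margin, and the per-pair loss.

    Log-probabilities are sums over response tokens (cf. Logps/chosen and
    Logps/rejected). beta=0.1 is an assumed default, not stated in the card.
    """
    reward_chosen = beta * (policy_logp_chosen - ref_logp_chosen)
    reward_rejected = beta * (policy_logp_rejected - ref_logp_rejected)
    margin = reward_chosen - reward_rejected
    # DPO loss: -log sigmoid(margin); a large positive margin drives it to ~0.
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return reward_chosen, reward_rejected, margin, loss
```

This is why the final loss is near zero: with a margin around 5.7, `-log sigmoid(5.7)` is already on the order of a few thousandths.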
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3
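Note that the total train batch size of 8 is the per-device batch size (1) times the gradient accumulation steps (8). These hyperparameters could be wired into a DPO run roughly as follows; this is a sketch only, assuming TRL's `DPOTrainer` and a hypothetical preference dataset `train_dataset` with `prompt`/`chosen`/`rejected` columns, neither of which the card specifies:

```python
# Sketch only: assumes TRL's DPOTrainer and a preference dataset named
# train_dataset with "prompt"/"chosen"/"rejected" columns (not in the card).
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "AI-Sweden-Models/gpt-sw3-1.3b"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

args = TrainingArguments(
    output_dir="gpt1B_DPO_model",
    learning_rate=5e-6,
    per_device_train_batch_size=1,   # train_batch_size: 1
    per_device_eval_batch_size=1,    # eval_batch_size: 1
    gradient_accumulation_steps=8,   # total train batch size: 1 * 8 = 8
    num_train_epochs=3,
    lr_scheduler_type="linear",
    seed=42,
    # The listed Adam betas (0.9, 0.999) and epsilon 1e-08 are the
    # transformers optimizer defaults, so they need no explicit setting.
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,               # TRL can keep a frozen copy as the reference
    args=args,
    train_dataset=train_dataset,  # hypothetical preference dataset
    tokenizer=tokenizer,
)
trainer.train()
```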
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
0.2383 | 0.2 | 50 | 0.2344 | 0.1296 | -1.3092 | 0.9967 | 1.4389 | -234.8370 | -125.7705 | -3.0903 | -3.2537 |
0.0573 | 0.4 | 100 | 0.0615 | 0.1058 | -3.2004 | 0.9967 | 3.3063 | -253.7490 | -126.0084 | -2.9086 | -3.0985 |
0.0262 | 0.6 | 150 | 0.0291 | -0.0050 | -4.5248 | 0.9967 | 4.5198 | -266.9924 | -127.1163 | -2.8221 | -3.0267 |
0.0191 | 0.79 | 200 | 0.0205 | 0.0107 | -4.9990 | 0.9967 | 5.0096 | -271.7344 | -126.9600 | -2.8042 | -3.0131 |
0.0106 | 0.99 | 250 | 0.0171 | -0.0051 | -5.3187 | 0.9967 | 5.3135 | -274.9313 | -127.1180 | -2.7884 | -3.0001 |
0.0129 | 1.19 | 300 | 0.0148 | 0.0024 | -5.4879 | 1.0 | 5.4902 | -276.6234 | -127.0432 | -2.7840 | -2.9962 |
0.0125 | 1.39 | 350 | 0.0137 | 0.0243 | -5.5389 | 1.0 | 5.5632 | -277.1337 | -126.8233 | -2.7873 | -2.9994 |
0.0079 | 1.59 | 400 | 0.0129 | 0.0313 | -5.5885 | 1.0 | 5.6198 | -277.6297 | -126.7539 | -2.7878 | -3.0000 |
0.0077 | 1.79 | 450 | 0.0126 | 0.0332 | -5.6246 | 1.0 | 5.6578 | -277.9906 | -126.7342 | -2.7878 | -2.9998 |
0.0073 | 1.99 | 500 | 0.0126 | 0.0322 | -5.6582 | 1.0 | 5.6905 | -278.3270 | -126.7444 | -2.7863 | -2.9985 |
0.0087 | 2.19 | 550 | 0.0123 | 0.0334 | -5.6819 | 1.0 | 5.7153 | -278.5634 | -126.7327 | -2.7862 | -2.9983 |
0.0111 | 2.38 | 600 | 0.0123 | 0.0324 | -5.6898 | 1.0 | 5.7222 | -278.6425 | -126.7427 | -2.7862 | -2.9984 |
0.0086 | 2.58 | 650 | 0.0122 | 0.0357 | -5.6877 | 1.0 | 5.7234 | -278.6218 | -126.7101 | -2.7863 | -2.9984 |
0.0067 | 2.78 | 700 | 0.0122 | 0.0352 | -5.6897 | 1.0 | 5.7249 | -278.6414 | -126.7143 | -2.7860 | -2.9981 |
0.0067 | 2.98 | 750 | 0.0123 | 0.0352 | -5.6889 | 1.0 | 5.7242 | -278.6341 | -126.7145 | -2.7863 | -2.9985 |
### Framework versions

- PEFT 0.8.2
- Transformers 4.38.1
- Pytorch 2.2.0+cu118
- Datasets 2.17.1
- Tokenizers 0.15.2
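Since PEFT is listed among the framework versions, the published weights are presumably a PEFT (e.g. LoRA) adapter on top of the base model. A hedged loading sketch, assuming the repo hosts such an adapter and that the base model's tokenizer is used:

```python
# Sketch only: assumes thorirhrafn/gpt1B_DPO_model is a PEFT adapter
# for AI-Sweden-Models/gpt-sw3-1.3b.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model = AutoPeftModelForCausalLM.from_pretrained(
    "thorirhrafn/gpt1B_DPO_model", torch_dtype=torch.float16
)
tokenizer = AutoTokenizer.from_pretrained("AI-Sweden-Models/gpt-sw3-1.3b")

prompt = "..."  # your prompt here
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```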