tsavage68 committed on
Commit
7c0a2eb
1 Parent(s): 769c518

End of training

README.md CHANGED
@@ -19,14 +19,14 @@ should probably proofread and complete it, then remove this comment. -->
  This model is a fine-tuned version of [tsavage68/Na_M2_1000steps_1e7_SFT](https://huggingface.co/tsavage68/Na_M2_1000steps_1e7_SFT) on an unknown dataset.
  It achieves the following results on the evaluation set:
  - Loss: 0.0000
- - Rewards/chosen: 2.0056
- - Rewards/rejected: -8.7799
  - Rewards/accuracies: 1.0
- - Rewards/margins: 10.7855
- - Logps/rejected: -167.7224
- - Logps/chosen: -28.0760
- - Logits/rejected: -2.4173
- - Logits/chosen: -2.4429

  ## Model description

@@ -45,7 +45,7 @@ More information needed
  ### Training hyperparameters

  The following hyperparameters were used during training:
- - learning_rate: 1e-07
  - train_batch_size: 2
  - eval_batch_size: 1
  - seed: 42
@@ -60,10 +60,10 @@ The following hyperparameters were used during training:

  | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
  |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
- | 0.0514 | 0.2667 | 50 | 0.0095 | 1.0907 | -3.6125 | 1.0 | 4.7032 | -116.0486 | -37.2253 | -2.5048 | -2.5196 |
- | 0.0 | 0.5333 | 100 | 0.0000 | 1.9516 | -8.2814 | 1.0 | 10.2330 | -162.7370 | -28.6162 | -2.4273 | -2.4516 |
- | 0.0 | 0.8 | 150 | 0.0000 | 2.0033 | -8.7623 | 1.0 | 10.7656 | -167.5460 | -28.0989 | -2.4174 | -2.4431 |
- | 0.0 | 1.0667 | 200 | 0.0000 | 2.0056 | -8.7799 | 1.0 | 10.7855 | -167.7224 | -28.0760 | -2.4173 | -2.4429 |


  ### Framework versions
 
  This model is a fine-tuned version of [tsavage68/Na_M2_1000steps_1e7_SFT](https://huggingface.co/tsavage68/Na_M2_1000steps_1e7_SFT) on an unknown dataset.
  It achieves the following results on the evaluation set:
  - Loss: 0.0000
+ - Rewards/chosen: 2.3496
+ - Rewards/rejected: -12.1508
  - Rewards/accuracies: 1.0
+ - Rewards/margins: 14.5003
+ - Logps/rejected: -201.4312
+ - Logps/chosen: -24.6368
+ - Logits/rejected: -2.3632
+ - Logits/chosen: -2.3967

  ## Model description

 
  ### Training hyperparameters

  The following hyperparameters were used during training:
+ - learning_rate: 1e-06
  - train_batch_size: 2
  - eval_batch_size: 1
  - seed: 42
 

  | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
  |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+ | 0.0 | 0.2667 | 50 | 0.0000 | 2.1916 | -10.0354 | 1.0 | 12.2270 | -180.2773 | -26.2167 | -2.3982 | -2.4261 |
+ | 0.0 | 0.5333 | 100 | 0.0000 | 2.3125 | -11.6667 | 1.0 | 13.9792 | -196.5901 | -25.0075 | -2.3692 | -2.4015 |
+ | 0.0 | 0.8 | 150 | 0.0000 | 2.3477 | -12.1123 | 1.0 | 14.4600 | -201.0466 | -24.6557 | -2.3646 | -2.3980 |
+ | 0.0 | 1.0667 | 200 | 0.0000 | 2.3496 | -12.1508 | 1.0 | 14.5003 | -201.4312 | -24.6368 | -2.3632 | -2.3967 |


  ### Framework versions
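The metric names in the README tables (Rewards/chosen, Rewards/rejected, Rewards/margins, Logps/chosen, Logps/rejected) appear to be the ones logged by TRL's `DPOTrainer`, although the card itself does not name the trainer. If that trainer's definitions apply, each reward is beta * (policy log-prob - reference log-prob) averaged over the evaluation set, and the margin is the mean of chosen minus rejected rewards. The numbers can then be cross-checked from the tables alone. The sketch below copies the evaluation rows from both versions of the table. It checks that every margin equals chosen minus rejected up to 4-decimal rounding. It also estimates beta from how rewards and log-probs move between the first and last evaluation. The hyperparameters shown in this diff do not include beta, so the estimate is an inference. The helper names are hypothetical.

```python
"""Consistency checks on the evaluation rows reported in the README diff.

Assumes TRL DPOTrainer-style metrics (an assumption, not stated in the card):
  reward = beta * (policy_logp - reference_logp), averaged over the eval set
  margin = mean(reward_chosen - reward_rejected)
"""

# (step, rewards_chosen, rewards_rejected, rewards_margins, logps_rejected, logps_chosen)
OLD_RUN = [  # learning_rate: 1e-07
    (50, 1.0907, -3.6125, 4.7032, -116.0486, -37.2253),
    (100, 1.9516, -8.2814, 10.2330, -162.7370, -28.6162),
    (150, 2.0033, -8.7623, 10.7656, -167.5460, -28.0989),
    (200, 2.0056, -8.7799, 10.7855, -167.7224, -28.0760),
]
NEW_RUN = [  # learning_rate: 1e-06
    (50, 2.1916, -10.0354, 12.2270, -180.2773, -26.2167),
    (100, 2.3125, -11.6667, 13.9792, -196.5901, -25.0075),
    (150, 2.3477, -12.1123, 14.4600, -201.0466, -24.6557),
    (200, 2.3496, -12.1508, 14.5003, -201.4312, -24.6368),
]

# Every value is rounded to 4 decimals, so chosen - rejected can differ from
# the rounded margin by up to about 1.5e-4.
TOL = 2e-4


def margins_consistent(rows):
    """True if every reported margin equals chosen - rejected within rounding."""
    return all(abs((rc - rr) - rm) <= TOL for _, rc, rr, rm, _, _ in rows)


def implied_beta(rows):
    """Estimate beta from the first and last eval rows.

    On a fixed eval set the reference log-probs do not change, so
    d(reward) = beta * d(policy logp) for both chosen and rejected.
    """
    _, rc0, rr0, _, lr0, lc0 = rows[0]
    _, rc1, rr1, _, lr1, lc1 = rows[-1]
    beta_chosen = (rc1 - rc0) / (lc1 - lc0)
    beta_rejected = (rr1 - rr0) / (lr1 - lr0)
    return beta_chosen, beta_rejected


def reference_logps(row, beta):
    """Recover mean reference log-probs: ref = policy_logp - reward / beta."""
    _, rc, rr, _, lr, lc = row
    return lc - rc / beta, lr - rr / beta


if __name__ == "__main__":
    for name, rows in (("lr 1e-07", OLD_RUN), ("lr 1e-06", NEW_RUN)):
        bc, br = implied_beta(rows)
        beta = round((bc + br) / 2, 3)
        ref_c, ref_r = reference_logps(rows[-1], beta)
        print(
            f"{name}: margins ok={margins_consistent(rows)}, "
            f"beta~{bc:.4f}/{br:.4f}, "
            f"reference logps chosen={ref_c:.3f} rejected={ref_r:.3f}"
        )
```

Run on the values above, both runs give beta of about 0.1, which is also the default `beta` in TRL's `DPOConfig`. Both runs also give nearly identical mean reference log-probs: about -48.13 for chosen responses and -79.92 for rejected ones. That would fit both runs scoring the same evaluation set against the same reference model. If so, the larger margins of the 1e-06 run come from the policy moving further, not from a different baseline.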
final_checkpoint/model-00001-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:6f92c88bc055e9a9e87bdf267c8907dd03cffb7d7df9d7c8154b57bcfc997009
  size 4943162240

  version https://git-lfs.github.com/spec/v1
+ oid sha256:eddd345ca8383dcad7bbd25be4a05f6a2a92e4df50220570736ef55964e2b98e
  size 4943162240
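Each `.safetensors` and `training_args.bin` entry in this commit is a Git LFS pointer, not the file itself. A pointer consists of a `version` line, the SHA-256 of the file contents as `oid sha256:…`, and the byte `size`. Here the sizes stay the same while the hashes change. That is consistent with retraining rewriting the weights without changing tensor shapes or dtypes. The minimal sketch below uses only the Python standard library. It builds a v1 pointer for a local file and checks that file against pointer text like the lines above. The function names are illustrative, not part of any Git LFS tool.

```python
import hashlib

SPEC_URL = "https://git-lfs.github.com/spec/v1"


def lfs_pointer(path, chunk_size=1 << 20):
    """Return Git LFS v1 pointer text (version, oid, size) for the file at `path`."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as fh:
        # Stream in chunks so multi-GB shards never have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return f"version {SPEC_URL}\noid sha256:{digest.hexdigest()}\nsize {size}\n"


def parse_pointer(text):
    """Parse pointer text into a dict with keys such as 'version', 'oid', 'size'."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path, pointer_text):
    """True if the local file has the oid and size recorded in `pointer_text`."""
    expected = parse_pointer(pointer_text)
    actual = parse_pointer(lfs_pointer(path))
    return expected["oid"] == actual["oid"] and expected["size"] == actual["size"]
```

For example, passing a downloaded first shard from this commit to `matches_pointer` together with the three `+`-side lines above should return `True`. A stale copy from the parent commit 769c518 should return `False`, because only the oid differs.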
final_checkpoint/model-00002-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f4b44a1ab9627a838a3dde87f53f5aedf12d7259bb590db75f234ff17f6ce832
  size 4999819232

  version https://git-lfs.github.com/spec/v1
+ oid sha256:c8e7a6f6f93bd0dab3f1c3758b98baba1d9d5a8979f0264b023a847441b07a10
  size 4999819232
final_checkpoint/model-00003-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:c3201b98d1f5d8d69d8773d68b9770153410d49fa11e7e89c593620f0420e3f9
  size 4540516256

  version https://git-lfs.github.com/spec/v1
+ oid sha256:bd22d0bad40082bf2b7ac1e3c7edc7eba177c46c9bc4541624f7283286ee7cba
  size 4540516256
model-00001-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:6f92c88bc055e9a9e87bdf267c8907dd03cffb7d7df9d7c8154b57bcfc997009
  size 4943162240

  version https://git-lfs.github.com/spec/v1
+ oid sha256:eddd345ca8383dcad7bbd25be4a05f6a2a92e4df50220570736ef55964e2b98e
  size 4943162240
model-00002-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f4b44a1ab9627a838a3dde87f53f5aedf12d7259bb590db75f234ff17f6ce832
  size 4999819232

  version https://git-lfs.github.com/spec/v1
+ oid sha256:c8e7a6f6f93bd0dab3f1c3758b98baba1d9d5a8979f0264b023a847441b07a10
  size 4999819232
model-00003-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:c3201b98d1f5d8d69d8773d68b9770153410d49fa11e7e89c593620f0420e3f9
  size 4540516256

  version https://git-lfs.github.com/spec/v1
+ oid sha256:bd22d0bad40082bf2b7ac1e3c7edc7eba177c46c9bc4541624f7283286ee7cba
  size 4540516256
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:99844999e345fc0dd9d6d0e82f27295ca9e5f16bd5d88e676864da5463c5e62f
  size 5176

  version https://git-lfs.github.com/spec/v1
+ oid sha256:b0a6eeabdd97e0c2e7531244accd85e99c139eceb39347cffc8fc55b792f3f13
  size 5176