End of training

Files changed (8)

README.md +12 -12
final_checkpoint/model-00001-of-00003.safetensors +1 -1
final_checkpoint/model-00002-of-00003.safetensors +1 -1
final_checkpoint/model-00003-of-00003.safetensors +1 -1
model-00001-of-00003.safetensors +1 -1
model-00002-of-00003.safetensors +1 -1
model-00003-of-00003.safetensors +1 -1
training_args.bin +1 -1

README.md CHANGED

@@ -19,14 +19,14 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [tsavage68/Na_M2_1000steps_1e7_SFT](https://huggingface.co/tsavage68/Na_M2_1000steps_1e7_SFT) on an unknown dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.0000
-- Rewards/chosen: 2.0056
-- Rewards/rejected: -8.7799
 - Rewards/accuracies: 1.0
-- Rewards/margins: 10.7855
-- Logps/rejected: -167.7224
-- Logps/chosen: -28.0760
-- Logits/rejected: -2.4173
-- Logits/chosen: -2.4429
 ## Model description
@@ -45,7 +45,7 @@ More information needed
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 1e-07
 - train_batch_size: 2
 - eval_batch_size: 1
 - seed: 42
@@ -60,10 +60,10 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.0514        | 0.2667 | 50   | 0.0095          | 1.0907         | -3.6125          | 1.0                | 4.7032          | -116.0486      | -37.2253     | -2.5048         | -2.5196       |
-| 0.0           | 0.5333 | 100  | 0.0000          | 1.9516         | -8.2814          | 1.0                | 10.2330         | -162.7370      | -28.6162     | -2.4273         | -2.4516       |
-| 0.0           | 0.8    | 150  | 0.0000          | 2.0033         | -8.7623          | 1.0                | 10.7656         | -167.5460      | -28.0989     | -2.4174         | -2.4431       |
-| 0.0           | 1.0667 | 200  | 0.0000          | 2.0056         | -8.7799          | 1.0                | 10.7855         | -167.7224      | -28.0760     | -2.4173         | -2.4429       |
 ### Framework versions

 This model is a fine-tuned version of [tsavage68/Na_M2_1000steps_1e7_SFT](https://huggingface.co/tsavage68/Na_M2_1000steps_1e7_SFT) on an unknown dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.0000
+- Rewards/chosen: 2.3496
+- Rewards/rejected: -12.1508
 - Rewards/accuracies: 1.0
+- Rewards/margins: 14.5003
+- Logps/rejected: -201.4312
+- Logps/chosen: -24.6368
+- Logits/rejected: -2.3632
+- Logits/chosen: -2.3967
 ## Model description
 ### Training hyperparameters
 The following hyperparameters were used during training:
+- learning_rate: 1e-06
 - train_batch_size: 2
 - eval_batch_size: 1
 - seed: 42
 | Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.0           | 0.2667 | 50   | 0.0000          | 2.1916         | -10.0354         | 1.0                | 12.2270         | -180.2773      | -26.2167     | -2.3982         | -2.4261       |
+| 0.0           | 0.5333 | 100  | 0.0000          | 2.3125         | -11.6667         | 1.0                | 13.9792         | -196.5901      | -25.0075     | -2.3692         | -2.4015       |
+| 0.0           | 0.8    | 150  | 0.0000          | 2.3477         | -12.1123         | 1.0                | 14.4600         | -201.0466      | -24.6557     | -2.3646         | -2.3980       |
+| 0.0           | 1.0667 | 200  | 0.0000          | 2.3496         | -12.1508         | 1.0                | 14.5003         | -201.4312      | -24.6368     | -2.3632         | -2.3967       |
 ### Framework versions
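In both versions of the card, the Rewards/margins value in every row equals Rewards/chosen minus Rewards/rejected, to within the last rounded digit. The Python sketch below checks this against the two training-results tables. It assumes the metrics follow the convention of TRL's `DPOTrainer`, where the margin is the mean per-example (chosen reward minus rejected reward). The card itself does not name the training library. The row tuples are copied from the tables above, and the helper names are illustrative.

```python
# Rows of (step, Rewards/chosen, Rewards/rejected, Rewards/margins) copied
# from the two versions of the training-results table in README.md.
BEFORE = [  # learning_rate: 1e-07
    (50, 1.0907, -3.6125, 4.7032),
    (100, 1.9516, -8.2814, 10.2330),
    (150, 2.0033, -8.7623, 10.7656),
    (200, 2.0056, -8.7799, 10.7855),
]
AFTER = [  # learning_rate: 1e-06
    (50, 2.1916, -10.0354, 12.2270),
    (100, 2.3125, -11.6667, 13.9792),
    (150, 2.3477, -12.1123, 14.4600),
    (200, 2.3496, -12.1508, 14.5003),
]

# Three values each rounded to 4 decimals can disagree by up to 1.5e-4 in total.
TOLERANCE = 2e-4


def margin_gap(chosen: float, rejected: float, margin: float) -> float:
    """Absolute difference between the logged margin and chosen - rejected.

    Assumption: the margin is the mean of per-example (chosen - rejected)
    rewards, so the logged means satisfy the same identity up to rounding.
    """
    return abs(margin - (chosen - rejected))


def check_table(rows) -> float:
    """Return the largest margin gap found in a results table."""
    return max(margin_gap(c, r, m) for _, c, r, m in rows)


if __name__ == "__main__":
    for name, rows in (("before", BEFORE), ("after", AFTER)):
        print(f"{name}: max gap {check_table(rows):.4f}")
```

The script reports a maximum gap of 0.0000 for the 1e-07 table and 0.0001 for the 1e-06 table. The 0.0001 comes from the step-200 row (2.3496 + 12.1508 = 14.5004, against a logged 14.5003). Rounding the three displayed values explains this gap.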

final_checkpoint/model-00001-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:6f92c88bc055e9a9e87bdf267c8907dd03cffb7d7df9d7c8154b57bcfc997009
 size 4943162240

 version https://git-lfs.github.com/spec/v1
+oid sha256:eddd345ca8383dcad7bbd25be4a05f6a2a92e4df50220570736ef55964e2b98e
 size 4943162240

final_checkpoint/model-00002-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f4b44a1ab9627a838a3dde87f53f5aedf12d7259bb590db75f234ff17f6ce832
 size 4999819232

 version https://git-lfs.github.com/spec/v1
+oid sha256:c8e7a6f6f93bd0dab3f1c3758b98baba1d9d5a8979f0264b023a847441b07a10
 size 4999819232

final_checkpoint/model-00003-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c3201b98d1f5d8d69d8773d68b9770153410d49fa11e7e89c593620f0420e3f9
 size 4540516256

 version https://git-lfs.github.com/spec/v1
+oid sha256:bd22d0bad40082bf2b7ac1e3c7edc7eba177c46c9bc4541624f7283286ee7cba
 size 4540516256

model-00001-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:6f92c88bc055e9a9e87bdf267c8907dd03cffb7d7df9d7c8154b57bcfc997009
 size 4943162240

 version https://git-lfs.github.com/spec/v1
+oid sha256:eddd345ca8383dcad7bbd25be4a05f6a2a92e4df50220570736ef55964e2b98e
 size 4943162240

model-00002-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f4b44a1ab9627a838a3dde87f53f5aedf12d7259bb590db75f234ff17f6ce832
 size 4999819232

 version https://git-lfs.github.com/spec/v1
+oid sha256:c8e7a6f6f93bd0dab3f1c3758b98baba1d9d5a8979f0264b023a847441b07a10
 size 4999819232

model-00003-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c3201b98d1f5d8d69d8773d68b9770153410d49fa11e7e89c593620f0420e3f9
 size 4540516256

 version https://git-lfs.github.com/spec/v1
+oid sha256:bd22d0bad40082bf2b7ac1e3c7edc7eba177c46c9bc4541624f7283286ee7cba
 size 4540516256

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:99844999e345fc0dd9d6d0e82f27295ca9e5f16bd5d88e676864da5463c5e62f
 size 5176

 version https://git-lfs.github.com/spec/v1
+oid sha256:b0a6eeabdd97e0c2e7531244accd85e99c139eceb39347cffc8fc55b792f3f13
 size 5176
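Every safetensors shard and `training_args.bin` is stored as a Git LFS pointer. Each file diff therefore changes only the `oid sha256:` line, while `size` stays the same. The root-level shards also carry the same oids as the `final_checkpoint/` shards, before and after the commit, so both copies hold identical bytes.

The minimal Python sketch below parses such a pointer and checks a downloaded file against it. It assumes only the three-line spec v1 layout shown above. The names `parse_lfs_pointer` and `matches_pointer`, along with the local path in the usage block, are hypothetical. They are not part of git-lfs or Hugging Face tooling.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS spec v1 pointer ("key value" per line) into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algorithm!r}")
    return {"version": fields["version"], "oid": digest, "size": int(fields["size"])}


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """True if the file at `path` has the pointer's size and SHA-256 digest."""
    if path.stat().st_size != pointer["size"]:
        return False
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Hash in chunks so multi-GB shards are never fully loaded into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == pointer["oid"]


# New pointer for the first shard, copied from the diff above (identical for
# final_checkpoint/ and the repository root).
SHARD_1_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:eddd345ca8383dcad7bbd25be4a05f6a2a92e4df50220570736ef55964e2b98e
size 4943162240
"""

if __name__ == "__main__":
    pointer = parse_lfs_pointer(SHARD_1_POINTER)
    print(pointer["oid"][:12], pointer["size"])
    # Hypothetical local path: point it at a downloaded shard to verify it.
    shard = Path("final_checkpoint/model-00001-of-00003.safetensors")
    if shard.exists():
        print("matches:", matches_pointer(shard, pointer))
```

The size check runs first because it is free. It rejects a truncated download before the costly hash over roughly 5 GB of data.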