End of training
- README.md +12 -12
- final_checkpoint/model-00001-of-00003.safetensors +1 -1
- final_checkpoint/model-00002-of-00003.safetensors +1 -1
- final_checkpoint/model-00003-of-00003.safetensors +1 -1
- model-00001-of-00003.safetensors +1 -1
- model-00002-of-00003.safetensors +1 -1
- model-00003-of-00003.safetensors +1 -1
- training_args.bin +1 -1
README.md
CHANGED
@@ -19,14 +19,14 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [tsavage68/Na_M2_1000steps_1e7_SFT](https://huggingface.co/tsavage68/Na_M2_1000steps_1e7_SFT) on an unknown dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.0000
-- Rewards/chosen: 2.
-- Rewards/rejected: -
+- Rewards/chosen: 2.3496
+- Rewards/rejected: -12.1508
 - Rewards/accuracies: 1.0
-- Rewards/margins:
-- Logps/rejected: -
-- Logps/chosen: -
-- Logits/rejected: -2.
-- Logits/chosen: -2.
+- Rewards/margins: 14.5003
+- Logps/rejected: -201.4312
+- Logps/chosen: -24.6368
+- Logits/rejected: -2.3632
+- Logits/chosen: -2.3967
 
 ## Model description
 
@@ -45,7 +45,7 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
-- learning_rate: 1e-
+- learning_rate: 1e-06
 - train_batch_size: 2
 - eval_batch_size: 1
 - seed: 42
@@ -60,10 +60,10 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.
-| 0.0 | 0.5333 | 100 | 0.0000 |
-| 0.0 | 0.8 | 150 | 0.0000 | 2.
-| 0.0 | 1.0667 | 200 | 0.0000 | 2.
+| 0.0 | 0.2667 | 50 | 0.0000 | 2.1916 | -10.0354 | 1.0 | 12.2270 | -180.2773 | -26.2167 | -2.3982 | -2.4261 |
+| 0.0 | 0.5333 | 100 | 0.0000 | 2.3125 | -11.6667 | 1.0 | 13.9792 | -196.5901 | -25.0075 | -2.3692 | -2.4015 |
+| 0.0 | 0.8 | 150 | 0.0000 | 2.3477 | -12.1123 | 1.0 | 14.4600 | -201.0466 | -24.6557 | -2.3646 | -2.3980 |
+| 0.0 | 1.0667 | 200 | 0.0000 | 2.3496 | -12.1508 | 1.0 | 14.5003 | -201.4312 | -24.6368 | -2.3632 | -2.3967 |
 
 
 ### Framework versions
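The evaluation metrics in this README use the names logged by TRL's `DPOTrainer` (an assumption; the card does not name the trainer). Under that assumption, `Rewards/margins` is the batch mean of per-pair `chosen - rejected` rewards, and because a mean of differences equals the difference of means, it should equal `Rewards/chosen - Rewards/rejected` up to rounding. The sketch below checks this for every row of the results table and also evaluates the standard sigmoid DPO loss, `-log(sigmoid(margin))`, at the reported margin. The helper names `margin` and `sigmoid_dpo_loss` are hypothetical.

```python
import math

# Rows of the training-results table: (step, rewards_chosen, rewards_rejected, reported_margin).
ROWS = [
    (50, 2.1916, -10.0354, 12.2270),
    (100, 2.3125, -11.6667, 13.9792),
    (150, 2.3477, -12.1123, 14.4600),
    (200, 2.3496, -12.1508, 14.5003),
]


def margin(chosen: float, rejected: float) -> float:
    """Reward margin: how far the chosen reward sits above the rejected one."""
    return chosen - rejected


def sigmoid_dpo_loss(m: float) -> float:
    """-log(sigmoid(m)), computed as log1p(exp(-m)); stable for large positive m."""
    return math.log1p(math.exp(-m))


for step, chosen, rejected, reported in ROWS:
    computed = margin(chosen, rejected)
    # The table values are rounded to 4 decimals, so allow a small tolerance.
    assert abs(computed - reported) < 5e-4, (step, computed, reported)
    print(f"step {step}: margin {computed:.4f} (reported {reported:.4f}), "
          f"loss at this margin {sigmoid_dpo_loss(reported):.2e}")
```

At a margin of about 14.5 the per-pair loss is on the order of 5e-7, consistent with the `Loss: 0.0000` above. Since `-log(sigmoid(x))` is convex, the loss at the mean margin is only a lower bound on the mean loss over pairs, so this illustrates the order of magnitude rather than reproducing the logged value.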
final_checkpoint/model-00001-of-00003.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:eddd345ca8383dcad7bbd25be4a05f6a2a92e4df50220570736ef55964e2b98e
 size 4943162240
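Each `.safetensors` and `.bin` entry in this commit is a Git LFS pointer: three `key value` lines giving the spec version, the SHA-256 `oid` of the real file, and its `size` in bytes. The sketch below (hypothetical helper names, standard library only) parses such a pointer and checks a downloaded file against it by comparing the size and streaming the file through SHA-256.

```python
import hashlib
from pathlib import Path

LFS_V1 = "https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into a {key: value} dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if fields.get("version") != LFS_V1:
        raise ValueError("not a Git LFS v1 pointer")
    return fields


def verify_against_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` matches the pointer's size and SHA-256 oid."""
    algo, _, expected = pointer["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    # Cheap check first: a size mismatch means the content cannot match.
    if path.stat().st_size != int(pointer["size"]):
        return False
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Hash in chunks so multi-GB shards never have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:eddd345ca8383dcad7bbd25be4a05f6a2a92e4df50220570736ef55964e2b98e
size 4943162240
"""
fields = parse_lfs_pointer(POINTER)
print(fields["oid"], fields["size"])
```

For example, `verify_against_pointer(Path("final_checkpoint/model-00001-of-00003.safetensors"), fields)` would return `True` only if a locally downloaded copy of that shard matches the new pointer; the local path is an assumption about where the file was saved.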
final_checkpoint/model-00002-of-00003.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:c8e7a6f6f93bd0dab3f1c3758b98baba1d9d5a8979f0264b023a847441b07a10
 size 4999819232
final_checkpoint/model-00003-of-00003.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:bd22d0bad40082bf2b7ac1e3c7edc7eba177c46c9bc4541624f7283286ee7cba
 size 4540516256
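The `size` fields of the three `final_checkpoint/` shards add up to the total checkpoint size. The sketch below sums them and, assuming the weights are stored in a 2-byte dtype such as bf16 (the diff does not state the dtype), converts bytes into a rough parameter count.

```python
# Sizes in bytes, taken from the LFS pointers of the three safetensors shards.
SHARD_SIZES = {
    "model-00001-of-00003.safetensors": 4943162240,
    "model-00002-of-00003.safetensors": 4999819232,
    "model-00003-of-00003.safetensors": 4540516256,
}

BYTES_PER_PARAM = 2  # Assumption: a 2-byte dtype such as bf16 or fp16.

total_bytes = sum(SHARD_SIZES.values())
# Slight overestimate: each safetensors file also stores a small JSON header.
approx_params = total_bytes // BYTES_PER_PARAM

print(f"total: {total_bytes:,} bytes ({total_bytes / 2**30:.2f} GiB)")
print(f"approx. parameters at {BYTES_PER_PARAM} bytes each: {approx_params / 1e9:.2f}B")
```

This prints a total of 14,483,497,728 bytes (13.49 GiB) and roughly 7.24B parameters under the 2-byte assumption; a 4-byte dtype would halve the estimate.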
model-00001-of-00003.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:eddd345ca8383dcad7bbd25be4a05f6a2a92e4df50220570736ef55964e2b98e
 size 4943162240
model-00002-of-00003.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:c8e7a6f6f93bd0dab3f1c3758b98baba1d9d5a8979f0264b023a847441b07a10
 size 4999819232
model-00003-of-00003.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:bd22d0bad40082bf2b7ac1e3c7edc7eba177c46c9bc4541624f7283286ee7cba
 size 4540516256
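In this commit the root-level shards and their copies under `final_checkpoint/` receive the same new oids, so the two sets hold byte-identical weights. A small check like the following makes that comparison explicit; the dictionaries restate the new oids from the pointer diffs, and `mismatched_shards` is a hypothetical helper. Because Git LFS addresses objects by their SHA-256 oid, identical files should resolve to the same stored object.

```python
# New oids from the "+" lines of the pointer diffs, keyed by shard file name.
FINAL_CHECKPOINT = {
    "model-00001-of-00003.safetensors": "eddd345ca8383dcad7bbd25be4a05f6a2a92e4df50220570736ef55964e2b98e",
    "model-00002-of-00003.safetensors": "c8e7a6f6f93bd0dab3f1c3758b98baba1d9d5a8979f0264b023a847441b07a10",
    "model-00003-of-00003.safetensors": "bd22d0bad40082bf2b7ac1e3c7edc7eba177c46c9bc4541624f7283286ee7cba",
}
ROOT = {
    "model-00001-of-00003.safetensors": "eddd345ca8383dcad7bbd25be4a05f6a2a92e4df50220570736ef55964e2b98e",
    "model-00002-of-00003.safetensors": "c8e7a6f6f93bd0dab3f1c3758b98baba1d9d5a8979f0264b023a847441b07a10",
    "model-00003-of-00003.safetensors": "bd22d0bad40082bf2b7ac1e3c7edc7eba177c46c9bc4541624f7283286ee7cba",
}


def mismatched_shards(a: dict[str, str], b: dict[str, str]) -> list[str]:
    """Return shard names whose oid differs, or that appear on only one side."""
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))


mismatches = mismatched_shards(FINAL_CHECKPOINT, ROOT)
print("all shards identical" if not mismatches else f"differing shards: {mismatches}")
```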
training_args.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:b0a6eeabdd97e0c2e7531244accd85e99c139eceb39347cffc8fc55b792f3f13
 size 5176