End of training

README.md CHANGED

@@ -14,7 +14,7 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [ai-forever/rugpt3medium_based_on_gpt2](https://huggingface.co/ai-forever/rugpt3medium_based_on_gpt2) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 7.2471
 ## Model description
@@ -33,10 +33,12 @@ More information needed
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 2e-05
 - train_batch_size: 8
 - eval_batch_size: 8
 - seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 500
@@ -47,10 +49,14 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| 9.9269        | 0.61  | 100  | 9.4224          |
-| 9.1428        | 1.23  | 200  | 8.6591          |
-| 8.3563        | 1.84  | 300  | 7.8627          |
-| 7.665         | 2.45  | 400  | 7.2471          |
 ### Framework versions

 This model is a fine-tuned version of [ai-forever/rugpt3medium_based_on_gpt2](https://huggingface.co/ai-forever/rugpt3medium_based_on_gpt2) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 9.1801
 ## Model description
 ### Training hyperparameters
 The following hyperparameters were used during training:
+- learning_rate: 1e-05
 - train_batch_size: 8
 - eval_batch_size: 8
 - seed: 42
+- gradient_accumulation_steps: 3
+- total_train_batch_size: 24
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 500
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
+| 10.9006       | 0.37  | 20   | 10.5654         |
+| 10.3052       | 0.74  | 40   | 9.9389          |
+| 9.8311        | 1.1   | 60   | 9.6573          |
+| 9.6329        | 1.47  | 80   | 9.5473          |
+| 9.5378        | 1.84  | 100  | 9.4772          |
+| 9.466         | 2.21  | 120  | 9.3990          |
+| 9.3906        | 2.58  | 140  | 9.3004          |
+| 9.2874        | 2.94  | 160  | 9.1801          |
 ### Framework versions
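
The new hyperparameters can be sanity-checked with a little arithmetic. The sketch below is not taken from the training code. It assumes a single device, and it assumes that `lr_scheduler_type: cosine` with `lr_scheduler_warmup_steps: 500` means the common linear-warmup-then-cosine-decay shape. The function names and the `total_steps` value are hypothetical, since the diff does not state the planned number of optimizer steps.

```python
import math


def total_train_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Effective batch size per optimizer step (num_devices=1 is an assumption)."""
    return per_device_batch * grad_accum_steps * num_devices


def lr_at_step(step, base_lr=1e-05, warmup_steps=500, total_steps=1000):
    """Linear warmup followed by cosine decay.

    total_steps=1000 is a placeholder: the diff does not give the real value.
    It has no effect on the result while step < warmup_steps.
    """
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr over the warmup period.
        return base_lr * step / max(1, warmup_steps)
    # After warmup, decay from base_lr towards 0 along half a cosine wave.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # train_batch_size 8 with gradient_accumulation_steps 3
    print(total_train_batch_size(8, 3))
    # Learning rate at the last logged step (160) under the assumed schedule
    print(lr_at_step(160))
```

The first call reproduces `total_train_batch_size: 24`. Under the assumed schedule, the last logged step (160) is still inside the 500-step warmup, so the learning rate there would be roughly 3.2e-06 rather than the nominal 1e-05.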

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b45eec3b0b346287e8479f39ec066c1468c3bfc0881c9d6abc2392e57c5884e6
 size 1423517184

 version https://git-lfs.github.com/spec/v1
+oid sha256:f7d62f94326ceb501dda7949b4a31538b520a7913ee9a0ccedbf653599caf34d
 size 1423517184

runs/Dec27_19-27-56_501707aa838d/events.out.tfevents.1703705281.501707aa838d.221.2 ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:0179444c435b2116201ec1cbf1a4b6d9f3e7e76a480c67a1f3bee0bb57a19c37
+size 6260

runs/Dec27_19-33-20_501707aa838d/events.out.tfevents.1703705604.501707aa838d.221.3 ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:facac93b62b683d2bde86eed802b5e55813abd95231d1665873eea12f1e85134
+size 8283

tokenizer.json CHANGED

@@ -4,9 +4,18 @@
     "direction": "Left",
     "max_length": 64,
     "strategy": "LongestFirst",
-    "stride": 0
   },
-  "padding": null,
   "added_tokens": [
     {
       "id": 0,

     "direction": "Left",
     "max_length": 64,
     "strategy": "LongestFirst",
+    "stride": 10
+  },
+  "padding": {
+    "strategy": {
+      "Fixed": 64
+    },
+    "direction": "Left",
+    "pad_to_multiple_of": null,
+    "pad_id": 0,
+    "pad_type_id": 0,
+    "pad_token": "<pad>"
   },
   "added_tokens": [
     {
       "id": 0,
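
The effect of the new `padding` block on a single sequence can be illustrated in plain Python. This is a sketch of the intended behaviour, not the tokenizer's implementation. It assumes that a `"Left"` truncation direction drops tokens from the start of the sequence, and that a `"Left"` padding direction prepends `pad_id` until the `Fixed` length of 64 is reached. The helper name `pad_and_truncate_left` is hypothetical. The `stride` value is not modelled, because it only controls how much overlapping windows share when a long input overflows.

```python
def pad_and_truncate_left(ids, max_length=64, pad_id=0):
    """Return (input_ids, attention_mask), each exactly max_length long.

    Assumption: left truncation keeps the rightmost max_length tokens,
    and left padding places pad_id before the real tokens.
    """
    kept = ids[-max_length:] if ids else []
    n_pad = max_length - len(kept)
    input_ids = [pad_id] * n_pad + kept
    # 0 marks padding positions, 1 marks real tokens.
    attention_mask = [0] * n_pad + [1] * len(kept)
    return input_ids, attention_mask


if __name__ == "__main__":
    short_ids, short_mask = pad_and_truncate_left([5, 6, 7])
    print(len(short_ids), short_ids[-3:], sum(short_mask))

    long_ids, long_mask = pad_and_truncate_left(list(range(1, 101)))
    print(len(long_ids), long_ids[0], long_ids[-1])
```

Left padding keeps the real tokens at the end of the sequence, which is the usual choice for decoder-only models that generate from the last position.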

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5418cec9f0ff5087a801ad01a7eb730e7557c41c53854eb4c9372583495eae75
 size 4600

 version https://git-lfs.github.com/spec/v1
+oid sha256:4468c13f6af905f827a766b673966a99fc6c77c374251ba294b7bf5c8a0ec57c
 size 4600