Karzan committed on
Commit 6ef3ba4
1 Parent(s): 2cd595d

End of training

Files changed (4):
  1. README.md +37 -37
  2. config.json +1 -1
  3. pytorch_model.bin +2 -2
  4. training_args.bin +2 -2
README.md CHANGED
@@ -1,6 +1,6 @@
 ---
 license: apache-2.0
-base_model: Karzan/gpt2-walamakan
+base_model: Karzan/gpt2-walamakan-2
 tags:
 - generated_from_trainer
 model-index:
@@ -13,9 +13,9 @@ should probably proofread and complete it, then remove this comment. -->
 
 # gpt2-walamakan-2
 
-This model is a fine-tuned version of [Karzan/gpt2-walamakan](https://huggingface.co/Karzan/gpt2-walamakan) on an unknown dataset.
+This model is a fine-tuned version of [Karzan/gpt2-walamakan-2](https://huggingface.co/Karzan/gpt2-walamakan-2) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 6.7392
+- Loss: 6.9220
 
 ## Model description
 
@@ -35,11 +35,11 @@ More information needed
 
 The following hyperparameters were used during training:
 - learning_rate: 3e-05
-- train_batch_size: 8
-- eval_batch_size: 8
+- train_batch_size: 16
+- eval_batch_size: 16
 - seed: 42
 - gradient_accumulation_steps: 4
-- total_train_batch_size: 32
+- total_train_batch_size: 64
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - num_epochs: 30
@@ -48,41 +48,41 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| 0.523 | 1.0 | 94 | 6.4968 |
-| 0.5212 | 2.0 | 188 | 6.4501 |
-| 0.488 | 3.0 | 282 | 6.4814 |
-| 0.4723 | 4.0 | 376 | 6.5004 |
-| 0.4452 | 5.0 | 470 | 6.5328 |
-| 0.4442 | 6.0 | 564 | 6.5507 |
-| 0.4147 | 7.0 | 658 | 6.5598 |
-| 0.397 | 8.0 | 752 | 6.5623 |
-| 0.3868 | 9.0 | 846 | 6.5642 |
-| 0.3686 | 10.0 | 940 | 6.5713 |
-| 0.3553 | 11.0 | 1034 | 6.6027 |
-| 0.338 | 12.0 | 1128 | 6.5953 |
-| 0.3344 | 13.0 | 1222 | 6.6386 |
-| 0.315 | 14.0 | 1316 | 6.6202 |
-| 0.3096 | 15.0 | 1410 | 6.6239 |
-| 0.2961 | 16.0 | 1504 | 6.6648 |
-| 0.2899 | 17.0 | 1598 | 6.6663 |
-| 0.2782 | 18.0 | 1692 | 6.6750 |
-| 0.2642 | 19.0 | 1786 | 6.6777 |
-| 0.2541 | 20.0 | 1880 | 6.6807 |
-| 0.2502 | 21.0 | 1974 | 6.6956 |
-| 0.2453 | 22.0 | 2068 | 6.7099 |
-| 0.2485 | 23.0 | 2162 | 6.7159 |
-| 0.2342 | 24.0 | 2256 | 6.7149 |
-| 0.2226 | 25.0 | 2350 | 6.7288 |
-| 0.22 | 26.0 | 2444 | 6.7302 |
-| 0.2172 | 27.0 | 2538 | 6.7315 |
-| 0.2185 | 28.0 | 2632 | 6.7374 |
-| 0.2131 | 29.0 | 2726 | 6.7361 |
-| 0.2089 | 30.0 | 2820 | 6.7392 |
+| 0.2697 | 1.0 | 47 | 6.7283 |
+| 0.2562 | 2.0 | 94 | 6.7642 |
+| 0.2491 | 3.0 | 141 | 6.7544 |
+| 0.2403 | 4.0 | 188 | 6.7617 |
+| 0.2332 | 5.0 | 235 | 6.7501 |
+| 0.2287 | 6.0 | 282 | 6.7719 |
+| 0.2178 | 7.0 | 329 | 6.7966 |
+| 0.2111 | 8.0 | 376 | 6.8080 |
+| 0.2051 | 9.0 | 423 | 6.8298 |
+| 0.1984 | 10.0 | 470 | 6.8288 |
+| 0.1933 | 11.0 | 517 | 6.8321 |
+| 0.1896 | 12.0 | 564 | 6.8422 |
+| 0.1829 | 13.0 | 611 | 6.8685 |
+| 0.1762 | 14.0 | 658 | 6.8504 |
+| 0.1757 | 15.0 | 705 | 6.8636 |
+| 0.1695 | 16.0 | 752 | 6.8704 |
+| 0.165 | 17.0 | 799 | 6.8803 |
+| 0.1617 | 18.0 | 846 | 6.8826 |
+| 0.159 | 19.0 | 893 | 6.8774 |
+| 0.1557 | 20.0 | 940 | 6.8872 |
+| 0.152 | 21.0 | 987 | 6.8998 |
+| 0.1473 | 22.0 | 1034 | 6.8998 |
+| 0.1455 | 23.0 | 1081 | 6.9136 |
+| 0.1425 | 24.0 | 1128 | 6.9149 |
+| 0.1392 | 25.0 | 1175 | 6.9105 |
+| 0.1395 | 26.0 | 1222 | 6.9167 |
+| 0.136 | 27.0 | 1269 | 6.9145 |
+| 0.1355 | 28.0 | 1316 | 6.9185 |
+| 0.1335 | 29.0 | 1363 | 6.9192 |
+| 0.1322 | 30.0 | 1410 | 6.9220 |
 
 
 ### Framework versions
 
 - Transformers 4.32.0
-- Pytorch 2.0.1+cu118
+- Pytorch 2.1.0.dev20230605+cu121
 - Datasets 2.14.4
 - Tokenizers 0.13.3
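For context, the hyperparameters in the card above map onto a 🤗 `Trainer` configuration along the lines of the sketch below. This is a minimal reconstruction, not the author's actual training script: `output_dir` is a placeholder, and the `evaluation_strategy` setting is an assumption inferred from the per-epoch validation losses. Note the effective batch size: 16 per device × 4 gradient-accumulation steps = 64, matching `total_train_batch_size`.

```python
from transformers import TrainingArguments

# Minimal sketch of the reported configuration (not the original script).
# Effective batch size = per_device_train_batch_size * gradient_accumulation_steps
#                      = 16 * 4 = 64, matching total_train_batch_size above.
training_args = TrainingArguments(
    output_dir="gpt2-walamakan-2",   # placeholder output path
    learning_rate=3e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=4,
    num_train_epochs=30,
    lr_scheduler_type="linear",
    seed=42,
    evaluation_strategy="epoch",     # assumption: eval loss is reported once per epoch
)

# trainer = Trainer(model=model, args=training_args,
#                   train_dataset=train_ds, eval_dataset=eval_ds)  # hypothetical datasets
# trainer.train()
```

The Adam betas (0.9, 0.999) and epsilon (1e-08) listed in the card are the `TrainingArguments` defaults, so they need no explicit arguments here.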
config.json CHANGED
@@ -1,5 +1,5 @@
 {
-  "_name_or_path": "Karzan/gpt2-walamakan",
+  "_name_or_path": "Karzan/gpt2-walamakan-2",
   "activation_function": "gelu_new",
   "architectures": [
     "GPT2LMHeadModel"
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c76b8657b6bd9e26441e101f77c6bc6dde93b4db63355e19d6256ada59c39537
-size 854378685
+oid sha256:797010b0cad8c4e1d13d37e8d571d46451965da2b8a6061561052d8b8092e4da
+size 854379130
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8f49e5cda68555b21fa025a8c78120db88623fcf293cb3d669e4e5093398c7db
-size 4027
+oid sha256:bc147e07f8fdde37b903b4976ca1ee40f9d2e4008c92304b184ca10e340d5ff2
+size 4472
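`training_args.bin` is the pickled `TrainingArguments` object that `Trainer` writes next to the weights, which is why its size changes along with the configuration. A sketch for inspecting it locally, assuming `transformers` is installed so the object can be unpickled; unpickling executes arbitrary code, so only load files you trust:

```python
import torch
from transformers import TrainingArguments  # class must be importable to unpickle

# Unpickling runs arbitrary code; only load files from sources you trust.
args = torch.load("training_args.bin", weights_only=False)
print(args.per_device_train_batch_size)  # 16, per the card above
print(args.gradient_accumulation_steps)  # 4
```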