AndersNielsen/shawgpt-ft-improved

Files changed (4)

README.md +24 -15
adapter_config.json +1 -1
adapter_model.safetensors +2 -2
training_args.bin +1 -1

README.md CHANGED

@@ -16,7 +16,7 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [TheBloke/Mistral-7B-Instruct-v0.2-GPTQ](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GPTQ) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.7362
 ## Model description
@@ -35,7 +35,7 @@ More information needed
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 0.0002
 - train_batch_size: 4
 - eval_batch_size: 4
 - seed: 42
@@ -44,23 +44,32 @@ The following hyperparameters were used during training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 2
-- num_epochs: 10
 - mixed_precision_training: Native AMP
 ### Training results
-| Training Loss | Epoch  | Step | Validation Loss |
-|:-------------:|:------:|:----:|:---------------:|
-| 4.5901        | 0.9231 | 3    | 3.9541          |
-| 4.0291        | 1.8462 | 6    | 3.4199          |
-| 3.445         | 2.7692 | 9    | 2.9590          |
-| 2.2321        | 4.0    | 13   | 2.5259          |
-| 2.6077        | 4.9231 | 16   | 2.2507          |
-| 2.2579        | 5.8462 | 19   | 2.0273          |
-| 1.9983        | 6.7692 | 22   | 1.8818          |
-| 1.4141        | 8.0    | 26   | 1.7787          |
-| 1.8067        | 8.9231 | 29   | 1.7407          |
-| 1.2606        | 9.2308 | 30   | 1.7362          |
 ### Framework versions

 This model is a fine-tuned version of [TheBloke/Mistral-7B-Instruct-v0.2-GPTQ](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GPTQ) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 1.4738
 ## Model description
 ### Training hyperparameters
 The following hyperparameters were used during training:
+- learning_rate: 0.0001
 - train_batch_size: 4
 - eval_batch_size: 4
 - seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 2
+- num_epochs: 20
 - mixed_precision_training: Native AMP
 ### Training results
+| Training Loss | Epoch   | Step | Validation Loss |
+|:-------------:|:-------:|:----:|:---------------:|
+| 4.6196        | 0.9231  | 3    | 4.1001          |
+| 4.3204        | 1.8462  | 6    | 3.8081          |
+| 3.9701        | 2.7692  | 9    | 3.5253          |
+| 2.721         | 4.0     | 13   | 3.1570          |
+| 3.329         | 4.9231  | 16   | 2.9058          |
+| 3.0229        | 5.8462  | 19   | 2.6830          |
+| 2.7687        | 6.7692  | 22   | 2.4873          |
+| 1.9015        | 8.0     | 26   | 2.2572          |
+| 2.3231        | 8.9231  | 29   | 2.0842          |
+| 2.0802        | 9.8462  | 32   | 1.9251          |
+| 1.9463        | 10.7692 | 35   | 1.8215          |
+| 1.3485        | 12.0    | 39   | 1.7140          |
+| 1.7274        | 12.9231 | 42   | 1.6481          |
+| 1.6266        | 13.8462 | 45   | 1.5914          |
+| 1.579         | 14.7692 | 48   | 1.5464          |
+| 1.1539        | 16.0    | 52   | 1.5070          |
+| 1.488         | 16.9231 | 55   | 1.4877          |
+| 1.4566        | 17.8462 | 58   | 1.4766          |
+| 1.0211        | 18.4615 | 60   | 1.4738          |
 ### Framework versions
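
The fractional values in the Epoch column follow from the logged steps. Both tables show step 13 at epoch 4.0, which implies 3.25 optimizer steps per epoch, and dividing any logged step by that ratio reproduces its epoch value. The sketch below is an illustration only: the helper name `epoch_at_step` is hypothetical, and the ratio is inferred from the tables rather than read from the training configuration, which this diff does not show in full. The final logged steps (30 and 60) equal 3 × `num_epochs`, which would be consistent with a whole number of optimizer steps being scheduled per epoch, and may be why the runs end at epochs 9.2308 and 18.4615 rather than 10 and 20.

```python
# Ratio inferred from the logged rows (step 13 <-> epoch 4.0).
# This is an inference from the tables, not a value read from the config.
STEPS_PER_EPOCH = 13 / 4.0  # 3.25


def epoch_at_step(step: int, steps_per_epoch: float = STEPS_PER_EPOCH) -> float:
    """Return the fractional epoch reached after `step` optimizer steps."""
    return round(step / steps_per_epoch, 4)


if __name__ == "__main__":
    # Steps taken from the tables: first row, a whole-epoch row,
    # and the last row of the old (30) and new (60) runs.
    for step in (3, 13, 30, 60):
        print(f"step {step:>2} -> epoch {epoch_at_step(step)}")
```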

adapter_config.json CHANGED

@@ -16,7 +16,7 @@
   "megatron_core": "megatron.core",
   "modules_to_save": null,
   "peft_type": "LORA",
-  "r": 8,
   "rank_pattern": {},
   "revision": null,
   "target_modules": [

   "megatron_core": "megatron.core",
   "modules_to_save": null,
   "peft_type": "LORA",
+  "r": 16,
   "rank_pattern": {},
   "revision": null,
   "target_modules": [

adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4e867398c5ea09f051ff29d94f661a331861e9a407688a617fa693fda5e0a146
-size 8397056

 version https://git-lfs.github.com/spec/v1
+oid sha256:bbde9b24b835f46a892db789c75c5c83d4a98d8dc9fa24d66ea919c238d3cf2f
+size 16785792
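
The adapter file roughly doubles in size (8397056 to 16785792 bytes), which is what one would expect when the LoRA rank `r` goes from 8 to 16: each adapted weight of shape d_out × d_in contributes r × (d_in + d_out) parameters, so the count is linear in `r`. The sketch below checks this arithmetic under assumptions that are not visible in this diff: one adapted 4096 × 4096 projection per layer, 32 layers, and float32 storage. The function name `lora_payload_bytes` is hypothetical. If those assumptions hold, the part of each file not accounted for by tensor data is a few kilobytes, which would be plausible for the safetensors header.

```python
OLD_FILE_BYTES = 8397056   # adapter_model.safetensors size before this commit
NEW_FILE_BYTES = 16785792  # adapter_model.safetensors size after this commit


def lora_payload_bytes(r, d_in=4096, d_out=4096, n_layers=32, bytes_per_param=4):
    """Bytes of LoRA tensor data for rank `r`.

    The defaults (one 4096x4096 matrix per layer, 32 layers, float32)
    are assumptions for illustration; the diff does not show them.
    """
    # LoRA adds A (r x d_in) and B (d_out x r) per adapted matrix.
    params_per_matrix = r * (d_in + d_out)
    return params_per_matrix * n_layers * bytes_per_param


if __name__ == "__main__":
    for r, file_bytes in ((8, OLD_FILE_BYTES), (16, NEW_FILE_BYTES)):
        payload = lora_payload_bytes(r)
        # Whatever is left over would be file overhead (for example, the header).
        print(f"r={r}: payload={payload} bytes, remainder={file_bytes - payload} bytes")
```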

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:776ddd285099bcab0efd5a67882a932086a9d41ba3acaa5482ec884a5d43688e
 size 5176

 version https://git-lfs.github.com/spec/v1
+oid sha256:5a9553599575bc336a3b3f2569d76fbf9e91de9124fdc59b75dd32a736dc1ac4
 size 5176