Model save

Files changed (8)

README.md CHANGED

@@ -2,15 +2,11 @@
 license: apache-2.0
 base_model: mistralai/Mistral-7B-v0.1
 tags:
-- alignment-handbook
-- trl
-- sft
-- generated_from_trainer
 - trl
 - sft
 - generated_from_trainer
 datasets:
-- HuggingFaceH4/ultrachat_200k
 model-index:
 - name: zephyr-7b-sft-full
   results: []
@@ -21,9 +17,9 @@ should probably proofread and complete it, then remove this comment. -->
 # zephyr-7b-sft-full
-This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the HuggingFaceH4/ultrachat_200k dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.9491
 ## Model description
@@ -43,13 +39,13 @@ More information needed
 The following hyperparameters were used during training:
 - learning_rate: 2e-05
-- train_batch_size: 16
-- eval_batch_size: 8
 - seed: 42
 - distributed_type: multi-GPU
 - num_devices: 4
-- total_train_batch_size: 64
-- total_eval_batch_size: 32
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.1
@@ -59,7 +55,7 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| 0.9409        | 1.0   | 2179 | 0.9491          |
 ### Framework versions

 license: apache-2.0
 base_model: mistralai/Mistral-7B-v0.1
 tags:
 - trl
 - sft
 - generated_from_trainer
 datasets:
+- generator
 model-index:
 - name: zephyr-7b-sft-full
   results: []
 # zephyr-7b-sft-full
+This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the generator dataset.
 It achieves the following results on the evaluation set:
+- Loss: 1.1126
 ## Model description
 The following hyperparameters were used during training:
 - learning_rate: 2e-05
+- train_batch_size: 32
+- eval_batch_size: 16
 - seed: 42
 - distributed_type: multi-GPU
 - num_devices: 4
+- total_train_batch_size: 128
+- total_eval_batch_size: 64
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.1
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
+| 1.0697        | 1.0   | 4358 | 1.1126          |
 ### Framework versions
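
The batch-size lines in the README diff are related by simple arithmetic: the total batch size is the per-device batch size multiplied by the number of devices. The following is a minimal Python sketch of that check, not the trainer's own code. It assumes a gradient accumulation factor of 1, which the diff does not show; `total_batch_size` and `grad_accum_steps` are hypothetical names used only for illustration.

```python
def total_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Effective batch size across all devices for one optimizer step.

    grad_accum_steps is an assumption: the diff lists no accumulation value,
    and a factor of 1 reproduces the totals shown in the README.
    """
    return per_device * num_devices * grad_accum_steps


NUM_DEVICES = 4  # "num_devices: 4" in the README

# Values before the commit (removed lines) and after it (added lines).
old_train = total_batch_size(16, NUM_DEVICES)
old_eval = total_batch_size(8, NUM_DEVICES)
new_train = total_batch_size(32, NUM_DEVICES)
new_eval = total_batch_size(16, NUM_DEVICES)

print("old:", old_train, old_eval)
print("new:", new_train, new_eval)
```

With these inputs the sketch reproduces the totals in the diff: 64 and 32 before the commit, 128 and 64 after it.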

all_results.json CHANGED

@@ -1,14 +1,9 @@
 {
     "epoch": 1.0,
-    "eval_loss": 0.9490520358085632,
-    "eval_runtime": 757.9326,
-    "eval_samples": 23109,
-    "eval_samples_per_second": 20.359,
-    "eval_steps_per_second": 0.637,
     "total_flos": 456238269726720.0,
-    "train_loss": 1.0045825210718344,
-    "train_runtime": 27286.1988,
     "train_samples": 207864,
-    "train_samples_per_second": 5.11,
-    "train_steps_per_second": 0.08
 }

 {
     "epoch": 1.0,
     "total_flos": 456238269726720.0,
+    "train_loss": 1.1632422284009862,
+    "train_runtime": 29234.3114,
     "train_samples": 207864,
+    "train_samples_per_second": 19.077,
+    "train_steps_per_second": 0.149
 }
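
The throughput fields in this file can be cross-checked against the step counts in the README training table (2179 steps before, 4358 after) and the total train batch sizes (64 before, 128 after). The sketch below is only a sanity check under the assumption that samples per second is roughly steps × total batch size ÷ runtime; `throughput` is a hypothetical helper, and the trainer may compute these fields differently.

```python
def throughput(steps: int, total_batch: int, runtime_s: float):
    """Return (samples_per_second, steps_per_second) implied by a run.

    Assumes every step processes exactly total_batch samples.
    """
    samples_per_second = steps * total_batch / runtime_s
    steps_per_second = steps / runtime_s
    return samples_per_second, steps_per_second


# Before the commit: 2179 steps, total batch 64, runtime 27286.1988 s.
old_sps, old_stps = throughput(2179, 64, 27286.1988)
# After the commit: 4358 steps, total batch 128, runtime 29234.3114 s.
new_sps, new_stps = throughput(4358, 128, 29234.3114)

print(f"old: {old_sps:.3f} samples/s, {old_stps:.3f} steps/s")
print(f"new: {new_sps:.3f} samples/s, {new_stps:.3f} steps/s")
```

Under that assumption the results land close to the reported values (5.11 and 0.08 before, 19.077 and 0.149 after). Note that dividing the unchanged `train_samples` value of 207864 by the new runtime gives about 7.1 samples per second, not 19.077, so that field does not appear to be what the reported rate is based on.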

model-00001-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:52142cc9d392c1aec9d10221e84c064bda396e12e4093aeeb76ef999d2702285
 size 4943162336

 version https://git-lfs.github.com/spec/v1
+oid sha256:abbc6a86071a19b58a3f3f4d371aa81787f020a24b0c81431104013eea616520
 size 4943162336

model-00002-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:29336a781a3cc0ec66f6c43b91382418b9210b100a11aa2687674fc1571a5278
 size 4999819336

 version https://git-lfs.github.com/spec/v1
+oid sha256:a2cc2dc988a31beb0a24c7b4b93ddaa529af3355c75533fae481ddfa96674a01
 size 4999819336

model-00003-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:7c7e1a8f431fa71638a7a3e4c5bf239419d3a0224c01682c15b326b1f78caf76
 size 4540516344

 version https://git-lfs.github.com/spec/v1
+oid sha256:083d153dc313d61ae479d3b1862a3477f5d05e9e9affb6d1d9d2e25573215594
 size 4540516344
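
Each safetensors entry above is a Git LFS pointer rather than the weights themselves: three `key value` lines giving the spec version, a SHA-256 object ID, and the size in bytes. In all three shards the size is unchanged while the `oid` differs, which is consistent with tensors of the same shapes holding new values. The following is a rough Python sketch for parsing a pointer and checking a downloaded file against it; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of any Git LFS tooling.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check a local file's size and SHA-256 digest against a parsed pointer."""
    hasher = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        # Read in chunks so multi-gigabyte shards are never held in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


# New pointer for the third shard, copied from the diff above.
pointer = parse_lfs_pointer(
    """version https://git-lfs.github.com/spec/v1
oid sha256:083d153dc313d61ae479d3b1862a3477f5d05e9e9affb6d1d9d2e25573215594
size 4540516344
"""
)
print(pointer["algo"], pointer["size"])
```

Given a local copy of the shard, `matches_pointer("model-00003-of-00003.safetensors", pointer)` would report whether it corresponds to the post-commit version.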

runs/May24_19-38-16_ubuntu/events.out.tfevents.1716580221.ubuntu.2195002.0 CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:1f4ffecdd676e69047a78e332d778612c7da27eaa8037c97783a9ce4649835ba
-size 186394

 version https://git-lfs.github.com/spec/v1
+oid sha256:e8ab2d791a94c8a3285b2cc7b0e748920cbaedfee83e4b00c23da89f1996a337
+size 189340

train_results.json CHANGED

@@ -1,9 +1,9 @@
 {
     "epoch": 1.0,
     "total_flos": 456238269726720.0,
-    "train_loss": 1.0045825210718344,
-    "train_runtime": 27286.1988,
     "train_samples": 207864,
-    "train_samples_per_second": 5.11,
-    "train_steps_per_second": 0.08
 }

 {
     "epoch": 1.0,
     "total_flos": 456238269726720.0,
+    "train_loss": 1.1632422284009862,
+    "train_runtime": 29234.3114,
     "train_samples": 207864,
+    "train_samples_per_second": 19.077,
+    "train_steps_per_second": 0.149
 }

trainer_state.json CHANGED

The diff for this file is too large to render. See raw diff