Model save

Files changed (9)

README.md +27 -8
all_results.json +4 -4
config.json +1 -1
model-00001-of-00003.safetensors +1 -1
model-00002-of-00003.safetensors +1 -1
model-00003-of-00003.safetensors +1 -1
train_results.json +4 -4
trainer_state.json +0 -0
training_args.bin +1 -1

README.md CHANGED

@@ -2,15 +2,10 @@
 license: apache-2.0
 base_model: mistralai/Mistral-7B-v0.1
 tags:
-- alignment-handbook
-- trl
-- orpo
-- generated_from_trainer
 - trl
 - orpo
 - generated_from_trainer
-datasets:
-- HuggingFaceH4/ultrafeedback_binarized
 model-index:
 - name: zephyr-7b-sft-full-orpo
   results: []
@@ -19,10 +14,23 @@ model-index:
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/statking/huggingface/runs/lw7rbi20)
 # zephyr-7b-sft-full-orpo
-This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the HuggingFaceH4/ultrafeedback_binarized dataset.
 ## Model description
@@ -57,6 +65,17 @@ The following hyperparameters were used during training:
 ### Training results
 ### Framework versions

 license: apache-2.0
 base_model: mistralai/Mistral-7B-v0.1
 tags:
 - trl
 - orpo
+- alignment-handbook
 - generated_from_trainer
 model-index:
 - name: zephyr-7b-sft-full-orpo
   results: []
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/statking/huggingface/runs/ehjj41t1)
 # zephyr-7b-sft-full-orpo
+This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on an unknown dataset.
+It achieves the following results on the evaluation set:
+- Loss: 0.5088
+- Rewards/chosen: -0.0404
+- Rewards/rejected: -0.0510
+- Rewards/accuracies: 0.6290
+- Rewards/margins: 0.0106
+- Logps/rejected: -1.0202
+- Logps/chosen: -0.8085
+- Logits/rejected: -2.5337
+- Logits/chosen: -2.5634
+- Nll Loss: 0.4741
+- Log Odds Ratio: -0.6379
+- Log Odds Chosen: 0.3305
 ## Model description
 ### Training results
+| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
+|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------:|:--------------:|:---------------:|
+| 0.5707        | 0.1049 | 100  | 1.1268          | -0.0452        | -0.0539          | 0.6369             | 0.0086          | -1.0774        | -0.9045      | -2.5432         | -2.5811       | 1.0893   | -0.6413        | 0.2601          |
+| 0.5663        | 0.2098 | 200  | 0.5741          | -0.0440        | -0.0534          | 0.6270             | 0.0094          | -1.0676        | -0.8799      | -2.5377         | -2.5597       | 0.5352   | -0.6447        | 0.2863          |
+| 0.5817        | 0.3146 | 300  | 0.5572          | -0.0440        | -0.0531          | 0.6190             | 0.0091          | -1.0628        | -0.8808      | -2.4499         | -2.4818       | 0.5207   | -0.6503        | 0.2780          |
+| 0.5724        | 0.4195 | 400  | 0.5416          | -0.0426        | -0.0515          | 0.625              | 0.0089          | -1.0293        | -0.8510      | -2.4026         | -2.4376       | 0.5060   | -0.6551        | 0.2819          |
+| 0.5486        | 0.5244 | 500  | 0.5344          | -0.0425        | -0.0526          | 0.6151             | 0.0101          | -1.0514        | -0.8492      | -2.4373         | -2.4718       | 0.4990   | -0.6439        | 0.3193          |
+| 0.5156        | 0.6293 | 600  | 0.5242          | -0.0417        | -0.0514          | 0.6151             | 0.0098          | -1.0285        | -0.8333      | -2.5551         | -2.5811       | 0.4882   | -0.6470        | 0.3056          |
+| 0.5297        | 0.7341 | 700  | 0.5191          | -0.0411        | -0.0521          | 0.6310             | 0.0110          | -1.0422        | -0.8215      | -2.4477         | -2.4801       | 0.4838   | -0.6351        | 0.3407          |
+| 0.5184        | 0.8390 | 800  | 0.5138          | -0.0409        | -0.0532          | 0.6310             | 0.0123          | -1.0647        | -0.8179      | -2.4575         | -2.4922       | 0.4796   | -0.6304        | 0.3783          |
+| 0.5235        | 0.9439 | 900  | 0.5088          | -0.0404        | -0.0510          | 0.6290             | 0.0106          | -1.0202        | -0.8085      | -2.5337         | -2.5634       | 0.4741   | -0.6379        | 0.3305          |
 ### Framework versions
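
The reward columns in the table above appear to be related by a simple difference: the margin is the chosen reward minus the rejected reward. The sketch below checks this for the final evaluation row (step 900). The dictionary keys and the helper name `reward_margin` are illustrative and not taken from the training code.

```python
# Final evaluation row (step 900), copied from the results table in the model card.
final_eval = {
    "rewards/chosen": -0.0404,
    "rewards/rejected": -0.0510,
    "rewards/margins": 0.0106,
}


def reward_margin(chosen: float, rejected: float) -> float:
    """Return the gap between the chosen and the rejected reward."""
    return chosen - rejected


margin = reward_margin(final_eval["rewards/chosen"], final_eval["rewards/rejected"])

# Compare the recomputed value with the reported one at table precision.
print(f"recomputed margin: {margin:.4f}")
print(f"reported margin:   {final_eval['rewards/margins']:.4f}")
```

Because the table values are rounded to four decimals, other rows can differ from the recomputed margin in the last digit (for example, step 100 reports 0.0086 while the rounded inputs give 0.0087).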

all_results.json CHANGED

@@ -1,9 +1,9 @@
 {
     "epoch": 0.9994756161510225,
     "total_flos": 0.0,
-    "train_loss": 0.56347276506494,
-    "train_runtime": 19079.6454,
     "train_samples": 61005,
-    "train_samples_per_second": 3.197,
-    "train_steps_per_second": 0.05
 }

 {
     "epoch": 0.9994756161510225,
     "total_flos": 0.0,
+    "train_loss": 0.5642813587989287,
+    "train_runtime": 20357.789,
     "train_samples": 61005,
+    "train_samples_per_second": 2.997,
+    "train_steps_per_second": 0.047
 }
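
The throughput fields in this file are consistent with dividing `train_samples` by `train_runtime` (in seconds). The sketch below recomputes them for both sides of the diff. The helper name `samples_per_second` and the `runs` dictionary are illustrative, and the exact way the trainer derives these numbers is an assumption here.

```python
# Values copied from the old ("before") and new ("after") sides of the diff.
runs = {
    "before": {"train_samples": 61005, "train_runtime": 19079.6454, "reported": 3.197},
    "after": {"train_samples": 61005, "train_runtime": 20357.789, "reported": 2.997},
}


def samples_per_second(train_samples: int, train_runtime_s: float) -> float:
    """Throughput as samples processed per second of wall-clock training time."""
    return train_samples / train_runtime_s


for name, run in runs.items():
    rate = samples_per_second(run["train_samples"], run["train_runtime"])
    hours = run["train_runtime"] / 3600
    print(f"{name}: {rate:.3f} samples/s over {hours:.2f} h (reported {run['reported']})")

# Relative change in wall-clock time between the two runs.
slowdown = runs["after"]["train_runtime"] / runs["before"]["train_runtime"] - 1
print(f"runtime change: {slowdown:+.1%}")
```

Read this way, the new run took roughly 6.7% longer for the same number of samples, which accounts for the lower throughput figures.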

config.json CHANGED

@@ -21,6 +21,6 @@
   "tie_word_embeddings": false,
   "torch_dtype": "bfloat16",
   "transformers_version": "4.41.0.dev0",
-  "use_cache": true,
   "vocab_size": 32000
 }

   "tie_word_embeddings": false,
   "torch_dtype": "bfloat16",
   "transformers_version": "4.41.0.dev0",
+  "use_cache": false,
   "vocab_size": 32000
 }
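
The commit does not say why `use_cache` flipped to `false`. A common cause is training with gradient checkpointing, which is usually run with the key/value cache disabled, but that is an assumption about this run. If the saved config is later used for generation, one option is to turn the flag back on in a copy of the config, as in this minimal sketch. The helper name `with_generation_cache` is hypothetical, and only the fields visible in the hunk are included.

```python
import json

# Only the fields visible in the diff hunk; the real config.json has more keys.
saved_config = json.loads(
    """
    {
      "tie_word_embeddings": false,
      "torch_dtype": "bfloat16",
      "transformers_version": "4.41.0.dev0",
      "use_cache": false,
      "vocab_size": 32000
    }
    """
)


def with_generation_cache(config: dict) -> dict:
    """Return a copy of the config with the key/value cache re-enabled."""
    updated = dict(config)  # shallow copy so the saved config stays untouched
    updated["use_cache"] = True
    return updated


inference_config = with_generation_cache(saved_config)
print(json.dumps(inference_config, indent=2))
```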

model-00001-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f8de661845ca4af8b34b8ce2a45722bcc4b2834c916890a9672a109c08517fd6
 size 4943162336

 version https://git-lfs.github.com/spec/v1
+oid sha256:ffa90a9394bde99c724e16cce3299e05551da2f9ecf20baf80148357c1179174
 size 4943162336
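
The weight shards are stored as Git LFS pointer files, so the diff only shows a new `oid` with an unchanged `size`. As a rough sketch, a pointer can be parsed and a downloaded shard checked against it as follows. The function names `parse_lfs_pointer` and `matches_pointer` are illustrative, and the parser assumes the three-field layout shown in this diff.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the `version`, `oid`, and `size` fields of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check a local file against the pointer's byte size and content hash."""
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Stream in chunks so multi-gigabyte shards are never loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# New-side pointer for the first shard, copied from the diff.
pointer = parse_lfs_pointer(
    """version https://git-lfs.github.com/spec/v1
oid sha256:ffa90a9394bde99c724e16cce3299e05551da2f9ecf20baf80148357c1179174
size 4943162336
"""
)
print(pointer["algo"], pointer["size"])
```

With a shard on disk, `matches_pointer(Path("model-00001-of-00003.safetensors"), pointer)` would indicate whether the local file corresponds to this commit rather than the previous one.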

model-00002-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:41d93ae590bd8fedc2ce9c0d3406d28032086aa93efb3b8bd7351d4fc1684cda
 size 4999819336

 version https://git-lfs.github.com/spec/v1
+oid sha256:2c33b2f762f44768cfc2ace8d53425584001dbfa592278381393c8b7d1b7d12c
 size 4999819336

model-00003-of-00003.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:327daa62025779f1d2ac5100edbf25a4b7774225203e30efa878921815a519f4
 size 4540516344

 version https://git-lfs.github.com/spec/v1
+oid sha256:1e5dbbc97280ec6cbd9b6796a8e3a7b271c46bd33b35bcfececd6a5d5d26303f
 size 4540516344

train_results.json CHANGED

@@ -1,9 +1,9 @@
 {
     "epoch": 0.9994756161510225,
     "total_flos": 0.0,
-    "train_loss": 0.56347276506494,
-    "train_runtime": 19079.6454,
     "train_samples": 61005,
-    "train_samples_per_second": 3.197,
-    "train_steps_per_second": 0.05
 }

 {
     "epoch": 0.9994756161510225,
     "total_flos": 0.0,
+    "train_loss": 0.5642813587989287,
+    "train_runtime": 20357.789,
     "train_samples": 61005,
+    "train_samples_per_second": 2.997,
+    "train_steps_per_second": 0.047
 }

trainer_state.json CHANGED

The diff for this file is too large to render. See raw diff

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0bf035813fd07976aa0220cc580abeb9837a63f23c3331acf22c16aa3d9e2647
 size 6648

 version https://git-lfs.github.com/spec/v1
+oid sha256:bcb6aaae370ec05ab890f4838b2b43e949b0e3b36b5e24c16aff75cb84e3a43c
 size 6648