Update model
Files changed:
- README.md (+32, -21)
- model.safetensors (+1, -1)
README.md
CHANGED

````diff
@@ -5,6 +5,18 @@ license: apache-2.0
 datasets:
 - HuggingFaceH4/ultrachat_200k
 - Felladrin/ChatML-ultrachat_200k
+- Open-Orca/OpenOrca
+- Felladrin/ChatML-OpenOrca
+- hkust-nlp/deita-10k-v0
+- Felladrin/ChatML-deita-10k-v0
+- LDJnr/Capybara
+- Felladrin/ChatML-Capybara
+- databricks/databricks-dolly-15k
+- Felladrin/ChatML-databricks-dolly-15k
+- euclaise/reddit-instruct-curated
+- Felladrin/ChatML-reddit-instruct-curated
+- CohereForAI/aya_dataset
+- Felladrin/ChatML-aya_dataset
 base_model: Locutusque/TinyMistral-248M
 pipeline_tag: text-generation
 widget:
@@ -45,20 +57,22 @@ widget:
 inference:
   parameters:
     max_new_tokens: 250
-    penalty_alpha: 0.
-    top_k:
-    repetition_penalty: 1.03
-    guidance_scale: 1.3
+    penalty_alpha: 0.5
+    top_k: 5
 ---
 
-# Locutusque's TinyMistral-248M trained on
+# Locutusque's TinyMistral-248M trained on chat datasets
 
 - Base model: [Locutusque/TinyMistral-248M](https://huggingface.co/Locutusque/TinyMistral-248M) with two additional special tokens (`<|im_start|>` and `<|im_end|>`)
--
--
--
--
--
+- Datasets:
+  - [[ChatML](https://huggingface.co/datasets/Felladrin/ChatML-ultrachat_200k)] [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k)
+  - [[ChatML](https://huggingface.co/datasets/Felladrin/ChatML-OpenOrca)] [Open-Orca/OpenOrca](https://huggingface.co/datasets/Open-Orca/OpenOrca)
+  - [[ChatML](https://huggingface.co/datasets/Felladrin/ChatML-deita-10k-v0)] [hkust-nlp/deita-10k-v0](https://huggingface.co/datasets/hkust-nlp/deita-10k-v0)
+  - [[ChatML](https://huggingface.co/datasets/Felladrin/ChatML-Capybara)] [LDJnr/Capybara](https://huggingface.co/datasets/LDJnr/Capybara)
+  - [[ChatML](https://huggingface.co/datasets/Felladrin/ChatML-databricks-dolly-15k)] [databricks/databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k)
+  - [[ChatML](https://huggingface.co/datasets/Felladrin/ChatML-reddit-instruct-curated)] [euclaise/reddit-instruct-curated](https://huggingface.co/datasets/euclaise/reddit-instruct-curated)
+  - [[ChatML](https://huggingface.co/datasets/Felladrin/ChatML-aya_dataset)] [CohereForAI/aya_dataset](https://huggingface.co/datasets/CohereForAI/aya_dataset)
+- License: [Apache License 2.0](https://huggingface.co/Felladrin/TinyMistral-248M-Chat-v2/resolve/main/license.txt)
 
 ## Recommended Prompt Format
 
@@ -73,10 +87,8 @@ inference:
 ## Recommended Inference Parameters
 
 ```yml
-penalty_alpha: 0.
-top_k:
-repetition_penalty: 1.03
-guidance_scale: 1.3
+penalty_alpha: 0.5
+top_k: 5
 ```
 
 ## Usage Example
@@ -84,7 +96,7 @@ guidance_scale: 1.3
 ```python
 from transformers import pipeline
 
-generate = pipeline("text-generation", "Felladrin/TinyMistral-248M-Chat-
+generate = pipeline("text-generation", "Felladrin/TinyMistral-248M-Chat-v2")
 
 messages = [
     {
@@ -110,10 +122,8 @@ prompt = generate.tokenizer.apply_chat_template(messages, tokenize=False, add_ge
 output = generate(
     prompt,
     max_new_tokens=256,
-    penalty_alpha=0.
-    top_k=
-    repetition_penalty=1.03,
-    guidance_scale=1.3,
+    penalty_alpha=0.5,
+    top_k=5,
 )
 
 print(output[0]["generated_text"])
@@ -126,10 +136,11 @@ This model was trained with [SFTTrainer](https://huggingface.co/docs/trl/main/en
 | Hyperparameter         | Value                                         |
 | :--------------------- | :-------------------------------------------- |
 | Learning rate          | 2e-5                                          |
-| Total train batch size |
+| Total train batch size | 32                                            |
 | Max. sequence length   | 2048                                          |
-| Weight decay           | 0
+| Weight decay           | 0.01                                          |
 | Warmup ratio           | 0.1                                           |
+| NEFTune Noise Alpha    | 5                                             |
 | Optimizer              | Adam with betas=(0.9,0.999) and epsilon=1e-08 |
 | Scheduler              | cosine                                        |
 | Seed                   | 42                                            |
````
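The prompt format the README recommends is ChatML, which is what the two added special tokens (`<|im_start|>` and `<|im_end|>`) delimit. The canonical way to build the prompt is the `generate.tokenizer.apply_chat_template(...)` call shown in the usage example; as a rough illustration of what that template produces, a hand-rolled ChatML formatter might look like this (the helper name `to_chatml` is hypothetical, not part of the model card):

```python
def to_chatml(messages, add_generation_prompt=True):
    """Sketch of the ChatML layout implied by the <|im_start|>/<|im_end|>
    special tokens; apply_chat_template() does this for you in practice."""
    prompt = ""
    for message in messages:
        prompt += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Leave the assistant turn open so the model completes it.
        prompt += "<|im_start|>assistant\n"
    return prompt
```

Note that `penalty_alpha` together with `top_k` selects contrastive search as the decoding strategy in `transformers`, which is why the card pairs those two parameters.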
model.safetensors
CHANGED

````diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:52178bd78ce2e9eaff3fba98236b261d0c97c5423b6eb1dee8d6d3abe1a37850
 size 992108712
````
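The pointer file above records the new weights as a Git LFS object: the `oid sha256:` line is the SHA-256 digest of the file contents and `size` is its byte length. A minimal sketch for checking a downloaded `model.safetensors` against that pointer (the function name `lfs_sha256` is hypothetical, not a git-lfs API):

```python
import hashlib

def lfs_sha256(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest that Git LFS records as the
    pointer's oid, streaming the file in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing `lfs_sha256("model.safetensors")` against the oid in the diff confirms the download matches the weights committed here.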