Update model
Files changed:
- README.md (+32, -21)
- model.safetensors (+1, -1)
README.md
CHANGED

````diff
@@ -5,6 +5,18 @@ license: apache-2.0
 datasets:
 - HuggingFaceH4/ultrachat_200k
 - Felladrin/ChatML-ultrachat_200k
+- Open-Orca/OpenOrca
+- Felladrin/ChatML-OpenOrca
+- hkust-nlp/deita-10k-v0
+- Felladrin/ChatML-deita-10k-v0
+- LDJnr/Capybara
+- Felladrin/ChatML-Capybara
+- databricks/databricks-dolly-15k
+- Felladrin/ChatML-databricks-dolly-15k
+- euclaise/reddit-instruct-curated
+- Felladrin/ChatML-reddit-instruct-curated
+- CohereForAI/aya_dataset
+- Felladrin/ChatML-aya_dataset
 base_model: Locutusque/TinyMistral-248M
 pipeline_tag: text-generation
 widget:
@@ -45,20 +57,22 @@ widget:
 inference:
   parameters:
     max_new_tokens: 250
-    penalty_alpha: 0.
-    top_k:
-    repetition_penalty: 1.03
-    guidance_scale: 1.3
+    penalty_alpha: 0.5
+    top_k: 5
 ---
 
-# Locutusque's TinyMistral-248M trained on
+# Locutusque's TinyMistral-248M trained on chat datasets
 
 - Base model: [Locutusque/TinyMistral-248M](https://huggingface.co/Locutusque/TinyMistral-248M) with two additional special tokens (`<|im_start|>` and `<|im_end|>`)
--
--
--
--
--
+- Datasets:
+  - [[ChatML](https://huggingface.co/datasets/Felladrin/ChatML-ultrachat_200k)] [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k)
+  - [[ChatML](https://huggingface.co/datasets/Felladrin/ChatML-OpenOrca)] [Open-Orca/OpenOrca](https://huggingface.co/datasets/Open-Orca/OpenOrca)
+  - [[ChatML](https://huggingface.co/datasets/Felladrin/ChatML-deita-10k-v0)] [hkust-nlp/deita-10k-v0](https://huggingface.co/datasets/hkust-nlp/deita-10k-v0)
+  - [[ChatML](https://huggingface.co/datasets/Felladrin/ChatML-Capybara)] [LDJnr/Capybara](https://huggingface.co/datasets/LDJnr/Capybara)
+  - [[ChatML](https://huggingface.co/datasets/Felladrin/ChatML-databricks-dolly-15k)] [databricks/databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k)
+  - [[ChatML](https://huggingface.co/datasets/Felladrin/ChatML-reddit-instruct-curated)] [euclaise/reddit-instruct-curated](https://huggingface.co/datasets/euclaise/reddit-instruct-curated)
+  - [[ChatML](https://huggingface.co/datasets/Felladrin/ChatML-aya_dataset)] [CohereForAI/aya_dataset](https://huggingface.co/datasets/CohereForAI/aya_dataset)
+- License: [Apache License 2.0](https://huggingface.co/Felladrin/TinyMistral-248M-Chat-v2/resolve/main/license.txt)
 
 ## Recommended Prompt Format
 
@@ -73,10 +87,8 @@ inference:
 ## Recommended Inference Parameters
 
 ```yml
-penalty_alpha: 0.
-top_k:
-repetition_penalty: 1.03
-guidance_scale: 1.3
+penalty_alpha: 0.5
+top_k: 5
 ```
 
 ## Usage Example
@@ -84,7 +96,7 @@ guidance_scale: 1.3
 ```python
 from transformers import pipeline
 
-generate = pipeline("text-generation", "Felladrin/TinyMistral-248M-Chat-
+generate = pipeline("text-generation", "Felladrin/TinyMistral-248M-Chat-v2")
 
 messages = [
     {
@@ -110,10 +122,8 @@ prompt = generate.tokenizer.apply_chat_template(messages, tokenize=False, add_ge
 output = generate(
     prompt,
     max_new_tokens=256,
-    penalty_alpha=0.
-    top_k=
-    repetition_penalty=1.03,
-    guidance_scale=1.3,
+    penalty_alpha=0.5,
+    top_k=5,
 )
 
 print(output[0]["generated_text"])
@@ -126,10 +136,11 @@ This model was trained with [SFTTrainer](https://huggingface.co/docs/trl/main/en
 | Hyperparameter         | Value                                         |
 | :--------------------- | :-------------------------------------------- |
 | Learning rate          | 2e-5                                          |
-| Total train batch size |
+| Total train batch size | 32                                            |
 | Max. sequence length   | 2048                                          |
-| Weight decay           | 0
+| Weight decay           | 0.01                                          |
 | Warmup ratio           | 0.1                                           |
+| NEFTune Noise Alpha    | 5                                             |
 | Optimizer              | Adam with betas=(0.9,0.999) and epsilon=1e-08 |
 | Scheduler              | cosine                                        |
 | Seed                   | 42                                            |
````
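The prompt format the README recommends is ChatML, which is what the two added special tokens (`<|im_start|>` and `<|im_end|>`) delimit. The canonical way to build the prompt is the `generate.tokenizer.apply_chat_template(...)` call shown in the usage example; as a rough illustration of what that template produces, a hand-rolled ChatML formatter might look like this (the helper name `to_chatml` is hypothetical, not part of the model card):

```python
def to_chatml(messages, add_generation_prompt=True):
    """Sketch of the ChatML layout implied by the <|im_start|>/<|im_end|>
    special tokens; apply_chat_template() does this for you in practice."""
    prompt = ""
    for message in messages:
        prompt += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Leave the assistant turn open so the model completes it.
        prompt += "<|im_start|>assistant\n"
    return prompt
```

Note that `penalty_alpha` together with `top_k` selects contrastive search as the decoding strategy in `transformers`, which is why the card pairs those two parameters.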
model.safetensors
CHANGED

````diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:52178bd78ce2e9eaff3fba98236b261d0c97c5423b6eb1dee8d6d3abe1a37850
 size 992108712
````
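The pointer file above records the new weights as a Git LFS object: the `oid sha256:` line is the SHA-256 digest of the file contents and `size` is its byte length. A minimal sketch for checking a downloaded `model.safetensors` against that pointer (the function name `lfs_sha256` is hypothetical, not a git-lfs API):

```python
import hashlib

def lfs_sha256(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest that Git LFS records as the
    pointer's oid, streaming the file in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing `lfs_sha256("model.safetensors")` against the oid in the diff confirms the download matches the weights committed here.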