sam-mosaic committed
Commit: c45ca0a
Parent: abe8dd5
Update README.md

README.md CHANGED
@@ -21,6 +21,7 @@ inference: false
 
 MPT-7B-Instruct-8k is a model for long-form instruction following, especially question-answering on and summarization of longer documents.
 It is built by finetuning [MPT-7B-8k](https://huggingface.co/mosaicml/mpt-7b-8k) on [Dolly HHRLHF](https://huggingface.co/datasets/mosaicml/dolly_hhrlhf) derived from the [Databricks Dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) and the [Anthropic Helpful and Harmless (HH-RLHF)](https://huggingface.co/datasets/Anthropic/hh-rlhf) datasets. It is also trained on [Competition Math](https://huggingface.co/datasets/competition_math), [Duorc](https://huggingface.co/datasets/duorc), [CoT GSM8k](https://huggingface.co/datasets/conceptofmind/cot_submix_original), [Qasper](https://huggingface.co/datasets/allenai/qasper), [Quality](https://huggingface.co/datasets/emozilla/quality), [Summ Screen FD](https://huggingface.co/datasets/tau/scrolls) and [Spider](https://huggingface.co/datasets/spider).
+This is the same dataset that [MPT-30B-Instruct](https://huggingface.co/mosaicml/mpt-30b-instruct) was trained on.
 * License: _CC-By-SA-3.0_
 * [Demo on Hugging Face Spaces](https://huggingface.co/spaces/mosaicml/mpt-7b-instruct-8k)
 
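For readers of the updated card, a minimal, hypothetical usage sketch (not part of this commit) showing how such an MPT checkpoint is typically loaded with Hugging Face Transformers; the repo id is taken from the demo link above, and the prompt template is assumed to follow the dolly_hhrlhf instruction format rather than quoted from the card:

```python
# Minimal sketch, assuming the standard Transformers auto classes.
# MPT checkpoints ship custom modeling code, so trust_remote_code=True is needed.
import transformers

name = "mosaicml/mpt-7b-instruct-8k"  # assumed repo id, matching the model name above
tokenizer = transformers.AutoTokenizer.from_pretrained(name)
model = transformers.AutoModelForCausalLM.from_pretrained(name, trust_remote_code=True)

# Assumed dolly_hhrlhf-style prompt; check the full card for the exact template.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction:\nSummarize the plot of Dracula in two sentences.\n"
    "### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```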
@@ -143,18 +144,17 @@ The model has been modified from a standard transformer in the following ways:
 
 The model was trained on the following data mix:
 
-| Data Source | Number of Tokens in Source | Proportion |
+| Data Source | Number of Tokens in Source | Proportion |
 |-------------|----------------------------|------------|
-
-
-
-
-
-
-
-
-
-"LongConversations" is a GPT3.5/4-generated dataset, details of which will be released at a later date.
+| competition_math | 1.6 M | 3.66% |
+| cot_gsm8k | 3.36 M | 7.67% |
+| dialogsum | 0.1 M | 0.23% |
+| dolly_hhrlhf | 5.89 M | 13.43% |
+| duorc | 7.8 M | 17.80% |
+| qasper | 8.72 M | 19.90% |
+| quality | 11.29 M | 25.78% |
+| scrolls/summ_screen_fd | 4.97 M | 11.33% |
+| spider | 0.089 M | 0.20% |
 
 ### Training Configuration
 
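As a quick arithmetic check on the new mix (not part of the commit), each proportion is a source's token count divided by the roughly 43.8 M token total; the rounded counts above reproduce the listed percentages to within a few hundredths of a point:

```python
# Sanity check of the data-mix table: proportion = tokens in source / total tokens.
# Token counts (in millions) are copied from the added table rows.
mix_m_tokens = {
    "competition_math": 1.6,
    "cot_gsm8k": 3.36,
    "dialogsum": 0.1,
    "dolly_hhrlhf": 5.89,
    "duorc": 7.8,
    "qasper": 8.72,
    "quality": 11.29,
    "scrolls/summ_screen_fd": 4.97,
    "spider": 0.089,
}
total = sum(mix_m_tokens.values())  # about 43.8 M tokens
for source, tokens in mix_m_tokens.items():
    print(f"{source:24s} {tokens / total:6.2%}")
```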