Update README.md
README.md
CHANGED
@@ -4,9 +4,9 @@ datasets:
 - togethercomputer/RedPajama-Data-1T
 ---
 
-# MPT-1b-RedPajama-200b
+# MPT-1b-RedPajama-200b-dolly
 
-MPT-1b-RedPajama-200b is a 1.3 billion parameter decoder-only transformer pre-trained on the [RedPajama dataset](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T) and subsequently fine-tuned on the [Databricks Dolly](https://github.com/databrickslabs/dolly/tree/master/data) instruction dataset.
+MPT-1b-RedPajama-200b-dolly is a 1.3 billion parameter decoder-only transformer pre-trained on the [RedPajama dataset](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T) and subsequently fine-tuned on the [Databricks Dolly](https://github.com/databrickslabs/dolly/tree/master/data) instruction dataset.
 The model was pre-trained for 200B tokens by sampling from the subsets of the RedPajama dataset in the same proportions as were used by the [Llama series of models](https://arxiv.org/abs/2302.13971).
 This model was trained by [MosaicML](https://www.mosaicml.com) and follows a modified decoder-only transformer architecture.
 
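For context, the model described in the updated card is a causal language model, so it should be loadable through the Hugging Face `transformers` auto classes. Below is a minimal sketch, assuming the checkpoint is published on the Hub as `mosaicml/mpt-1b-redpajama-200b-dolly` (the repo id is an assumption here) and that the repo ships custom modeling code, which is why `trust_remote_code=True` is passed:

```python
# Minimal usage sketch for the model described in the updated README.
# Assumptions: the checkpoint lives on the Hugging Face Hub under
# "mosaicml/mpt-1b-redpajama-200b-dolly" and includes custom modeling code,
# so trust_remote_code=True is required when loading it.
import transformers

model_name = "mosaicml/mpt-1b-redpajama-200b-dolly"  # assumed Hub repo id

tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
)

# Generate a short instruction-style completion.
prompt = "Write a short note explaining what the RedPajama dataset is.\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The tokenizer is resolved from the same repo id as the model, so no separate tokenizer name is needed; swap in a different repo id if the checkpoint is hosted elsewhere.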