eluzhnica/mpt-7b-8k-peft-compatible


sam-mosaic committed on Jul 18, 2023

Commit 19180fa • 1 Parent(s): c75058f

Update README.md

Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -32,7 +32,7 @@ This model uses the MosaicML LLM codebase, which can be found in the [llm-foundr
 MPT-7B-8k is
-* **Licensed for the possibility of commercial use** (unlike [LLaMA](https://arxiv.org/abs/2302.13971)).
 * **Trained on a large amount of data** (1.5T tokens like [XGen](https://huggingface.co/Salesforce/xgen-7b-8k-base) vs. 1T for [LLaMA](https://arxiv.org/abs/2302.13971), 1T for [MPT-7B](https://www.mosaicml.com/blog/mpt-7b), 300B for [Pythia](https://github.com/EleutherAI/pythia), 300B for [OpenLLaMA](https://github.com/openlm-research/open_llama), and 800B for [StableLM](https://github.com/Stability-AI/StableLM)).
 * **Prepared to handle long inputs** thanks to [ALiBi](https://arxiv.org/abs/2108.12409). With ALiBi, the model can extrapolate beyond the 8k training sequence length to up to 10k, and with a few million tokens it can be finetuned to extrapolate much further.
 * **Capable of fast training and inference** via [FlashAttention](https://arxiv.org/pdf/2205.14135.pdf) and [FasterTransformer](https://github.com/NVIDIA/FasterTransformer)

 MPT-7B-8k is
+* **Licensed for the possibility of commercial use.**
 * **Trained on a large amount of data** (1.5T tokens like [XGen](https://huggingface.co/Salesforce/xgen-7b-8k-base) vs. 1T for [LLaMA](https://arxiv.org/abs/2302.13971), 1T for [MPT-7B](https://www.mosaicml.com/blog/mpt-7b), 300B for [Pythia](https://github.com/EleutherAI/pythia), 300B for [OpenLLaMA](https://github.com/openlm-research/open_llama), and 800B for [StableLM](https://github.com/Stability-AI/StableLM)).
 * **Prepared to handle long inputs** thanks to [ALiBi](https://arxiv.org/abs/2108.12409). With ALiBi, the model can extrapolate beyond the 8k training sequence length to up to 10k, and with a few million tokens it can be finetuned to extrapolate much further.
 * **Capable of fast training and inference** via [FlashAttention](https://arxiv.org/pdf/2205.14135.pdf) and [FasterTransformer](https://github.com/NVIDIA/FasterTransformer)
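
The long-input bullet above relies on ALiBi, which replaces learned position embeddings with a per-head linear penalty on attention scores. The following is a minimal sketch of that scheme as described in the ALiBi paper, not the code used in this model's repository. The function names `alibi_slopes` and `alibi_bias` are hypothetical, and the sketch assumes a power-of-two head count and causal attention.

```python
def alibi_slopes(n_heads: int) -> list[float]:
    """Per-head slopes from the ALiBi paper (power-of-two head counts only)."""
    if n_heads <= 0 or n_heads & (n_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    # Geometric sequence: 2^(-8/n), 2^(-16/n), ..., 2^(-8).
    return [2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)]


def alibi_bias(n_heads: int, seq_len: int) -> list[list[list[float]]]:
    """Bias added to attention scores, indexed as [head][query][key].

    Visible keys get -slope * (query_pos - key_pos); future keys are
    masked with -inf (causal attention is an assumption of this sketch).
    """
    return [
        [
            [-m * (q - k) if k <= q else float("-inf") for k in range(seq_len)]
            for q in range(seq_len)
        ]
        for m in alibi_slopes(n_heads)
    ]


if __name__ == "__main__":
    bias = alibi_bias(n_heads=8, seq_len=4)
    # Penalty grows linearly with distance for the first head.
    print(bias[0][3])
```

Because the bias depends only on the distance between query and key, the same function is defined for any `seq_len`, including lengths longer than those seen in training. That property is what makes extrapolation possible in principle; how far it holds in practice is an empirical matter.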