Update README.md
README.md
CHANGED
@@ -8,7 +8,7 @@ datasets:
 
 Mosaic-1b-RedPajama-200b is a 1.4 billion parameter decoder-only transformer trained on the [RedPajama dataset](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T).
 The model was trained for 200B tokens by sampling from the subsets of the RedPajama dataset in the same proportions as were used by the [Llama series of models](https://arxiv.org/abs/2302.13971).
-This model was trained by [MosaicML](https://www.mosaicml.com) and follows
+This model was trained by [MosaicML](https://www.mosaicml.com) and follows a modified decoder-only transformer architecture.
 
 ## Model Date
 
@@ -24,6 +24,12 @@ import transformers
 model = transformers.AutoModelForCausalLM.from_pretrained('mosaicml/mosaic-llama-redpajama-final-candidate', trust_remote_code=True)
 ```
 
+To use the optimized triton implementation of FlashAttention, you can load with `attn_impl='triton'` and move the model to `bfloat16` like so:
+```python
+model = transformers.AutoModelForCausalLM.from_pretrained('mosaicml/mosaic-1b-redpajama-200b', trust_remote_code=True, attn_impl='triton')
+model.to(device='cuda:0', dtype=torch.bfloat16)
+```
+
 ## Model Description
 
 This model uses the MosaicML LLM codebase, which can be found in the [MosaicML Examples Repository](https://github.com/mosaicml/examples/tree/v0.0.4/examples/llm).
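For anyone sanity-checking the new snippet: as written it assumes `torch` is already imported (only `import transformers` appears in the surrounding context), and it stops short of actually running the model. Below is a self-contained sketch of the added usage; the bundled tokenizer and the prompt are illustrative assumptions, not part of this commit.

```python
# Minimal sketch of the usage added in this commit. Assumptions: the repo
# ships a tokenizer, and a CUDA device is available (the triton
# FlashAttention path runs on GPU). Not an official example.
import torch  # needed for torch.bfloat16; the commit's snippet omits this import
import transformers

name = 'mosaicml/mosaic-1b-redpajama-200b'

# trust_remote_code=True is required because the model class is defined in the
# repo itself; attn_impl='triton' selects the triton FlashAttention kernels.
model = transformers.AutoModelForCausalLM.from_pretrained(
    name, trust_remote_code=True, attn_impl='triton'
)
model.to(device='cuda:0', dtype=torch.bfloat16)

# Assumption: a tokenizer is bundled with the checkpoint; if not, substitute
# the tokenizer the model was trained with.
tokenizer = transformers.AutoTokenizer.from_pretrained(name)
inputs = tokenizer('MosaicML is', return_tensors='pt').to('cuda:0')
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The `bfloat16` cast is not incidental: triton FlashAttention kernels generally expect half-precision inputs, so leaving the model in float32 would not exercise the optimized attention path the snippet is meant to enable.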