Update README.md
README.md
CHANGED
@@ -8,7 +8,7 @@ datasets:
 
 Mosaic-1b-RedPajama-200b is a 1.4 billion parameter decoder-only transformer trained on the [RedPajama dataset](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T).
 The model was trained for 200B tokens by sampling from the subsets of the RedPajama dataset in the same proportions as were used by the [Llama series of models](https://arxiv.org/abs/2302.13971).
-This model was trained by [MosaicML](https://www.mosaicml.com) and follows
+This model was trained by [MosaicML](https://www.mosaicml.com) and follows a modified decoder-only transformer architecture.
 
 ## Model Date
 
@@ -24,6 +24,12 @@ import transformers
 model = transformers.AutoModelForCausalLM.from_pretrained('mosaicml/mosaic-llama-redpajama-final-candidate', trust_remote_code=True)
 ```
 
+To use the optimized triton implementation of FlashAttention, you can load with `attn_impl='triton'` and move the model to `bfloat16` like so:
+```python
+model = transformers.AutoModelForCausalLM.from_pretrained('mosaicml/mosaic-1b-redpajama-200b', trust_remote_code=True, attn_impl='triton')
+model.to(device='cuda:0', dtype=torch.bfloat16)
+```
+
 ## Model Description
 
 This model uses the MosaicML LLM codebase, which can be found in the [MosaicML Examples Repository](https://github.com/mosaicml/examples/tree/v0.0.4/examples/llm).
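For anyone sanity-checking the new snippet: as written it assumes `torch` is already imported (only `import transformers` appears in the surrounding context), and it stops short of actually running the model. Below is a self-contained sketch of the added usage; the bundled tokenizer and the prompt are illustrative assumptions, not part of this commit.

```python
# Minimal sketch of the usage added in this commit. Assumptions: the repo
# ships a tokenizer, and a CUDA device is available (the triton
# FlashAttention path runs on GPU). Not an official example.
import torch  # needed for torch.bfloat16; the commit's snippet omits this import
import transformers

name = 'mosaicml/mosaic-1b-redpajama-200b'

# trust_remote_code=True is required because the model class is defined in the
# repo itself; attn_impl='triton' selects the triton FlashAttention kernels.
model = transformers.AutoModelForCausalLM.from_pretrained(
    name, trust_remote_code=True, attn_impl='triton'
)
model.to(device='cuda:0', dtype=torch.bfloat16)

# Assumption: a tokenizer is bundled with the checkpoint; if not, substitute
# the tokenizer the model was trained with.
tokenizer = transformers.AutoTokenizer.from_pretrained(name)
inputs = tokenizer('MosaicML is', return_tensors='pt').to('cuda:0')
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The `bfloat16` cast is not incidental: triton FlashAttention kernels generally expect half-precision inputs, so leaving the model in float32 would not exercise the optimized attention path the snippet is meant to enable.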