sedrickkeh
committed on
Commit • 443ad2e
1 Parent(s): efb6425
Update README.md
README.md
CHANGED
@@ -94,7 +94,7 @@ We follow their training recipe and release our version of Mamba-7B.
 
 ## Training Details
 - Mamba-7B was trained using AWS SageMaker on 128 H100 80GB GPUs.
-- Training began in March
+- Training began in March 2024 and lasted around 3 weeks (some downtime due to crashes and loss spikes).
 | **Hyperparameter** | **Value** |
 |--------------------|------------|
 | Precision | `bfloat16` |
@@ -108,18 +108,9 @@ We follow their training recipe and release our version of Mamba-7B.
 
 
 ## Usage
-
-This model was trained using [OpenLM](https://github.com/mlfoundations/open_lm/).
-
-To use HuggingFace models trained with OpenLM, first install the OpenLM package
-```bash
-pip install openlm
-```
-
-Importing from `openlm_hf` will automatically import the necessary classes.
+This model was trained using [OpenLM](https://github.com/mlfoundations/open_lm/). The weights have been converted to be compatible with HuggingFace.
 
 ```python
-from openlm_hf import * # registers the Auto* classes
 from transformers import AutoTokenizer, AutoModelForCausalLM
 tokenizer = AutoTokenizer.from_pretrained("tri-ml/mamba-7b-rw")
 model = AutoModelForCausalLM.from_pretrained("tri-ml/mamba-7b-rw").cuda()
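The new Usage snippet stops at loading the model. A minimal generation sketch that continues from the same setup might look like the following; the prompt and `max_new_tokens` value are illustrative assumptions, not values from the model card:

```python
# Minimal sketch: load the converted HF-compatible weights and generate text.
# The prompt and sampling settings below are illustrative assumptions,
# not values from the README.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tri-ml/mamba-7b-rw")
model = AutoModelForCausalLM.from_pretrained("tri-ml/mamba-7b-rw").cuda()

# Tokenize a prompt, move it to the GPU, and decode the generated continuation.
inputs = tokenizer("The Mamba architecture is", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```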