AntonV/mamba2-780m-av

AntonV committed on Jun 16

Commit d1dc934 • 1 Parent(s): 48b9349

Create README.md

Files changed (1)

README.md +57 -0

README.md ADDED

	@@ -0,0 +1,57 @@

+---
+tags:
+- mamba2
+license: mit
+---
+# mamba2-780m-av
+## Introduction
+This is a mirror of [mamba2-780m](https://huggingface.co/state-spaces/mamba2-780m) that is compatible with [mamba2-torch](https://github.com/vasqu/mamba2-torch), a Hugging Face-compatible Mamba2 library that does not depend on the CUDA wheels of the [original mamba repo](https://github.com/state-spaces/mamba). Credit goes to the original authors of [Mamba2](https://arxiv.org/abs/2405.21060) and to the [transformers](https://github.com/huggingface/transformers) library by Hugging Face. Without their work, this would not be possible.
+NOTE: `mamba2-torch` offers three optimisation paths:
+- Triton kernels and [causal-conv1d](https://github.com/Dao-AILab/causal-conv1d) ("fastest")
+- Triton kernels only (default)
+- Pure PyTorch
+## How to Get Started with the Model
+For a more detailed explanation, follow the instructions in the [mamba2-torch repo](https://github.com/vasqu/mamba2-torch). First, install the mamba2-torch library:
+```bash
+git clone https://github.com/vasqu/mamba2-torch.git
+cd mamba2-torch
+pip install .
+```
+Then download this repository via Git LFS and load the files locally as follows:
+```python
+from transformers import AutoTokenizer
+from mamba2_torch import Mamba2ForCausalLM
+device = "cuda"
+mamba2_hf_path = "<path-to-converted-model>"
+model = Mamba2ForCausalLM.from_pretrained(mamba2_hf_path, local_files_only=True).to(device)
+tokenizer = AutoTokenizer.from_pretrained(mamba2_hf_path, local_files_only=True)
+input_ids = tokenizer("Hey how are you doing?", return_tensors="pt")["input_ids"].to(device)
+# expected output (780m): `["Hey how are you doing?\n\nI'm doing great.  I'm"]`
+out = model.generate(input_ids, max_new_tokens=10)
+print(tokenizer.batch_decode(out))
+```
+## Citation
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+```bibtex
+@inproceedings{mamba2,
+ title={Transformers are {SSM}s: Generalized Models and Efficient Algorithms Through Structured State Space Duality},
+ author={Dao, Tri and Gu, Albert},
+ booktitle={International Conference on Machine Learning (ICML)},
+ year={2024}
+}
+```
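
The README's "How to Get Started with the Model" section asks the reader to download the repository via Git LFS but does not show the commands. The following is a minimal sketch of that step, not the author's own procedure. It assumes the repository ID `AntonV/mamba2-780m-av` (taken from the page header) and the usual `https://huggingface.co/<repo_id>` clone URL. The helper name `build_clone_commands` and the target directory are hypothetical. The script only prints the commands so they can be reviewed before being run.

```python
import shlex

# Assumption: Hub repositories are cloned from https://huggingface.co/<repo_id>.
HF_BASE_URL = "https://huggingface.co"


def build_clone_commands(repo_id: str, target_dir: str) -> list[list[str]]:
    """Return the git commands that fetch a Hub repo together with its LFS files.

    Hypothetical helper for illustration; it builds the commands but runs nothing.
    """
    return [
        # Enable Git LFS hooks so the large weight files are pulled on clone.
        ["git", "lfs", "install"],
        # Clone the model repository into the chosen local directory.
        ["git", "clone", f"{HF_BASE_URL}/{repo_id}", target_dir],
    ]


if __name__ == "__main__":
    # Assumed repo ID from the page header; the directory name is arbitrary.
    for cmd in build_clone_commands("AntonV/mamba2-780m-av", "mamba2-780m-av"):
        print(shlex.join(cmd))
```

After running the printed commands, the chosen directory can be used as the value of `mamba2_hf_path` in the README's loading example.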