metadata

tags:
  - mamba2
license: mit

mamba2-780m-av

Introduction

This is a mirror of mamba2-780m that is compatible with mamba2-torch, a Hugging Face-compatible Mamba2 library that does not depend on the CUDA wheels of the original mamba repo. Credit goes to the original authors of Mamba2 and to the transformers library by Hugging Face; without their work, this would not be possible.

NOTE: mamba2-torch offers three optimisation paths:

Triton kernels and causal-conv1d ("fastest")
Triton kernels only (default)
Pure PyTorch
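
To see which of these paths the current environment could support, a minimal sketch can check whether the optional packages are importable. It assumes the import names `triton` and `causal_conv1d`; the helpers `is_installed` and `best_available_path` are hypothetical and not part of mamba2-torch. The check only inspects the environment and says nothing about how mamba2-torch itself selects a path.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return find_spec(module_name) is not None


def best_available_path(installed=is_installed) -> str:
    """Name the fastest listed path whose optional packages are present.

    Assumption: the packages import as `triton` and `causal_conv1d`.
    This is an environment check only; it does not configure mamba2-torch.
    """
    has_triton = installed("triton")
    has_conv1d = installed("causal_conv1d")
    if has_triton and has_conv1d:
        return "Triton kernels and causal-conv1d"
    if has_triton:
        return "Triton kernels only"
    return "Pure PyTorch"


print(best_available_path())
```

The `installed` parameter exists only so the decision logic can be exercised without changing the environment, for example `best_available_path(lambda name: False)`.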

How to Get Started with the Model

The mamba2-torch repo has more detailed instructions. First, install the mamba2-torch library:

git clone https://github.com/vasqu/mamba2-torch.git
cd mamba2-torch
pip install .

Then download this repository via git lfs and load the files locally as follows (after installing mamba2-torch):

from transformers import AutoTokenizer
from mamba2_torch import Mamba2ForCausalLM

device = "cuda"
# Local directory holding the files downloaded from this repository
mamba2_hf_path = "<path-to-converted-model>"

# local_files_only=True loads from disk and skips any download attempt
model = Mamba2ForCausalLM.from_pretrained(mamba2_hf_path, local_files_only=True).to(device)
tokenizer = AutoTokenizer.from_pretrained(mamba2_hf_path, local_files_only=True)

input_ids = tokenizer("Hey how are you doing?", return_tensors="pt")["input_ids"].to(device)

# expected output (780m): `["Hey how are you doing?\n\nI'm doing great.  I'm"]`
out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out))
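
If loading a locally cloned copy fails, one possible cause is that git lfs did not fetch the large files, which leaves small text pointer files in their place. The sketch below looks for such pointers by checking for the standard git lfs pointer header. The helper `find_lfs_pointers` is hypothetical and not part of mamba2-torch or transformers, and it only scans the top level of the directory.

```python
from pathlib import Path

# Every git lfs pointer file starts with this version line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(model_dir):
    """Return the names of files in model_dir that are still git lfs pointers."""
    pointers = []
    for path in sorted(Path(model_dir).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories such as .git
        with path.open("rb") as handle:
            head = handle.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            pointers.append(path.name)
    return pointers
```

Calling `find_lfs_pointers` on the same directory used for `mamba2_hf_path` returns an empty list when every file has been fetched; any names it does return would need `git lfs pull` inside the clone before loading.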

Citation

BibTeX:

@inproceedings{mamba2,
 title={Transformers are {SSM}s: Generalized Models and Efficient Algorithms Through Structured State Space Duality},
 author={Dao, Tri and Gu, Albert},
 booktitle={International Conference on Machine Learning (ICML)},
 year={2024}
}