tags:
- mamba2
license: mit
mamba2-2.7b-av
Introduction
This is a mirror of mamba2-2.7b that is compatible with mamba2-torch, a Hugging Face compatible Mamba2 library that does not depend on the CUDA wheels of the original mamba repo. Credit goes to the original authors of Mamba2 and to the transformers library by Hugging Face; without their work, this would not be possible.
NOTE: mamba2-torch offers three optimisation paths:
- Triton kernels and causal-conv1d ("fastest")
- Triton kernels only (default)
- Pure PyTorch
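The paths listed above differ in which optional packages they need. As a minimal sketch (not part of mamba2-torch), the helper below reports which paths look usable in the current environment by checking whether the optional modules can be imported. The function name `available_paths` is hypothetical, and the sketch assumes the packages are importable as `triton` and `causal_conv1d`; how mamba2-torch itself selects a path is described in its repo.

```python
from importlib.util import find_spec


def available_paths() -> list[str]:
    """Return the optimisation paths whose optional dependencies are importable.

    Hypothetical helper: it only checks that the modules can be found,
    not that they work on the current hardware.
    """
    has_triton = find_spec("triton") is not None
    has_causal_conv1d = find_spec("causal_conv1d") is not None

    paths = ["Pure PyTorch"]  # needs no optional kernels
    if has_triton:
        paths.append("Triton kernels only")
        if has_causal_conv1d:
            paths.append("Triton kernels and causal-conv1d")
    return paths


if __name__ == "__main__":
    print(available_paths())
```

If only "Pure PyTorch" is reported, installing the optional packages may unlock the faster paths.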
How to Get Started with the Model
The mamba2-torch repo has a more detailed explanation. First, install the mamba2-torch library:
git clone https://github.com/vasqu/mamba2-torch.git
cd mamba2-torch
pip install .
Then download this repository via git lfs and load the files locally as follows (after installing mamba2-torch):
from transformers import AutoTokenizer
from mamba2_torch import Mamba2Model, Mamba2ForCausalLM, Mamba2Config
device = "cuda"
# local directory containing the files downloaded from this repository
mamba2_hf_path = "<path-to-converted-model>"
model = Mamba2ForCausalLM.from_pretrained(mamba2_hf_path, local_files_only=True).to(device)
tokenizer = AutoTokenizer.from_pretrained(mamba2_hf_path, local_files_only=True)
input_ids = tokenizer("Hey how are you doing?", return_tensors="pt")["input_ids"].to(device)
# expected output (2.7b): `["Hey how are you doing? I'm doing good. I'm just trying to"]`
out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out))
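Because the files are fetched via git lfs, one possible failure mode is a clone in which the large files are still small pointer stubs, which would make loading with `local_files_only=True` fail. The sketch below is a hypothetical pre-flight check (the name `find_lfs_pointers` is not part of mamba2-torch or transformers). It relies only on the fact that a git-lfs pointer file begins with the line `version https://git-lfs.github.com/spec/v1`.

```python
from pathlib import Path

# Every git-lfs pointer file starts with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(model_dir: str) -> list[str]:
    """List files under model_dir that are still git-lfs pointer stubs."""
    root = Path(model_dir)
    stubs = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        # Skip directories and git's own bookkeeping files.
        if not path.is_file() or ".git" in rel.parts:
            continue
        with path.open("rb") as fh:
            head = fh.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            stubs.append(str(rel))
    return stubs
```

Calling `find_lfs_pointers(mamba2_hf_path)` before `from_pretrained` should return an empty list; if it does not, running `git lfs pull` inside the cloned repository typically fetches the real files.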
Citation
BibTeX:
@inproceedings{mamba2,
title={Transformers are {SSM}s: Generalized Models and Efficient Algorithms Through Structured State Space Duality},
author={Dao, Tri and Gu, Albert},
booktitle={International Conference on Machine Learning (ICML)},
year={2024}
}