---
license: wtfpl
datasets:
- HuggingFaceH4/no_robots
pipeline_tag: text-generation
---

# MAMBA (2.8B) 🐍 fine-tuned on OpenHermes

The model card is still a WIP!


## Base model info

Mamba is a new state space model architecture showing promising performance on information-dense data such as language modeling, where previous subquadratic models fall short of Transformers.
It is based on the line of progress on [structured state space models](https://github.com/state-spaces/s4),
with an efficient hardware-aware design and implementation in the spirit of [FlashAttention](https://github.com/Dao-AILab/flash-attention).
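
For intuition, here is a toy sketch (with made-up dimensions, and not Mamba's actual fused kernel) of the discretized linear state-space recurrence this family of models builds on; Mamba additionally makes the SSM parameters input-dependent ("selective") and computes the scan with a hardware-aware CUDA kernel:

```py
# Toy, non-selective discretized state-space scan:
#   h_t = A_bar @ h_{t-1} + B_bar * x_t ;  y_t = C @ h_t
import torch

d_state, seq_len = 16, 10
A_bar = 0.9 * torch.eye(d_state)   # discretized state matrix
B_bar = torch.randn(d_state, 1)    # input projection
C = torch.randn(1, d_state)        # output projection

x = torch.randn(seq_len)           # one scalar input channel
h = torch.zeros(d_state, 1)
ys = []
for t in range(seq_len):           # linear-time recurrence, O(L) in sequence length
    h = A_bar @ h + B_bar * x[t]
    ys.append((C @ h).item())
print(ys)
```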

## Dataset info

TBA


## Usage

```sh
pip install transformers
pip install "causal-conv1d<=1.0.2"
pip install mamba-ssm
```
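
Note that `causal-conv1d` and `mamba-ssm` build custom CUDA kernels, so a CUDA-capable GPU and a matching PyTorch installation are expected for the fast path.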

```py
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

# Reuse Zephyr's chat template, which matches the format used for fine-tuning.
CHAT_TEMPLATE_ID = "HuggingFaceH4/zephyr-7b-beta"

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_name = "clibrain/mamba-2.8b-instruct-openhermes"

eos_token = "<|endoftext|>"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.eos_token = eos_token
tokenizer.pad_token = tokenizer.eos_token
tokenizer.chat_template = AutoTokenizer.from_pretrained(CHAT_TEMPLATE_ID).chat_template

model = MambaLMHeadModel.from_pretrained(
    model_name, device=device, dtype=torch.float16)

history: list[dict[str, str]] = []
prompt = "Tell me 5 sites to visit in Spain"
history.append(dict(role="user", content=prompt))

input_ids = tokenizer.apply_chat_template(
    history, return_tensors="pt", add_generation_prompt=True
).to(device)

out = model.generate(
    input_ids=input_ids,
    max_length=2000,
    temperature=0.9,
    top_p=0.7,
    eos_token_id=tokenizer.eos_token_id,
)

decoded = tokenizer.batch_decode(out)
# Keep only the assistant's reply and strip the EOS token.
assistant_message = (
    decoded[0].split("<|assistant|>\n")[-1].replace(eos_token, "")
)

print(assistant_message)
```
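
To continue the conversation, you can append the assistant's reply to the history and generate again. A minimal multi-turn sketch reusing the objects defined above (the follow-up question is just an example):

```py
# Multi-turn sketch: feed the growing history back through the chat template.
history.append(dict(role="assistant", content=assistant_message))
history.append(dict(role="user", content="Which of those is best to visit in winter?"))

input_ids = tokenizer.apply_chat_template(
    history, return_tensors="pt", add_generation_prompt=True
).to(device)
out = model.generate(
    input_ids=input_ids,
    max_length=2000,
    temperature=0.9,
    top_p=0.7,
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.batch_decode(out)[0].split("<|assistant|>\n")[-1].replace(eos_token, ""))
```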


## Gradio Demo

```sh
git clone https://github.com/mrm8488/mamba-chat.git
cd mamba-chat

pip install -r requirements.txt
pip install -q gradio==4.8.0

python app.py \
--model clibrain/mamba-2.8b-instruct-openhermes \
--share
```
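
The `--share` flag asks Gradio to expose the local demo through a temporary public link.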

## Evaluations

Coming soon!


## Acknowledgments

Thanks to [mamba-chat](https://github.com/havenhq/mamba-chat/tree/main) for heavily inspiring our work.