Frequently Asked Questions
Getting started with DBRX models is easy with the transformers
library. The full-precision model requires ~264GB of RAM; this 4-bit quantized variant needs roughly a quarter of that. Install the following packages:
pip install "torch==2.4.0" "transformers>=4.39.2" "tiktoken>=0.6.0" "bitsandbytes"
If you'd like to speed up download time, you can use the hf_transfer
package, as described in the Hugging Face documentation:
```bash
pip install hf_transfer
export HF_HUB_ENABLE_HF_TRANSFER=1
```
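If you prefer to fetch the checkpoint ahead of time rather than during the first `from_pretrained` call, a minimal pre-download sketch (assuming a recent huggingface_hub release, which ships the `huggingface-cli download` command):

```bash
# Pre-fetch the repo into the local HF cache; hf_transfer is used
# automatically when HF_HUB_ENABLE_HF_TRANSFER=1 is set.
huggingface-cli download PrunaAI/dbrx-instruct-bnb-4bit
```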
You will need to request access to this repository to download the model. Once access is granted,
obtain an access token with read permission and supply it in the calls below.
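If you'd rather not paste the token into your script, one option (assuming the huggingface_hub package, which is installed alongside transformers) is to log in once and let transformers pick the credentials up from the local cache:

```python
# One-time interactive login; stores the read-scoped token locally so
# from_pretrained calls no longer need an explicit token= argument.
from huggingface_hub import login

login()
```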
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the tokenizer and the 4-bit quantized model; the quantization config
# ships with the checkpoint, and device_map="auto" places the weights on
# the available GPU(s).
tokenizer = AutoTokenizer.from_pretrained("PrunaAI/dbrx-instruct-bnb-4bit", trust_remote_code=True, token="hf_YOUR_TOKEN")
model = AutoModelForCausalLM.from_pretrained("PrunaAI/dbrx-instruct-bnb-4bit", device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True, token="hf_YOUR_TOKEN")

# Build a chat-formatted prompt and move it to the model's device.
input_text = "What does it take to build a great LLM?"
messages = [{"role": "user", "content": input_text}]
inputs = tokenizer.apply_chat_template(messages, return_dict=True, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)

# Generate up to 200 new tokens and decode the result.
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0]))
```
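Note that decoding `outputs[0]` prints the prompt together with the completion. To show only the newly generated text, slice off the prompt tokens first, for example:

```python
# Skip the prompt tokens so only the model's completion is printed.
prompt_len = inputs["input_ids"].shape[1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))
```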
The license of the smashed model follows the license of the original model. Please check the license of the base model, databricks/dbrx-instruct, before using this model. The license of the pruna-engine is available on PyPI.