# Model Card for alokabhishek/falcon-7b-instruct-bnb-4bit
This repo contains a 4-bit quantized model (using bitsandbytes) of Technology Innovation Institute's [tiiuae/falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct).
## Model Details

- Model creator: [Technology Innovation Institute](https://huggingface.co/tiiuae)
- Original model: [falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct)
## About 4-bit quantization using bitsandbytes

- Paper: [QLoRA: Efficient Finetuning of Quantized LLMs](https://arxiv.org/abs/2305.14314)
- Hugging Face blog post on 4-bit quantization using bitsandbytes: [Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA](https://huggingface.co/blog/4bit-transformers-bitsandbytes)
- bitsandbytes GitHub repo: [TimDettmers/bitsandbytes](https://github.com/TimDettmers/bitsandbytes)
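For reference, below is a minimal sketch of how a model like falcon-7b-instruct can be quantized to 4-bit at load time with bitsandbytes, following the NF4 + double-quantization recipe from the QLoRA paper. The exact settings used to produce this repo are an assumption.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumed quantization settings (NF4, double quantization, bfloat16 compute),
# per the QLoRA recipe; not necessarily the exact config used for this repo.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type from the QLoRA paper
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for matmuls at inference time
)

# Quantize the original full-precision model on the fly while loading
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b-instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
```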
## How to Get Started with the Model
Use the code below to get started with the model.
### How to run from Python code

#### First install the packages
```shell
pip install -q -U bitsandbytes accelerate torch huggingface_hub
pip install -q -U git+https://github.com/huggingface/transformers.git # install the latest version of transformers
pip install -q -U git+https://github.com/huggingface/peft.git
pip install flash-attn --no-build-isolation # optional: FlashAttention kernels
```
#### Import

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
```
#### Use a pipeline as a high-level helper
```python
model_id_falcon = "alokabhishek/falcon-7b-instruct-bnb-4bit"

# Load the tokenizer and the pre-quantized 4-bit model;
# the bitsandbytes config is read from the repo, so no quantization_config is needed here
tokenizer_falcon = AutoTokenizer.from_pretrained(model_id_falcon, use_fast=True)
model_falcon = AutoModelForCausalLM.from_pretrained(
    model_id_falcon,
    device_map="auto",
)

# Wrap model and tokenizer in a text-generation pipeline
pipe_falcon = pipeline(task="text-generation", model=model_falcon, tokenizer=tokenizer_falcon)

prompt_falcon = "Tell me a funny joke about Large Language Models meeting a Blackhole in an intergalactic Bar."
output_falcon = pipe_falcon(prompt_falcon, max_new_tokens=512)
print(output_falcon[0]["generated_text"])
```
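If you prefer to skip the pipeline helper, here is a minimal sketch of calling `generate` directly on the tokenizer and model objects loaded above; the sampling settings are assumptions to tune to taste.

```python
# Tokenize the prompt and move it to the model's device
inputs = tokenizer_falcon(prompt_falcon, return_tensors="pt").to(model_falcon.device)

output_ids = model_falcon.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,                              # sample instead of greedy decoding
    temperature=0.7,                             # assumed value; tune to taste
    pad_token_id=tokenizer_falcon.eos_token_id,  # silence the missing-pad-token warning
)
print(tokenizer_falcon.decode(output_ids[0], skip_special_tokens=True))
```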
## Uses

### Direct Use

[More Information Needed]

### Downstream Use [optional]

[More Information Needed]

### Out-of-Scope Use

[More Information Needed]

## Bias, Risks, and Limitations

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]