---
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
tags:
- bitsandbytes
- quantized
- 4bit
- Mistral
- Mistral-7B
- bnb
---
# Model Card for Mistral-7B-Instruct-v0.2-bnb-4bit
<!-- Provide a quick summary of what the model is/does. -->
This repo contains a 4-bit quantized version (using bitsandbytes) of Mistral AI's Mistral-7B-Instruct-v0.2.
## Model Details
- Model creator: [Mistral AI](https://huggingface.co/mistralai)
- Original model: [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
### About 4-bit quantization using bitsandbytes
- QLoRA paper: [QLoRA: Efficient Finetuning of Quantized LLMs (arXiv:2305.14314)](https://arxiv.org/abs/2305.14314)
- Hugging Face blog post on 4-bit quantization using bitsandbytes: [Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA](https://huggingface.co/blog/4bit-transformers-bitsandbytes)
- bitsandbytes GitHub repo: [TimDettmers/bitsandbytes](https://github.com/TimDettmers/bitsandbytes)
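
To illustrate how such a 4-bit model is produced, the sketch below loads the original `mistralai/Mistral-7B-Instruct-v0.2` in 4-bit NF4 with double quantization and bfloat16 compute, the configuration described in the QLoRA paper. This is a minimal illustration; the exact settings used to build this repo are an assumption, not documented here.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# NF4 4-bit quantization with double quantization and bfloat16 compute
# (illustrative settings; the exact config used for this repo is assumed)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```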
# How to Get Started with the Model
Use the code below to get started with the model.
## How to run from Python code
#### First install the required packages
```shell
pip install -q -U bitsandbytes accelerate torch huggingface_hub
pip install -q -U git+https://github.com/huggingface/transformers.git # Install latest version of transformers
pip install -q -U git+https://github.com/huggingface/peft.git
pip install flash-attn --no-build-isolation  # optional: FlashAttention kernels (requires a compatible GPU)
```
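Optionally, a quick sanity check that a CUDA device is visible and bitsandbytes imports cleanly (illustrative only, not required):

```python
import torch
import bitsandbytes as bnb

print("CUDA available:", torch.cuda.is_available())
print("bitsandbytes version:", bnb.__version__)
```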
#### Import
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
```
#### Use a pipeline as a high-level helper
```python
model_id_mistral = "alokabhishek/Mistral-7B-Instruct-v0.2-bnb-4bit"
tokenizer_mistral = AutoTokenizer.from_pretrained(model_id_mistral, use_fast=True)
# The 4-bit bitsandbytes settings are stored in this repo's config, so the model
# loads already quantized; no extra BitsAndBytesConfig is needed here.
model_mistral = AutoModelForCausalLM.from_pretrained(
    model_id_mistral,
    device_map="auto"
)
pipe_mistral = pipeline(model=model_mistral, tokenizer=tokenizer_mistral, task="text-generation")
prompt_mistral = "Tell me a funny joke about Large Language Models meeting a Blackhole in an intergalactic Bar."
output_mistral = pipe_mistral(prompt_mistral, max_new_tokens=512)
print(output_mistral[0]["generated_text"])
```
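#### Prompt formatting with the chat template
Mistral-7B-Instruct expects prompts wrapped in the `[INST] ... [/INST]` instruction format. Instead of writing that by hand, the tokenizer's chat template can build it; below is a minimal sketch using `model.generate` directly (the sampling settings are illustrative, not tuned).

```python
messages = [
    {"role": "user", "content": prompt_mistral},
]

# apply_chat_template wraps the message in Mistral's [INST] ... [/INST] format
input_ids = tokenizer_mistral.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model_mistral.device)

generated_ids = model_mistral.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
)

# Decode only the newly generated tokens
response = tokenizer_mistral.decode(
    generated_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
)
print(response)
```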
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
### Direct Use
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
[More Information Needed]
### Downstream Use [optional]
<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
[More Information Needed]
### Out-of-Scope Use
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
[More Information Needed]
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
[More Information Needed]
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
#### Metrics
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
[More Information Needed]
### Results
[More Information Needed]
## Model Card Authors [optional]
[More Information Needed]
## Model Card Contact
[More Information Needed]