abhinavkulkarni committed
Commit: 137bf22
Parent(s): 911f42b

Update README.md

README.md CHANGED
@@ -24,8 +24,6 @@ Please refer to the AWQ quantization license ([link](https://github.com/llm-awq/
 
 This model was successfully tested on CUDA driver v530.30.02 and runtime v11.7 with Python v3.10.11. Please note that AWQ requires NVIDIA GPUs with compute capability of 8.0 or higher.
 
-For Docker users, the `nvcr.io/nvidia/pytorch:23.06-py3` image is runtime v12.1 but otherwise the same as the configuration above and has also been verified to work.
-
 ## How to Use
 
 ```bash
@@ -34,7 +32,6 @@ git clone https://github.com/mit-han-lab/llm-awq \
 && git checkout 71d8e68df78de6c0c817b029a568c064bf22132d \
 && pip install -e . \
 && cd awq/kernels \
-&& export TORCH_CUDA_ARCH_LIST='8.0 8.6 8.7 8.9 9.0' \
 && python setup.py install
 ```
 
@@ -51,7 +48,7 @@ model_name = "tiiuae/falcon-7b-instruct"
 config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
 
 # Tokenizer
-tokenizer = AutoTokenizer.from_pretrained(
+tokenizer = AutoTokenizer.from_pretrained(config.tokenizer_name)
 
 # Model
 w_bit = 4
@@ -60,7 +57,7 @@ q_config = {
 "q_group_size": 64,
 }
 
-load_quant = hf_hub_download('abhinavkulkarni/
+load_quant = hf_hub_download('abhinavkulkarni/tiiuae-falcon-7b-instruct-w4-g64-awq', 'pytorch_model.bin')
 
 with init_empty_weights():
 model = AutoModelForCausalLM.from_config(config=config,
@@ -99,7 +96,7 @@ This evaluation was done using [LM-Eval](https://github.com/EleutherAI/lm-evalua
 | | |byte_perplexity| 1.6490| | |
 | | |bits_per_byte | 0.7216| | |
 
-[Falcon-7B-Instruct (4-bit 64-group AWQ)](https://huggingface.co/abhinavkulkarni/falcon-7b-instruct-w4-g64-awq)
+[Falcon-7B-Instruct (4-bit 64-group AWQ)](https://huggingface.co/abhinavkulkarni/tiiuae-falcon-7b-instruct-w4-g64-awq)
 
 | Task |Version| Metric | Value | |Stderr|
 |--------|------:|---------------|------:|---|------|
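
As a sanity check for the compute-capability requirement noted in the first hunk (8.0 or higher), something like the following can be run before building the kernels. This is a minimal sketch, not part of the commit, and assumes a CUDA-enabled PyTorch install:

```python
import torch

# AWQ kernels need an NVIDIA GPU with compute capability >= 8.0, matching the
# TORCH_CUDA_ARCH_LIST values removed in the second hunk:
# 8.0 (A100), 8.6 (RTX 30xx), 8.7 (Jetson Orin), 8.9 (RTX 40xx), 9.0 (H100).
assert torch.cuda.is_available(), "No CUDA device visible to PyTorch"

major, minor = torch.cuda.get_device_capability()
print(f"Detected compute capability: {major}.{minor}")
assert (major, minor) >= (8, 0), "AWQ requires compute capability 8.0 or higher"
```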
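
The Python snippet that the third and fourth hunks patch appears here only in fragments; pieced together, the loading flow reads roughly as below. This is a sketch under stated assumptions: the `zero_point` entry in `q_config`, the `real_quantize_model_weight` helper from llm-awq, and the `load_checkpoint_and_dispatch` step are taken from the llm-awq examples and are not visible in this diff.

```python
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from awq.quantize.quantizer import real_quantize_model_weight  # llm-awq helper (assumed)
from huggingface_hub import hf_hub_download
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_name = "tiiuae/falcon-7b-instruct"
config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)

# Tokenizer: after this commit, resolved via the config rather than model_name
tokenizer = AutoTokenizer.from_pretrained(config.tokenizer_name)

# 4-bit weights, 64-element quantization groups (matches the w4-g64 repo name)
w_bit = 4
q_config = {"zero_point": True, "q_group_size": 64}  # zero_point is assumed

# Fetch the quantized checkpoint from the renamed repo
load_quant = hf_hub_download(
    "abhinavkulkarni/tiiuae-falcon-7b-instruct-w4-g64-awq", "pytorch_model.bin"
)

# Build the model skeleton without allocating fp16 weights, swap the linear
# layers for their quantized equivalents, then stream in the real weights.
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config=config, trust_remote_code=True)
real_quantize_model_weight(model, w_bit=w_bit, q_config=q_config, init_only=True)
model = load_checkpoint_and_dispatch(model, load_quant, device_map="balanced")
```

The `init_empty_weights()` context is what keeps peak memory low here: the full-precision model is never materialized, and only the 4-bit checkpoint is actually loaded.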