abhinavkulkarni committed
Commit: 137bf22
Parent(s): 911f42b

Update README.md

README.md CHANGED
@@ -24,8 +24,6 @@ Please refer to the AWQ quantization license ([link](https://github.com/llm-awq/
 
 This model was successfully tested on CUDA driver v530.30.02 and runtime v11.7 with Python v3.10.11. Please note that AWQ requires NVIDIA GPUs with compute capability of 8.0 or higher.
 
-For Docker users, the `nvcr.io/nvidia/pytorch:23.06-py3` image is runtime v12.1 but otherwise the same as the configuration above and has also been verified to work.
-
 ## How to Use
 
 ```bash
@@ -34,7 +32,6 @@ git clone https://github.com/mit-han-lab/llm-awq \
 && git checkout 71d8e68df78de6c0c817b029a568c064bf22132d \
 && pip install -e . \
 && cd awq/kernels \
-&& export TORCH_CUDA_ARCH_LIST='8.0 8.6 8.7 8.9 9.0' \
 && python setup.py install
 ```
 
@@ -51,7 +48,7 @@ model_name = "tiiuae/falcon-7b-instruct"
 config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
 
 # Tokenizer
-tokenizer = AutoTokenizer.from_pretrained(
+tokenizer = AutoTokenizer.from_pretrained(config.tokenizer_name)
 
 # Model
 w_bit = 4
@@ -60,7 +57,7 @@ q_config = {
 "q_group_size": 64,
 }
 
-load_quant = hf_hub_download('abhinavkulkarni/
+load_quant = hf_hub_download('abhinavkulkarni/tiiuae-falcon-7b-instruct-w4-g64-awq', 'pytorch_model.bin')
 
 with init_empty_weights():
 model = AutoModelForCausalLM.from_config(config=config,
@@ -99,7 +96,7 @@ This evaluation was done using [LM-Eval](https://github.com/EleutherAI/lm-evalua
 | | |byte_perplexity| 1.6490| | |
 | | |bits_per_byte | 0.7216| | |
 
-[Falcon-7B-Instruct (4-bit 64-group AWQ)](https://huggingface.co/abhinavkulkarni/falcon-7b-instruct-w4-g64-awq)
+[Falcon-7B-Instruct (4-bit 64-group AWQ)](https://huggingface.co/abhinavkulkarni/tiiuae-falcon-7b-instruct-w4-g64-awq)
 
 | Task |Version| Metric | Value | |Stderr|
 |--------|------:|---------------|------:|---|------|
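
As a sanity check for the compute-capability requirement noted in the first hunk (8.0 or higher), something like the following can be run before building the kernels. This is a minimal sketch, not part of the commit, and assumes a CUDA-enabled PyTorch install:

```python
import torch

# AWQ kernels need an NVIDIA GPU with compute capability >= 8.0, matching the
# TORCH_CUDA_ARCH_LIST values removed in the second hunk:
# 8.0 (A100), 8.6 (RTX 30xx), 8.7 (Jetson Orin), 8.9 (RTX 40xx), 9.0 (H100).
assert torch.cuda.is_available(), "No CUDA device visible to PyTorch"

major, minor = torch.cuda.get_device_capability()
print(f"Detected compute capability: {major}.{minor}")
assert (major, minor) >= (8, 0), "AWQ requires compute capability 8.0 or higher"
```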
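
The Python snippet that the third and fourth hunks patch appears here only in fragments; pieced together, the loading flow reads roughly as below. This is a sketch under stated assumptions: the `zero_point` entry in `q_config`, the `real_quantize_model_weight` helper from llm-awq, and the `load_checkpoint_and_dispatch` step are taken from the llm-awq examples and are not visible in this diff.

```python
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from awq.quantize.quantizer import real_quantize_model_weight  # llm-awq helper (assumed)
from huggingface_hub import hf_hub_download
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_name = "tiiuae/falcon-7b-instruct"
config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)

# Tokenizer: after this commit, resolved via the config rather than model_name
tokenizer = AutoTokenizer.from_pretrained(config.tokenizer_name)

# 4-bit weights, 64-element quantization groups (matches the w4-g64 repo name)
w_bit = 4
q_config = {"zero_point": True, "q_group_size": 64}  # zero_point is assumed

# Fetch the quantized checkpoint from the renamed repo
load_quant = hf_hub_download(
    "abhinavkulkarni/tiiuae-falcon-7b-instruct-w4-g64-awq", "pytorch_model.bin"
)

# Build the model skeleton without allocating fp16 weights, swap the linear
# layers for their quantized equivalents, then stream in the real weights.
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config=config, trust_remote_code=True)
real_quantize_model_weight(model, w_bit=w_bit, q_config=q_config, init_only=True)
model = load_checkpoint_and_dispatch(model, load_quant, device_map="balanced")
```

The `init_empty_weights()` context is what keeps peak memory low here: the full-precision model is never materialized, and only the 4-bit checkpoint is actually loaded.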