abhinavkulkarni committed c59e771 (1 parent: d2be62b): Update README.md

README.md CHANGED
@@ -18,7 +18,7 @@ July 5, 2023
 
 ## Model License
 
-Please refer to
+Please refer to original MPT model license ([link](https://huggingface.co/VMware/open-llama-7b-open-instruct)).
 
 Please refer to the AWQ quantization license ([link](https://github.com/llm-awq/blob/main/LICENSE)).
 
@@ -26,6 +26,8 @@ Please refer to the AWQ quantization license ([link](https://github.com/llm-awq/
 
 This model was successfully tested on CUDA driver v530.30.02 and runtime v11.7 with Python v3.10.11. Please note that AWQ requires NVIDIA GPUs with compute capability of 80 or higher.
 
+For Docker users, the `nvcr.io/nvidia/pytorch:23.06-py3` image is runtime v12.1 but otherwise the same as the configuration above and has also been verified to work.
+
 ## How to Use
 
 ```bash
@@ -62,7 +64,7 @@ q_config = {
 load_quant = hf_hub_download('abhinavkulkarni/open-llama-7b-open-instruct-w4-g128-awq', 'pytorch_model.bin')
 
 with init_empty_weights():
-    model = AutoModelForCausalLM.
+    model = AutoModelForCausalLM.from_config(config=config,
         torch_dtype=torch.float16, trust_remote_code=True)
 
 real_quantize_model_weight(model, w_bit=w_bit, q_config=q_config, init_only=True)
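For context, the last hunk sits inside the README's Python usage snippet. Below is a minimal, self-contained sketch of that flow under stated assumptions: the `config`, `w_bit`, and `q_config` definitions, the `awq.quantize.quantizer` import path, and the final `load_checkpoint_and_dispatch` call are drawn from typical llm-awq usage and are not lines shown in this diff.

```python
# Sketch of the AWQ loading flow the patched hunk belongs to (assumptions noted inline).
import torch
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from huggingface_hub import hf_hub_download
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

# Assumed import path for the llm-awq helper; not shown in the diff.
from awq.quantize.quantizer import real_quantize_model_weight

model_name = "VMware/open-llama-7b-open-instruct"

# Base model config and tokenizer (assumed setup preceding the hunk).
config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# AWQ settings matching the checkpoint name: 4-bit weights, group size 128.
w_bit = 4
q_config = {"zero_point": True, "q_group_size": 128}

# Download the pre-quantized checkpoint from the Hub.
load_quant = hf_hub_download(
    "abhinavkulkarni/open-llama-7b-open-instruct-w4-g128-awq",
    "pytorch_model.bin",
)

# Allocate the fp16 model structure without materializing weights (the fixed line).
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(
        config=config, torch_dtype=torch.float16, trust_remote_code=True
    )

# Rewrite Linear modules into their quantized form; init_only=True only changes
# the module structure, the actual weights come from the downloaded checkpoint.
real_quantize_model_weight(model, w_bit=w_bit, q_config=q_config, init_only=True)

# Load the quantized checkpoint and dispatch it onto the available GPU(s)
# (assumed final step, not part of this hunk).
model = load_checkpoint_and_dispatch(model, load_quant, device_map="balanced")
```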