Update README.md
README.md

---

## Model Recipe Details

This is an int4 model recipe, with group_size 128, for [facebook/opt-1.3b](https://huggingface.co/facebook/opt-1.3b), generated by [intel/auto-round](https://github.com/intel/auto-round).
Inference of this model is compatible with the AutoGPTQ kernel.
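
For example, on a CUDA GPU the exported checkpoint can be loaded through the standard `transformers` GPTQ integration. This is a minimal sketch, not part of the original card: it assumes `auto-gptq`, `optimum`, and `accelerate` are installed and that the model was exported in the `auto_gptq` format used by the quantization command below.

```python
# Sketch (assumption): transformers dispatches GPTQ checkpoints to the
# AutoGPTQ kernels when auto-gptq and optimum are installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path: the directory written by the quantization step below.
model_path = "./tmp_autoround/<model directory name>"
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")  # device_map needs accelerate
tokenizer = AutoTokenizer.from_pretrained(model_path)
inputs = tokenizer("Once upon a time", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```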

### Quantize the model

Here is a sample command to reproduce the model:

```bash
pip install auto-round

auto-round \
--model facebook/opt-1.3b \
--device 0 \
--group_size 128 \
--bits 4 \
--iters 1000 \
--nsamples 512 \
--format 'auto_gptq' \
--minmax_lr 2e-3 \
--disable_quanted_input \
--output_dir "./tmp_autoround"
```
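
The same recipe can also be run from Python. The sketch below is not part of the original card; it assumes a recent `auto-round` release whose `AutoRound` class accepts keyword arguments mirroring the CLI flags above.

```python
# Sketch (assumption): auto-round's Python API with arguments mirroring the CLI.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-1.3b"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# bits, group_size, iters, and nsamples correspond to the CLI flags above.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, iters=1000, nsamples=512)
autoround.quantize()
autoround.save_quantized("./tmp_autoround", format="auto_gptq")
```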

## How to use

### INT4 Inference with IPEX on Intel CPU

Install the latest [Intel Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch) and [Intel Neural Compressor](https://github.com/intel/neural-compressor):

```bash
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install intel_extension_for_pytorch
pip install neural_compressor_pt
```

```python
from transformers import AutoTokenizer
from neural_compressor.transformers import AutoModelForCausalLM

## note: use the quantized model directory name below
model_name_or_path = "./tmp_autoround/<model directory name>"
q_model = AutoModelForCausalLM.from_pretrained(model_name_or_path)

prompt = "Once upon a time, a little girl"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
print(tokenizer.decode(q_model.generate(**tokenizer(prompt, return_tensors="pt").to(q_model.device), max_new_tokens=50)[0]))
## Once upon a time, a little girl was born. She was a beautiful little girl, with a beautiful smile. She was a little girl who loved to play. She was a little girl who loved to sing. She was a little girl who loved to dance.
```

### INT4 Inference on Intel Gaudi Accelerator

A Docker image with the Gaudi software stack is recommended; more details can be found in the [Gaudi Guide](https://docs.habana.ai/en/latest/).

```python
import habana_frameworks.torch.core as htcore  # importing this registers the HPU backend
from neural_compressor.torch.quantization import load
from transformers import AutoTokenizer, AutoModelForCausalLM

## note: use the quantized model directory name below
model_name_or_path = "./tmp_autoround/<model directory name>"

model = load(
    model_name_or_path=model_name_or_path,
    format="huggingface",
    device="hpu"
)

prompt = "Once upon a time, a little girl"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
print(tokenizer.decode(model.generate(**tokenizer(prompt, return_tensors="pt").to("hpu"), max_new_tokens=50)[0]))
```

## Accuracy Result

Evaluated with [lm-eval-harness 0.4.2](https://github.com/EleutherAI/lm-evaluation-harness.git) installed from source.

| Metric | FP16   | INT4   |
| ------ | ------ | ------ |
| Avg.   | 0.4405 | 0.4315 |
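
A command along these lines could reproduce the evaluation. This is a sketch rather than the card's own command, and the task list is an illustrative assumption, since the card does not state which tasks the average covers.

```bash
# Sketch: install lm-eval-harness 0.4.2 from source, then evaluate the
# quantized checkpoint. The task list below is an assumption for illustration.
pip install "lm_eval @ git+https://github.com/EleutherAI/lm-evaluation-harness.git@v0.4.2"

lm_eval --model hf \
    --model_args pretrained="./tmp_autoround/<model directory name>" \
    --tasks lambada_openai,hellaswag,piqa,winogrande \
    --batch_size 16
```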