isaacmac committed on
Commit
00f1d7c
1 Parent(s): 1efa8e7

Update README.md

Files changed (1)
  1. README.md +54 -19
README.md CHANGED
@@ -6,50 +6,85 @@ language:
---


- ## Model Details

- This model is an int4 model with group_size 128 of [facebook/opt-1.3b](https://huggingface.co/facebook/opt-1.3b) generated by [intel/auto-round](https://github.com/intel/auto-round).
Inference of this model is compatible with AutoGPTQ's Kernel.

- ### Reproduce the model

Here is the sample command to reproduce the model:

```bash
- git clone https://github.com/intel/auto-round
- cd auto-round/examples/language-modeling
- pip install -r requirements.txt
- python3 main.py \
- --model_name facebook/opt-1.3b \
--device 0 \
--group_size 128 \
--bits 4 \
--iters 1000 \
--nsamples 512 \
- --deployment_device 'gpu' \
--minmax_lr 2e-3 \
--disable_quanted_input \
--output_dir "./tmp_autoround"
```

- ### Evaluate the model

- Install [lm-eval-harness 0.4.2](https://github.com/EleutherAI/lm-evaluation-harness.git) from source.

```bash
- lm_eval --model hf --model_args pretrained="Intel/opt-1.3b-int4-inc",autogptq=True,gptq_use_triton=True --device cuda:0 --tasks lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,mmlu --batch_size 32
```

| Metric | FP16 | INT4 |
| -------------- | ------ | ------ |
| Avg. | 0.4405 | 0.4315 |
 
---


+ ## Model Recipe Details

+ This is an int4 model recipe with group_size 128 of [facebook/opt-1.3b](https://huggingface.co/facebook/opt-1.3b) generated by [intel/auto-round](https://github.com/intel/auto-round).
Inference of this model is compatible with AutoGPTQ's Kernel.

+ ### Quantize the model

Here is the sample command to reproduce the model:

```bash
+ pip install auto-round
+ auto-round \
+ --model facebook/opt-1.3b \
--device 0 \
--group_size 128 \
--bits 4 \
--iters 1000 \
--nsamples 512 \
+ --format 'auto_gptq' \
--minmax_lr 2e-3 \
--disable_quanted_input \
--output_dir "./tmp_autoround"
```
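For reference alongside the CLI, the same recipe can also be expressed through auto-round's Python API. This is a minimal sketch, assuming the `AutoRound` class and `save_quantized` interface documented in the intel/auto-round README; the exact argument names (e.g. `enable_quanted_input`) should be checked against the installed version:

```python
# Minimal sketch of the quantization recipe via the auto-round Python API.
# Assumption: AutoRound(model, tokenizer, ...) and save_quantized() behave
# as documented upstream; verify against your installed auto-round version.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-1.3b"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

autoround = AutoRound(
    model,
    tokenizer,
    bits=4,
    group_size=128,
    iters=1000,
    nsamples=512,
    minmax_lr=2e-3,
    enable_quanted_input=False,  # mirrors --disable_quanted_input
)
autoround.quantize()
autoround.save_quantized("./tmp_autoround", format="auto_gptq")
```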

+ ## How to use

+ ### INT4 Inference with IPEX on Intel CPU
+ Install the latest [Intel Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch) and [Intel Neural Compressor](https://github.com/intel/neural-compressor):

```bash
+ pip install torch --index-url https://download.pytorch.org/whl/cpu
+ pip install intel_extension_for_pytorch
+ pip install neural_compressor_pt
```

+ ```python
+ from transformers import AutoTokenizer
+ from neural_compressor.transformers import AutoModelForCausalLM
+
+ # Note: point this at the quantized model directory produced above
+ model_name_or_path = "./tmp_autoround/<model directory name>"
+ q_model = AutoModelForCausalLM.from_pretrained(model_name_or_path)
+
+ prompt = "Once upon a time, a little girl"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
+ inputs = tokenizer(prompt, return_tensors="pt").to(q_model.device)
+ print(tokenizer.decode(q_model.generate(**inputs, max_new_tokens=50)[0]))
+ # Once upon a time, a little girl was born. She was a beautiful little girl, with a beautiful smile. She was a little girl who loved to play. She was a little girl who loved to sing. She was a little girl who loved to dance.
+ ```
+
+ ### INT4 Inference on Intel Gaudi Accelerator
+ A Docker image with the Gaudi Software Stack is recommended. More details can be found in the [Gaudi Guide](https://docs.habana.ai/en/latest/).
+
+ ```python
+ import habana_frameworks.torch.core as htcore
+ from neural_compressor.torch.quantization import load
+ from transformers import AutoTokenizer
+
+ # Note: point this at the quantized model directory produced above
+ model_name_or_path = "./tmp_autoround/<model directory name>"
+
+ model = load(
+     model_name_or_path=model_name_or_path,
+     format="huggingface",
+     device="hpu",
+ )
+
+ prompt = "Once upon a time, a little girl"
+ tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
+ inputs = tokenizer(prompt, return_tensors="pt").to("hpu")
+ print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0]))
+ ```
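The card also states that inference is compatible with AutoGPTQ's Kernel, so on a CUDA GPU the published checkpoint should load directly through transformers. A minimal sketch, assuming the `Intel/opt-1.3b-int4-inc` repo referenced by the evaluation command in the removed section and an environment with `auto-gptq` (or another compatible GPTQ backend) installed; this path is not part of the commit itself:

```python
# Minimal sketch: loading a GPTQ-format checkpoint through transformers.
# Assumptions: a CUDA device is available and auto-gptq (or a compatible
# GPTQ backend) is installed so the int4 kernel is picked up automatically.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Intel/opt-1.3b-int4-inc"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cuda:0")
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "Once upon a time, a little girl"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0]))
```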
+
+ ## Accuracy Result

| Metric | FP16 | INT4 |
| -------------- | ------ | ------ |
| Avg. | 0.4405 | 0.4315 |
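The updated text drops the evaluation instructions, but the command from the removed section remains the natural way to reproduce this table: install lm-eval-harness 0.4.2 from source, then run the same command, shown here with line continuations for readability:

```bash
lm_eval --model hf \
  --model_args pretrained="Intel/opt-1.3b-int4-inc",autogptq=True,gptq_use_triton=True \
  --device cuda:0 \
  --tasks lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,mmlu \
  --batch_size 32
```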