yuwenz committed on
Commit 3518c77
1 Parent(s): ac02df7

upload int8 onnx model


Signed-off-by: yuwenzho <yuwen.zhou@intel.com>

Files changed (2)
  1. README.md +29 -3
  2. model.onnx +3 -0
README.md CHANGED
@@ -7,6 +7,7 @@ tags:
 - int8
 - Intel® Neural Compressor
 - PostTrainingStatic
+- onnx
 datasets:
 - glue
 metrics:
@@ -29,7 +30,9 @@ model-index:
 
 # INT8 electra-small-discriminator-mrpc
 
-### Post-training static quantization
+## Post-training static quantization
+
+### PyTorch
 
 This is an INT8 PyTorch model quantized with [huggingface/optimum-intel](https://github.com/huggingface/optimum-intel) through the usage of [Intel® Neural Compressor](https://github.com/intel/neural-compressor).
 
@@ -38,14 +41,14 @@ The original fp32 model comes from the fine-tuned model [electra-small-discrimin
 The calibration dataloader is the train dataloader. The default calibration sampling size 300 isn't divisible exactly by batch size 8, so
 the real sampling size is 304.
 
-### Test result
+#### Test result
 
 | |INT8|FP32|
 |---|:---:|:---:|
 | **Accuracy (eval-f1)** |0.9007|0.8983|
 | **Model size (MB)** |14|51.8|
 
-### Load with optimum:
+#### Load with optimum:
 
 ```python
 from optimum.intel.neural_compressor.quantization import IncQuantizedModelForSequenceClassification
@@ -53,3 +56,26 @@ int8_model = IncQuantizedModelForSequenceClassification.from_pretrained(
     'Intel/electra-small-discriminator-mrpc-int8-static',
 )
 ```
+
+### ONNX
+
+This is an INT8 ONNX model quantized with [Intel® Neural Compressor](https://github.com/intel/neural-compressor).
+
+The original fp32 model comes from the fine-tuned model [electra-small-discriminator-mrpc](https://huggingface.co/Intel/electra-small-discriminator-mrpc).
+
+The calibration dataloader is the eval dataloader. The default calibration sampling size 100 isn't divisible exactly by batch size 8, so the real sampling size is 104.
+
+#### Test result
+
+| |INT8|FP32|
+|---|:---:|:---:|
+| **Accuracy (eval-f1)** |0.8993|0.8983|
+| **Model size (MB)** |32|52|
+
+
+#### Load ONNX model:
+
+```python
+from optimum.onnxruntime import ORTModelForSequenceClassification
+model = ORTModelForSequenceClassification.from_pretrained('Intel/electra-small-discriminator-mrpc-int8-static')
+```
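The calibration sampling sizes quoted in the README (300 rounded to 304 for PyTorch, 100 rounded to 104 for ONNX) both follow from rounding the requested size up to the next multiple of the batch size. A minimal sketch of that arithmetic — the helper name `real_sampling_size` is hypothetical, not an Intel Neural Compressor API:

```python
import math


def real_sampling_size(requested: int, batch_size: int) -> int:
    """Round the requested calibration sampling size up to the
    next multiple of the batch size, since calibration consumes
    whole batches."""
    return math.ceil(requested / batch_size) * batch_size


print(real_sampling_size(300, 8))  # 304, the PyTorch calibration size
print(real_sampling_size(100, 8))  # 104, the ONNX calibration size
```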
model.onnx ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9aaab0ea74e1aba289dae90f053c4d7dbdb9ebc577100b77cdd8736cee3f8683
+size 32868991
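The model.onnx entry added above is a Git LFS pointer file, not the ONNX binary itself; the real payload (32868991 bytes, consistent with the ~32 MB model size quoted in the README) is fetched by LFS on checkout. A minimal sketch of reading such a pointer with plain string parsing, no LFS tooling assumed:

```python
# Parse a Git LFS pointer file into its key/value fields.
# The pointer text below is the one added in this commit.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:9aaab0ea74e1aba289dae90f053c4d7dbdb9ebc577100b77cdd8736cee3f8683
size 32868991
"""

# Each non-empty line has the form "<key> <value>".
fields = dict(line.split(" ", 1) for line in pointer.splitlines() if line)

print(fields["oid"])        # sha256 digest of the real model.onnx blob
print(int(fields["size"]))  # 32868991 bytes
```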