fxmarty committed on
Commit f986621
1 parent: c4c8551

add experience

README.md ADDED
@@ -0,0 +1,83 @@
+---
+pipeline_tag: token-classification
+datasets:
+- conll2003
+metrics:
+- precision
+- recall
+- f1
+- accuracy
+tags:
+- distilbert
+---
+
+**Task:** `token-classification`
+**Backend:** `sagemaker-training`
+**Backend args:** `{'instance_type': 'ml.g4dn.2xlarge', 'supported_instructions': 'avx512_vnni'}`
+**Number of evaluation samples:** `1000`
+
+Fixed parameters:
+* **model_name_or_path**: `elastic/distilbert-base-uncased-finetuned-conll03-english`
+* **dataset**:
+    * **path**: `conll2003`
+    * **eval_split**: `validation`
+    * **data_keys**: `{'primary': 'tokens'}`
+    * **ref_keys**: `['ner_tags']`
+    * **calibration_split**: `train`
+* **node_exclusion**: `[]`
+* **per_channel**: `False`
+* **calibration**:
+    * **method**: `minmax`
+    * **num_calibration_samples**: `100`
+* **framework**: `onnxruntime`
+* **framework_args**:
+    * **opset**: `11`
+    * **optimization_level**: `1`
+* **aware_training**: `False`
+
+Benchmarked parameters:
+* **quantization_approach**: `dynamic`, `static`
+* **operators_to_quantize**: `['Add', 'MatMul']`, `['Add']`
+
+# Evaluation
+## Non-time metrics
+| quantization_approach | operators_to_quantize | | precision (original) | precision (optimized) | | recall (original) | recall (optimized) | | f1 (original) | f1 (optimized) | | accuracy (original) | accuracy (optimized) |
+| :-------------------: | :-------------------: | :-: | :------------------: | :-------------------: | :-: | :---------------: | :----------------: | :-: | :-----------: | :------------: | :-: | :-----------------: | :------------------: |
+| `dynamic` | `['Add', 'MatMul']` | \| | 0.937 | 0.937 | \| | 0.953 | 0.953 | \| | 0.945 | 0.945 | \| | 0.988 | 0.988 |
+| `dynamic` | `['Add']` | \| | 0.937 | 0.937 | \| | 0.953 | 0.953 | \| | 0.945 | 0.945 | \| | 0.988 | 0.988 |
+| `static` | `['Add', 'MatMul']` | \| | 0.937 | 0.074 | \| | 0.953 | 0.253 | \| | 0.945 | 0.114 | \| | 0.988 | 0.363 |
+| `static` | `['Add']` | \| | 0.937 | 0.065 | \| | 0.953 | 0.186 | \| | 0.945 | 0.096 | \| | 0.988 | 0.340 |
+
+## Time metrics
+Time benchmarks were run for 3 seconds per config, after 1 warmup run. Throughput counts forward passes (one batch each) per second.
+
+
+Time metrics for batch size = 1, input length = 64:
+
+| quantization_approach | operators_to_quantize | | latency_mean (original, ms) | latency_mean (optimized, ms) | | throughput (original, /s) | throughput (optimized, /s) |
+| :-------------------: | :-------------------: | :-: | :-------------------------: | :--------------------------: | :-: | :-----------------------: | :------------------------: |
+| `dynamic` | `['Add', 'MatMul']` | \| | 57.64 | 12.30 | \| | 17.67 | 81.33 |
+| `dynamic` | `['Add']` | \| | 43.51 | 29.42 | \| | 23.00 | 34.00 |
+| `static` | `['Add', 'MatMul']` | \| | 43.05 | 21.11 | \| | 23.33 | 47.67 |
+| `static` | `['Add']` | \| | 43.50 | 37.93 | \| | 23.00 | 26.67 |
+
+
+Time metrics for batch size = 4, input length = 64:
+
+| quantization_approach | operators_to_quantize | | latency_mean (original, ms) | latency_mean (optimized, ms) | | throughput (original, /s) | throughput (optimized, /s) |
+| :-------------------: | :-------------------: | :-: | :-------------------------: | :--------------------------: | :-: | :-----------------------: | :------------------------: |
+| `dynamic` | `['Add', 'MatMul']` | \| | 119.50 | 39.92 | \| | 8.67 | 25.33 |
+| `dynamic` | `['Add']` | \| | 119.62 | 107.42 | \| | 8.67 | 9.33 |
+| `static` | `['Add', 'MatMul']` | \| | 120.23 | 56.94 | \| | 8.33 | 17.67 |
+| `static` | `['Add']` | \| | 119.10 | 130.78 | \| | 8.67 | 7.67 |
+
+
+Time metrics for batch size = 8, input length = 64:
+
+| quantization_approach | operators_to_quantize | | latency_mean (original, ms) | latency_mean (optimized, ms) | | throughput (original, /s) | throughput (optimized, /s) |
+| :-------------------: | :-------------------: | :-: | :-------------------------: | :--------------------------: | :-: | :-----------------------: | :------------------------: |
+| `dynamic` | `['Add', 'MatMul']` | \| | 165.84 | 75.45 | \| | 6.33 | 13.33 |
+| `dynamic` | `['Add']` | \| | 214.65 | 211.41 | \| | 4.67 | 5.00 |
+| `static` | `['Add', 'MatMul']` | \| | 166.53 | 129.00 | \| | 6.33 | 8.00 |
+| `static` | `['Add']` | \| | 214.81 | 256.95 | \| | 4.67 | 4.00 |
+
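The README reports mean latencies for each configuration but leaves the speedups implicit. Below is a minimal Python sketch that derives them as original latency divided by optimized latency, using the values copied from the README time-metrics tables. The `LATENCY_MS` layout and the `speedup` and `slower_configs` helpers are illustrative only. They are not part of optimum or of the benchmark tooling that produced this commit.

```python
# Mean latencies in ms, copied from the README time-metrics tables.
# Key: (quantization_approach, operators_to_quantize, batch_size)
# Value: (original latency, optimized latency)
LATENCY_MS = {
    ("dynamic", ("Add", "MatMul"), 1): (57.64, 12.30),
    ("dynamic", ("Add",), 1): (43.51, 29.42),
    ("static", ("Add", "MatMul"), 1): (43.05, 21.11),
    ("static", ("Add",), 1): (43.50, 37.93),
    ("dynamic", ("Add", "MatMul"), 4): (119.50, 39.92),
    ("dynamic", ("Add",), 4): (119.62, 107.42),
    ("static", ("Add", "MatMul"), 4): (120.23, 56.94),
    ("static", ("Add",), 4): (119.10, 130.78),
    ("dynamic", ("Add", "MatMul"), 8): (165.84, 75.45),
    ("dynamic", ("Add",), 8): (214.65, 211.41),
    ("static", ("Add", "MatMul"), 8): (166.53, 129.00),
    ("static", ("Add",), 8): (214.81, 256.95),
}


def speedup(original_ms: float, optimized_ms: float) -> float:
    """Ratio of mean latencies; values above 1 mean the quantized model is faster."""
    return original_ms / optimized_ms


def slower_configs(table):
    """Return the configurations where quantization increased mean latency."""
    return sorted(key for key, (orig, opt) in table.items() if speedup(orig, opt) < 1)


if __name__ == "__main__":
    for (approach, ops, batch), (orig, opt) in LATENCY_MS.items():
        print(f"{approach:>7} {'+'.join(ops):<10} bs={batch}: {speedup(orig, opt):.2f}x")
    print("slower after quantization:", slower_configs(LATENCY_MS))
```

Dynamic quantization of `['Add', 'MatMul']` gives roughly a 4.7x speedup at batch size 1. Static `['Add']` quantization is slower than the original model at batch sizes 4 and 8. Latency is only half the picture, though: the non-time metrics show that both static configurations drop F1 from 0.945 to about 0.1.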
runs.json ADDED
@@ -0,0 +1,580 @@
+[
+    {
+        "model_name_or_path": "elastic/distilbert-base-uncased-finetuned-conll03-english",
+        "task": "token-classification",
+        "dataset": {
+            "path": "conll2003",
+            "eval_split": "validation",
+            "data_keys": {
+                "primary": "tokens",
+                "secondary": null
+            },
+            "ref_keys": [
+                "ner_tags"
+            ],
+            "name": null,
+            "calibration_split": "train"
+        },
+        "quantization_approach": "static",
+        "operators_to_quantize": [
+            "Add",
+            "MatMul"
+        ],
+        "node_exclusion": [],
+        "aware_training": false,
+        "per_channel": false,
+        "calibration": {
+            "method": "minmax",
+            "num_calibration_samples": 100,
+            "calibration_histogram_percentile": null,
+            "calibration_moving_average": null,
+            "calibration_moving_average_constant": null
+        },
+        "framework": "onnxruntime",
+        "framework_args": {
+            "opset": 11,
+            "optimization_level": 1
+        },
+        "hardware": "Architecture: x86_64\nCPU op-mode(s): 32-bit, 64-bit\nByte Order: Little Endian\nAddress sizes: 46 bits physical, 48 bits virtual\nCPU(s): 8\nOn-line CPU(s) list: 0-7\nThread(s) per core: 2\nCore(s) per socket: 4\nSocket(s): 1\nNUMA node(s): 1\nVendor ID: GenuineIntel\nCPU family: 6\nModel: 85\nModel name: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz\nStepping: 7\nCPU MHz: 3100.244\nBogoMIPS: 4999.99\nHypervisor vendor: KVM\nVirtualization type: full\nL1d cache: 128 KiB\nL1i cache: 128 KiB\nL2 cache: 4 MiB\nL3 cache: 35.8 MiB\nNUMA node0 CPU(s): 0-7\nVulnerability Itlb multihit: KVM: Vulnerable\nVulnerability L1tf: Mitigation; PTE Inversion\nVulnerability Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown\nVulnerability Meltdown: Mitigation; PTI\nVulnerability Spec store bypass: Vulnerable\nVulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2: Mitigation; Retpolines, STIBP disabled, RSB filling\nVulnerability Srbds: Not affected\nVulnerability Tsx async abort: Not affected\nFlags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni\n",
+        "versions": {
+            "transformers": "4.20.1",
+            "optimum": "1.2.3.dev0",
+            "optimum_hash": "5ac9c0d9fd7e7cca55b2f9935b961ed5b6c50112"
+        },
+        "evaluation": {
+            "time": [
+                {
+                    "batch_size": 4,
+                    "input_length": 64,
+                    "baseline": {
+                        "nb_forwards": 25,
+                        "throughput": 8.33,
+                        "latency_mean": 120.23236996,
+                        "latency_std": 0.989423927986037,
+                        "latency_50": 120.445322,
+                        "latency_90": 121.0641136,
+                        "latency_95": 121.63786520000001,
+                        "latency_99": 122.31954252,
+                        "latency_999": 122.47620775200001
+                    },
+                    "optimized": {
+                        "nb_forwards": 53,
+                        "throughput": 17.67,
+                        "latency_mean": 56.94031537735849,
+                        "latency_std": 2.2044830948358625,
+                        "latency_50": 56.199388,
+                        "latency_90": 60.3284648,
+                        "latency_95": 60.6057082,
+                        "latency_99": 61.70255691999999,
+                        "latency_999": 62.529690292000005
+                    }
+                },
+                {
+                    "batch_size": 8,
+                    "input_length": 64,
+                    "baseline": {
+                        "nb_forwards": 19,
+                        "throughput": 6.33,
+                        "latency_mean": 166.53055257894738,
+                        "latency_std": 1.575841987426849,
+                        "latency_50": 166.638572,
+                        "latency_90": 168.272883,
+                        "latency_95": 168.7129504,
+                        "latency_99": 169.52801488,
+                        "latency_999": 169.711404388
+                    },
+                    "optimized": {
+                        "nb_forwards": 24,
+                        "throughput": 8.0,
+                        "latency_mean": 129.002869375,
+                        "latency_std": 0.6157854643813875,
+                        "latency_50": 129.063924,
+                        "latency_90": 129.7084936,
+                        "latency_95": 129.9355643,
+                        "latency_99": 130.24102448,
+                        "latency_999": 130.313872748
+                    }
+                },
+                {
+                    "batch_size": 1,
+                    "input_length": 64,
+                    "baseline": {
+                        "nb_forwards": 70,
+                        "throughput": 23.33,
+                        "latency_mean": 43.048573857142856,
+                        "latency_std": 1.1204473128323003,
+                        "latency_50": 42.845755,
+                        "latency_90": 43.8944438,
+                        "latency_95": 44.3052485,
+                        "latency_99": 46.73122168000001,
+                        "latency_999": 49.909082367999986
+                    },
+                    "optimized": {
+                        "nb_forwards": 143,
+                        "throughput": 47.67,
+                        "latency_mean": 21.113699776223775,
+                        "latency_std": 0.1930452254945551,
+                        "latency_50": 21.085728,
+                        "latency_90": 21.3874956,
+                        "latency_95": 21.4500651,
+                        "latency_99": 21.640094780000002,
+                        "latency_999": 21.648399938
+                    }
+                }
+            ],
+            "others": {
+                "baseline": {
+                    "precision": 0.936836221352711,
+                    "recall": 0.9533560864618885,
+                    "f1": 0.9450239639131661,
+                    "accuracy": 0.9880421708059153
+                },
+                "optimized": {
+                    "precision": 0.07350512058143377,
+                    "recall": 0.25312855517633676,
+                    "f1": 0.1139272913466462,
+                    "accuracy": 0.3629802589683719
+                }
+            }
+        },
+        "max_eval_samples": 1000,
+        "time_benchmark_args": {
+            "duration": 3,
+            "warmup_runs": 1
+        },
+        "model_type": "distilbert"
+    },
+    {
+        "model_name_or_path": "elastic/distilbert-base-uncased-finetuned-conll03-english",
+        "task": "token-classification",
+        "dataset": {
+            "path": "conll2003",
+            "eval_split": "validation",
+            "data_keys": {
+                "primary": "tokens",
+                "secondary": null
+            },
+            "ref_keys": [
+                "ner_tags"
+            ],
+            "name": null,
+            "calibration_split": "train"
+        },
+        "quantization_approach": "static",
+        "operators_to_quantize": [
+            "Add"
+        ],
+        "node_exclusion": [],
+        "aware_training": false,
+        "per_channel": false,
+        "calibration": {
+            "method": "minmax",
+            "num_calibration_samples": 100,
+            "calibration_histogram_percentile": null,
+            "calibration_moving_average": null,
+            "calibration_moving_average_constant": null
+        },
+        "framework": "onnxruntime",
+        "framework_args": {
+            "opset": 11,
+            "optimization_level": 1
+        },
+        "hardware": "Architecture: x86_64\nCPU op-mode(s): 32-bit, 64-bit\nByte Order: Little Endian\nAddress sizes: 46 bits physical, 48 bits virtual\nCPU(s): 8\nOn-line CPU(s) list: 0-7\nThread(s) per core: 2\nCore(s) per socket: 4\nSocket(s): 1\nNUMA node(s): 1\nVendor ID: GenuineIntel\nCPU family: 6\nModel: 85\nModel name: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz\nStepping: 7\nCPU MHz: 3100.091\nBogoMIPS: 4999.99\nHypervisor vendor: KVM\nVirtualization type: full\nL1d cache: 128 KiB\nL1i cache: 128 KiB\nL2 cache: 4 MiB\nL3 cache: 35.8 MiB\nNUMA node0 CPU(s): 0-7\nVulnerability Itlb multihit: KVM: Vulnerable\nVulnerability L1tf: Mitigation; PTE Inversion\nVulnerability Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown\nVulnerability Meltdown: Mitigation; PTI\nVulnerability Spec store bypass: Vulnerable\nVulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2: Mitigation; Retpolines, STIBP disabled, RSB filling\nVulnerability Srbds: Not affected\nVulnerability Tsx async abort: Not affected\nFlags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni\n",
+        "versions": {
+            "transformers": "4.20.1",
+            "optimum": "1.2.3.dev0",
+            "optimum_hash": "5ac9c0d9fd7e7cca55b2f9935b961ed5b6c50112"
+        },
+        "evaluation": {
+            "time": [
+                {
+                    "batch_size": 1,
+                    "input_length": 64,
+                    "baseline": {
+                        "nb_forwards": 69,
+                        "throughput": 23.0,
+                        "latency_mean": 43.50449917391305,
+                        "latency_std": 1.1458006326491226,
+                        "latency_50": 43.443712,
+                        "latency_90": 44.833304,
+                        "latency_95": 45.4732784,
+                        "latency_99": 46.1717674,
+                        "latency_999": 46.293552340000005
+                    },
+                    "optimized": {
+                        "nb_forwards": 80,
+                        "throughput": 26.67,
+                        "latency_mean": 37.9267952125,
+                        "latency_std": 0.11734822683861629,
+                        "latency_50": 37.9285515,
+                        "latency_90": 38.085207600000004,
+                        "latency_95": 38.111036399999996,
+                        "latency_99": 38.2064807,
+                        "latency_999": 38.22722057
+                    }
+                },
+                {
+                    "batch_size": 8,
+                    "input_length": 64,
+                    "baseline": {
+                        "nb_forwards": 14,
+                        "throughput": 4.67,
+                        "latency_mean": 214.81155885714287,
+                        "latency_std": 0.6229026122307055,
+                        "latency_50": 214.6879675,
+                        "latency_90": 215.571702,
+                        "latency_95": 215.72494925,
+                        "latency_99": 215.90999385,
+                        "latency_999": 215.951628885
+                    },
+                    "optimized": {
+                        "nb_forwards": 12,
+                        "throughput": 4.0,
+                        "latency_mean": 256.95122358333333,
+                        "latency_std": 1.2773226309110695,
+                        "latency_50": 257.0572985,
+                        "latency_90": 258.7638351,
+                        "latency_95": 258.84763195,
+                        "latency_99": 258.86815838999996,
+                        "latency_999": 258.872776839
+                    }
+                },
+                {
+                    "batch_size": 4,
+                    "input_length": 64,
+                    "baseline": {
+                        "nb_forwards": 26,
+                        "throughput": 8.67,
+                        "latency_mean": 119.1024813076923,
+                        "latency_std": 1.5917975126134987,
+                        "latency_50": 118.759877,
+                        "latency_90": 120.792844,
+                        "latency_95": 121.9356475,
+                        "latency_99": 123.13953675,
+                        "latency_999": 123.40581367499999
+                    },
+                    "optimized": {
+                        "nb_forwards": 23,
+                        "throughput": 7.67,
+                        "latency_mean": 130.78132304347827,
+                        "latency_std": 0.5922745467393132,
+                        "latency_50": 130.955147,
+                        "latency_90": 131.512009,
+                        "latency_95": 131.5393553,
+                        "latency_99": 131.74930052,
+                        "latency_999": 131.801985152
+                    }
+                }
+            ],
+            "others": {
+                "baseline": {
+                    "precision": 0.936836221352711,
+                    "recall": 0.9533560864618885,
+                    "f1": 0.9450239639131661,
+                    "accuracy": 0.9880421708059153
+                },
+                "optimized": {
+                    "precision": 0.06477812995245642,
+                    "recall": 0.18600682593856654,
+                    "f1": 0.09609168380840435,
+                    "accuracy": 0.3400551899808958
+                }
+            }
+        },
+        "max_eval_samples": 1000,
+        "time_benchmark_args": {
+            "duration": 3,
+            "warmup_runs": 1
+        },
+        "model_type": "distilbert"
+    },
+    {
+        "model_name_or_path": "elastic/distilbert-base-uncased-finetuned-conll03-english",
+        "task": "token-classification",
+        "dataset": {
+            "path": "conll2003",
+            "eval_split": "validation",
+            "data_keys": {
+                "primary": "tokens",
+                "secondary": null
+            },
+            "ref_keys": [
+                "ner_tags"
+            ],
+            "name": null,
+            "calibration_split": "train"
+        },
+        "quantization_approach": "dynamic",
+        "operators_to_quantize": [
+            "Add",
+            "MatMul"
+        ],
+        "node_exclusion": [],
+        "aware_training": false,
+        "per_channel": false,
+        "calibration": {
+            "method": "minmax",
+            "num_calibration_samples": 100,
+            "calibration_histogram_percentile": null,
+            "calibration_moving_average": null,
+            "calibration_moving_average_constant": null
+        },
+        "framework": "onnxruntime",
+        "framework_args": {
+            "opset": 11,
+            "optimization_level": 1
+        },
+        "hardware": "Architecture: x86_64\nCPU op-mode(s): 32-bit, 64-bit\nByte Order: Little Endian\nAddress sizes: 46 bits physical, 48 bits virtual\nCPU(s): 8\nOn-line CPU(s) list: 0-7\nThread(s) per core: 2\nCore(s) per socket: 4\nSocket(s): 1\nNUMA node(s): 1\nVendor ID: GenuineIntel\nCPU family: 6\nModel: 85\nModel name: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz\nStepping: 7\nCPU MHz: 3100.009\nBogoMIPS: 4999.99\nHypervisor vendor: KVM\nVirtualization type: full\nL1d cache: 128 KiB\nL1i cache: 128 KiB\nL2 cache: 4 MiB\nL3 cache: 35.8 MiB\nNUMA node0 CPU(s): 0-7\nVulnerability Itlb multihit: KVM: Vulnerable\nVulnerability L1tf: Mitigation; PTE Inversion\nVulnerability Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown\nVulnerability Meltdown: Mitigation; PTI\nVulnerability Spec store bypass: Vulnerable\nVulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2: Mitigation; Retpolines, STIBP disabled, RSB filling\nVulnerability Srbds: Not affected\nVulnerability Tsx async abort: Not affected\nFlags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni\n",
+        "versions": {
+            "transformers": "4.20.1",
+            "optimum": "1.2.3.dev0",
+            "optimum_hash": "5ac9c0d9fd7e7cca55b2f9935b961ed5b6c50112"
+        },
+        "evaluation": {
+            "time": [
+                {
+                    "batch_size": 1,
+                    "input_length": 64,
+                    "baseline": {
+                        "nb_forwards": 53,
+                        "throughput": 17.67,
+                        "latency_mean": 57.63860111320755,
+                        "latency_std": 0.5448611043553628,
+                        "latency_50": 57.65361,
+                        "latency_90": 58.180421,
+                        "latency_95": 58.392744,
+                        "latency_99": 58.71634352,
+                        "latency_999": 58.721444252
+                    },
+                    "optimized": {
+                        "nb_forwards": 244,
+                        "throughput": 81.33,
+                        "latency_mean": 12.298368512295083,
+                        "latency_std": 0.4560740565346141,
+                        "latency_50": 12.2116125,
+                        "latency_90": 13.001667200000002,
+                        "latency_95": 13.1330103,
+                        "latency_99": 13.2790208,
+                        "latency_999": 13.414312331
+                    }
+                },
+                {
+                    "batch_size": 4,
+                    "input_length": 64,
+                    "baseline": {
+                        "nb_forwards": 26,
+                        "throughput": 8.67,
+                        "latency_mean": 119.50429169230769,
+                        "latency_std": 0.4639465722921096,
+                        "latency_50": 119.446385,
+                        "latency_90": 119.95197,
+                        "latency_95": 120.05153425,
+                        "latency_99": 120.7893855,
+                        "latency_999": 121.00299195000001
+                    },
+                    "optimized": {
+                        "nb_forwards": 76,
+                        "throughput": 25.33,
+                        "latency_mean": 39.91599960526316,
+                        "latency_std": 0.883213781232674,
+                        "latency_50": 39.8835755,
+                        "latency_90": 41.0755615,
+                        "latency_95": 41.48617225,
+                        "latency_99": 42.00973875,
+                        "latency_999": 42.412953375
+                    }
+                },
+                {
+                    "batch_size": 8,
+                    "input_length": 64,
+                    "baseline": {
+                        "nb_forwards": 19,
+                        "throughput": 6.33,
+                        "latency_mean": 165.83700805263157,
+                        "latency_std": 1.7394953701654086,
+                        "latency_50": 165.801757,
+                        "latency_90": 168.0285054,
+                        "latency_95": 168.19460990000002,
+                        "latency_99": 168.78632678,
+                        "latency_999": 168.919463078
+                    },
+                    "optimized": {
+                        "nb_forwards": 40,
+                        "throughput": 13.33,
+                        "latency_mean": 75.448955425,
+                        "latency_std": 1.2544431966810392,
+                        "latency_50": 75.414968,
+                        "latency_90": 77.1854282,
+                        "latency_95": 77.5299735,
+                        "latency_99": 77.80073465000001,
+                        "latency_999": 77.95147686499999
+                    }
+                }
+            ],
+            "others": {
+                "baseline": {
+                    "precision": 0.936836221352711,
+                    "recall": 0.9533560864618885,
+                    "f1": 0.9450239639131661,
+                    "accuracy": 0.9880421708059153
+                },
+                "optimized": {
+                    "precision": 0.9368008948545862,
+                    "recall": 0.9527872582480091,
+                    "f1": 0.9447264523406655,
+                    "accuracy": 0.9879006580343876
+                }
+            }
+        },
+        "max_eval_samples": 1000,
+        "time_benchmark_args": {
+            "duration": 3,
+            "warmup_runs": 1
+        },
+        "model_type": "distilbert"
+    },
+    {
+        "model_name_or_path": "elastic/distilbert-base-uncased-finetuned-conll03-english",
+        "task": "token-classification",
+        "dataset": {
+            "path": "conll2003",
+            "eval_split": "validation",
+            "data_keys": {
+                "primary": "tokens",
+                "secondary": null
+            },
+            "ref_keys": [
+                "ner_tags"
+            ],
+            "name": null,
+            "calibration_split": "train"
+        },
+        "quantization_approach": "dynamic",
+        "operators_to_quantize": [
+            "Add"
+        ],
+        "node_exclusion": [],
+        "aware_training": false,
+        "per_channel": false,
+        "calibration": {
+            "method": "minmax",
+            "num_calibration_samples": 100,
+            "calibration_histogram_percentile": null,
+            "calibration_moving_average": null,
+            "calibration_moving_average_constant": null
+        },
+        "framework": "onnxruntime",
+        "framework_args": {
+            "opset": 11,
+            "optimization_level": 1
+        },
+        "hardware": "Architecture: x86_64\nCPU op-mode(s): 32-bit, 64-bit\nByte Order: Little Endian\nAddress sizes: 46 bits physical, 48 bits virtual\nCPU(s): 8\nOn-line CPU(s) list: 0-7\nThread(s) per core: 2\nCore(s) per socket: 4\nSocket(s): 1\nNUMA node(s): 1\nVendor ID: GenuineIntel\nCPU family: 6\nModel: 85\nModel name: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz\nStepping: 7\nCPU MHz: 2638.487\nBogoMIPS: 4999.99\nHypervisor vendor: KVM\nVirtualization type: full\nL1d cache: 128 KiB\nL1i cache: 128 KiB\nL2 cache: 4 MiB\nL3 cache: 35.8 MiB\nNUMA node0 CPU(s): 0-7\nVulnerability Itlb multihit: KVM: Vulnerable\nVulnerability L1tf: Mitigation; PTE Inversion\nVulnerability Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown\nVulnerability Meltdown: Mitigation; PTI\nVulnerability Spec store bypass: Vulnerable\nVulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization\nVulnerability Spectre v2: Mitigation; Retpolines, STIBP disabled, RSB filling\nVulnerability Srbds: Not affected\nVulnerability Tsx async abort: Not affected\nFlags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni\n",
+        "versions": {
+            "transformers": "4.20.1",
+            "optimum": "1.2.3.dev0",
+            "optimum_hash": "5ac9c0d9fd7e7cca55b2f9935b961ed5b6c50112"
+        },
+        "evaluation": {
+            "time": [
+                {
+                    "batch_size": 1,
+                    "input_length": 64,
+                    "baseline": {
+                        "nb_forwards": 69,
+                        "throughput": 23.0,
+                        "latency_mean": 43.50526027536232,
+                        "latency_std": 1.1770353674252074,
+                        "latency_50": 43.267983,
+                        "latency_90": 45.0357992,
+                        "latency_95": 45.6057136,
+                        "latency_99": 46.708998679999986,
+                        "latency_999": 47.814713768000004
+                    },
+                    "optimized": {
+                        "nb_forwards": 102,
+                        "throughput": 34.0,
+                        "latency_mean": 29.424613480392157,
+                        "latency_std": 0.14890697595200564,
+                        "latency_50": 29.3912705,
+                        "latency_90": 29.646715,
+                        "latency_95": 29.68545545,
+                        "latency_99": 29.80756655,
+                        "latency_999": 29.811399894
+                    }
+                },
+                {
+                    "batch_size": 4,
+                    "input_length": 64,
+                    "baseline": {
+                        "nb_forwards": 26,
+                        "throughput": 8.67,
+                        "latency_mean": 119.6179461923077,
+                        "latency_std": 1.4057848288153165,
+                        "latency_50": 119.394914,
+                        "latency_90": 121.3817145,
+                        "latency_95": 121.8577975,
+                        "latency_99": 122.802906,
+                        "latency_999": 123.0513933
+                    },
+                    "optimized": {
+                        "nb_forwards": 28,
+                        "throughput": 9.33,
+                        "latency_mean": 107.42320235714286,
+                        "latency_std": 0.9405205161982765,
+                        "latency_50": 107.1847235,
+                        "latency_90": 107.6445599,
+                        "latency_95": 108.2160214,
+                        "latency_99": 111.05779109000001,
+                        "latency_999": 111.916852709
+                    }
+                },
+                {
+                    "batch_size": 8,
+                    "input_length": 64,
+                    "baseline": {
+                        "nb_forwards": 14,
+                        "throughput": 4.67,
+                        "latency_mean": 214.6487932857143,
+                        "latency_std": 0.9053003539723654,
+                        "latency_50": 214.552057,
+                        "latency_90": 215.54495519999998,
+                        "latency_95": 216.14476715,
+                        "latency_99": 216.93365343000002,
+                        "latency_999": 217.11115284299999
+                    },
+                    "optimized": {
+                        "nb_forwards": 15,
+                        "throughput": 5.0,
+                        "latency_mean": 211.41319233333334,
+                        "latency_std": 1.1447515204122778,
+                        "latency_50": 211.02957,
+                        "latency_90": 213.090243,
+                        "latency_95": 213.19109559999998,
+                        "latency_99": 213.37423912,
+                        "latency_999": 213.415446412
+                    }
+                }
+            ],
+            "others": {
+                "baseline": {
+                    "precision": 0.936836221352711,
+                    "recall": 0.9533560864618885,
+                    "f1": 0.9450239639131661,
+                    "accuracy": 0.9880421708059153
+                },
+                "optimized": {
+                    "precision": 0.936836221352711,
+                    "recall": 0.9533560864618885,
+                    "f1": 0.9450239639131661,
+                    "accuracy": 0.9880421708059153
+                }
+            }
+        },
+        "max_eval_samples": 1000,
+        "time_benchmark_args": {
+            "duration": 3,
+            "warmup_runs": 1
+        },
+        "model_type": "distilbert"
+    }
+]
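runs.json stores one entry per benchmarked configuration, and the README tables summarize it. Below is a hedged, standard-library sketch of how the time-table rows could be rebuilt from that file. It also checks a relationship that is inferred from the numbers rather than documented: every reported `throughput` equals `nb_forwards / duration`, that is, forward passes per second over the 3-second window. The `time_rows` helper and the trimmed `EXCERPT` (taken from the dynamic `['Add', 'MatMul']` entry) are illustrative. They are not the optimum code that generated this file.

```python
import json
import math

# Trimmed excerpt of the dynamic ['Add', 'MatMul'] entry in runs.json,
# keeping only the fields used below and only the batch_size=1 measurement.
EXCERPT = """
[
  {
    "quantization_approach": "dynamic",
    "operators_to_quantize": ["Add", "MatMul"],
    "time_benchmark_args": {"duration": 3, "warmup_runs": 1},
    "evaluation": {
      "time": [
        {
          "batch_size": 1,
          "input_length": 64,
          "baseline": {"nb_forwards": 53, "throughput": 17.67, "latency_mean": 57.63860111320755},
          "optimized": {"nb_forwards": 244, "throughput": 81.33, "latency_mean": 12.298368512295083}
        }
      ]
    }
  }
]
"""


def time_rows(runs):
    """Flatten runs.json entries into one row per (configuration, batch size).

    Raises ValueError if a reported throughput does not match
    nb_forwards / duration, which is how this file's numbers appear to be computed.
    """
    rows = []
    for run in runs:
        duration = run["time_benchmark_args"]["duration"]
        for measurement in run["evaluation"]["time"]:
            for side in ("baseline", "optimized"):
                stats = measurement[side]
                derived = stats["nb_forwards"] / duration
                if not math.isclose(derived, stats["throughput"], abs_tol=0.01):
                    raise ValueError(f"{side} throughput {stats['throughput']} != {derived:.2f}")
            rows.append({
                "quantization_approach": run["quantization_approach"],
                "operators_to_quantize": run["operators_to_quantize"],
                "batch_size": measurement["batch_size"],
                "latency_mean_original": round(measurement["baseline"]["latency_mean"], 2),
                "latency_mean_optimized": round(measurement["optimized"]["latency_mean"], 2),
            })
    # The README groups its time tables by batch size.
    return sorted(rows, key=lambda row: row["batch_size"])


if __name__ == "__main__":
    for row in time_rows(json.loads(EXCERPT)):
        print(row)
```

To run the sketch on the full file instead of the excerpt, load it with `with open("runs.json") as f: runs = json.load(f)` and pass `runs` to `time_rows`. Note that the `time` list in each entry is not ordered by batch size, so the function sorts the rows explicitly.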
tensorboard/1657610437.7287223/events.out.tfevents.1657610437.ip-10-2-224-27.ec2.internal.1.1 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8e1ceb68fac5c6b0f4db331c175a13093f199c2a166595387f5c6271dcfc8ff2
+size 738
tensorboard/1657610437.7304575/events.out.tfevents.1657610437.ip-10-2-224-27.ec2.internal.1.2 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c4819b83566afb7799a1297d0dd2e0518c6c74748e52cad94206c7528c26dbdd
+size 728
tensorboard/1657610437.7316337/events.out.tfevents.1657610437.ip-10-2-224-27.ec2.internal.1.3 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9beb6fc624a7b8dec602eb0c8584fbc0a3a066b2942884d4310edab28ec0a1d0
+size 737
tensorboard/1657610437.7327793/events.out.tfevents.1657610437.ip-10-2-224-27.ec2.internal.1.4 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7bf6c4db6b1b42996059c35e3e7ff7e6ecbbbde0b26b457b48b9119917cd7a5b
+size 727
tensorboard/events.out.tfevents.1657610437.ip-10-2-224-27.ec2.internal.1.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7ac35b2342711834d9070a406c05e7f888ba13de67ef840d1aa407e0d482f35c
+size 40
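The tensorboard event files in this commit are stored as Git LFS pointers rather than as the event data itself. Each pointer records the LFS spec version, the SHA-256 of the real object, and its size in bytes, one `key value` pair per line. Below is a small standard-library parser, written purely as an illustration. `LfsPointer` and `parse_lfs_pointer` are hypothetical names, not part of Git LFS or of any tool used in this commit.

```python
from dataclasses import dataclass


@dataclass
class LfsPointer:
    version: str
    oid: str   # hex digest, without the "sha256:" prefix
    size: int  # size of the real object in bytes


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse a Git LFS pointer file made of 'key value' lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algorithm}")
    return LfsPointer(version=fields["version"], oid=digest, size=int(fields["size"]))


# Pointer for tensorboard/events.out.tfevents.1657610437.ip-10-2-224-27.ec2.internal.1.0
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:7ac35b2342711834d9070a406c05e7f888ba13de67ef840d1aa407e0d482f35c
size 40
"""

if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER)
    print(pointer.size, pointer.oid[:12])
```

The sizes recorded in these pointers (40 to 738 bytes) are those of the actual event files, which `git lfs pull` would fetch in place of the pointers.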