fxmarty committed on
Commit a686969
1 parent: 9bf5aac

add experience

README.md ADDED
@@ -0,0 +1,181 @@
---
pipeline_tag: question-answering
datasets:
- squad
metrics:
- exact_match
- f1
tags:
- distilbert
---

**Task:** `question-answering`
**Backend:** `sagemaker-training`
**Backend args:** `{'instance_type': 'ml.g4dn.2xlarge', 'supported_instructions': None}`
**Number of evaluation samples:** `1000`

Fixed parameters:
* **model_name_or_path**: `distilbert-base-uncased-distilled-squad`
* **dataset**:
    * **path**: `squad`
    * **eval_split**: `validation`
    * **data_keys**: `{'question': 'question', 'context': 'context'}`
    * **ref_keys**: `['answers']`
    * **calibration_split**: `train`
* **per_channel**: `False`
* **calibration**:
    * **method**: `minmax`
    * **num_calibration_samples**: `100`
* **framework**: `onnxruntime`
* **framework_args**:
    * **opset**: `11`
    * **optimization_level**: `1`
* **aware_training**: `False`
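With `per_channel: False` and `calibration.method: minmax`, static quantization derives a single scale and zero point per tensor from the smallest and largest values observed while running the calibration examples. The sketch below shows that arithmetic for asymmetric uint8 quantization. It assumes uint8 activations, and `minmax_qparams`, `quantize`, and `dequantize` are hypothetical helpers, not ONNX Runtime's internal code. Dynamic quantization instead computes activation ranges at inference time, so it never clips an input to a range fixed during calibration.

```python
# Hypothetical sketch of per-tensor min-max calibration for asymmetric uint8
# quantization (per_channel=False). Not ONNX Runtime's internal code.

def minmax_qparams(calibration_values, qmin=0, qmax=255):
    """Return (scale, zero_point) from the min/max observed during calibration."""
    lo = min(min(calibration_values), 0.0)  # keep 0.0 inside the range so it is exactly representable
    hi = max(max(calibration_values), 0.0)
    if hi == lo:  # all-zero data: any scale works, avoid dividing by zero
        return 1.0, qmin
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    return scale, max(qmin, min(qmax, zero_point))


def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map a float to an integer code; values outside the calibrated range are clipped."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map an integer code back to an approximate float."""
    return (q - zero_point) * scale


# Made-up activations standing in for the calibration samples.
values = [-1.0, 0.5, 2.0, 3.0]
scale, zp = minmax_qparams(values)
print(scale, zp)
# 10.0 lies outside the calibrated range [-1.0, 3.0], so it is clipped to about 3.0.
print(dequantize(quantize(10.0, scale, zp), scale, zp))
```

The last line illustrates the trade-off of a fixed calibrated range: an activation that was never seen during calibration saturates at the edge of the range.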

Benchmarked parameters:
* **quantization_approach**: `dynamic`, `static`
* **operators_to_quantize**: `['Add']`, `['Add', 'MatMul']`
* **node_exclusion**: `[]`, `['layernorm', 'gelu', 'residual', 'gather', 'softmax']`
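Every benchmarked parameter takes two values, so the sweep covers 2 × 2 × 2 = 8 configurations, which is why each results table has eight rows. A minimal sketch of enumerating that grid with `itertools.product`; the dictionary layout is an assumption for illustration, not the benchmark tool's configuration format.

```python
from itertools import product

# Values copied from the "Benchmarked parameters" list.
quantization_approach = ["dynamic", "static"]
operators_to_quantize = [["Add"], ["Add", "MatMul"]]
node_exclusion = [[], ["layernorm", "gelu", "residual", "gather", "softmax"]]

# One dict per configuration; the fixed parameters are shared by all of them.
configs = [
    {"quantization_approach": a, "operators_to_quantize": ops, "node_exclusion": excl}
    for a, ops, excl in product(quantization_approach, operators_to_quantize, node_exclusion)
]

for cfg in configs:
    print(cfg)
```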

# Evaluation
## Non-time metrics
| quantization_approach | operators_to_quantize | node_exclusion | | exact_match (original) | exact_match (optimized) | | f1 (original) | f1 (optimized) |
| :-------------------: | :-------------------: | :------------------------------------------------------: | :-: | :--------------------: | :---------------------: | :-: | :-----------: | :------------: |
| `dynamic` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 82.300 | 80.600 | \| | 87.232 | 86.097 |
| `dynamic` | `['Add', 'MatMul']` | `[]` | \| | 82.300 | 80.600 | \| | 87.232 | 86.097 |
| `dynamic` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 82.300 | 82.300 | \| | 87.232 | 87.232 |
| `dynamic` | `['Add']` | `[]` | \| | 82.300 | 82.300 | \| | 87.232 | 87.232 |
| `static` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 82.300 | 72.900 | \| | 87.232 | 79.964 |
| `static` | `['Add', 'MatMul']` | `[]` | \| | 82.300 | 54.500 | \| | 87.232 | 64.292 |
| `static` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 82.300 | 76.900 | \| | 87.232 | 83.014 |
| `static` | `['Add']` | `[]` | \| | 82.300 | 59.800 | \| | 87.232 | 69.217 |
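The `exact_match` and `f1` columns are SQuAD-style scores. Assuming the benchmark used the standard SQuAD v1.1 scoring (which the Hugging Face `squad` metric wraps), each prediction is normalized (lowercased, punctuation and the articles a/an/the removed, whitespace collapsed), compared against every reference answer, and the best match is kept; the results are then averaged over the evaluation samples and reported as percentages. A stdlib-only sketch of that logic, with illustrative function names:

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, drop punctuation and the articles a/an/the, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    return float(normalize_answer(prediction) == normalize_answer(reference))


def token_f1(prediction, reference):
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def squad_scores(predictions, references):
    """Return (exact_match, f1) in percent; each question keeps its best-matching reference."""
    n = len(predictions)
    em = sum(max(exact_match(p, r) for r in refs) for p, refs in zip(predictions, references))
    f1 = sum(max(token_f1(p, r) for r in refs) for p, refs in zip(predictions, references))
    return 100.0 * em / n, 100.0 * f1 / n


# Made-up example: two questions, the second with two acceptable answers.
predictions = ["the Denver Broncos", "Santa Clara, California"]
references = [["Denver Broncos"], ["Levi's Stadium", "Santa Clara"]]
em, f1 = squad_scores(predictions, references)
print(em, f1)
```

Here the first prediction matches exactly after normalization, while the second only partially overlaps "Santa Clara", so exact_match is 50.0 and f1 is about 90.0.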
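
## Time metrics
Time benchmarks were run for 15 seconds per configuration. In every table below, the throughput column is approximately 1000 / latency_mean, so it counts forward passes per second, not samples per second, even when the batch size is larger than 1.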
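A fixed-duration benchmark can be sketched as a loop that keeps calling the model until the time budget is spent, recording the latency of each call. `run_timed_benchmark` is a hypothetical helper and the `time.sleep` call stands in for one forward pass; this is not the benchmark's actual code. Because throughput is the number of calls divided by the wall-clock time, it comes out close to 1000 / latency_mean, which is the relationship visible in the tables.

```python
import time


def run_timed_benchmark(fn, duration_s=15.0):
    """Hypothetical helper: call fn() repeatedly for about duration_s seconds."""
    latencies_ms = []
    start = time.perf_counter()
    while time.perf_counter() - start < duration_s:
        t0 = time.perf_counter()
        fn()
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    elapsed_s = time.perf_counter() - start
    return {
        "latency_mean_ms": sum(latencies_ms) / len(latencies_ms),
        # Calls (forward passes) per second, independent of how many samples each call holds.
        "throughput_per_s": len(latencies_ms) / elapsed_s,
        "num_calls": len(latencies_ms),
    }


# Stand-in for one forward pass; a real run would call the model on a batch.
stats = run_timed_benchmark(lambda: time.sleep(0.002), duration_s=0.2)
print(stats)
```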

Time metrics for batch size = 1, input length = 32:

| quantization_approach | operators_to_quantize | node_exclusion | | latency_mean (original, ms) | latency_mean (optimized, ms) | | throughput (original, /s) | throughput (optimized, /s) |
| :-------------------: | :-------------------: | :------------------------------------------------------: | :-: | :-------------------------: | :--------------------------: | :-: | :-----------------------: | :------------------------: |
| `dynamic` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 47.87 | 7.23 | \| | 20.93 | 138.40 |
| `dynamic` | `['Add', 'MatMul']` | `[]` | \| | 48.10 | 7.14 | \| | 20.80 | 140.13 |
| `dynamic` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 43.83 | 17.16 | \| | 22.87 | 58.33 |
| `dynamic` | `['Add']` | `[]` | \| | 34.13 | 17.02 | \| | 29.33 | 58.80 |
| `static` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 35.07 | 9.21 | \| | 28.53 | 108.53 |
| `static` | `['Add', 'MatMul']` | `[]` | \| | 48.27 | 11.62 | \| | 20.73 | 86.13 |
| `static` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 34.11 | 19.23 | \| | 29.33 | 52.00 |
| `static` | `['Add']` | `[]` | \| | 48.54 | 21.18 | \| | 20.67 | 47.27 |
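To compare rows, the two latency columns can be turned into a speedup ratio (original / optimized). The sketch below does this for the batch size = 1, input length = 32 table, using its values; the short labels for operators and exclusion lists are only shorthand. The original-model latency itself varies between rows (34.11 to 48.54 ms), so individual ratios should be read as approximate.

```python
# Rows copied from the batch size = 1, input length = 32 table:
# (approach, operators, node exclusion, original latency ms, optimized latency ms)
# "excluded" = the five-pattern node_exclusion list, "none" = [].
rows = [
    ("dynamic", "Add+MatMul", "excluded", 47.87, 7.23),
    ("dynamic", "Add+MatMul", "none", 48.10, 7.14),
    ("dynamic", "Add", "excluded", 43.83, 17.16),
    ("dynamic", "Add", "none", 34.13, 17.02),
    ("static", "Add+MatMul", "excluded", 35.07, 9.21),
    ("static", "Add+MatMul", "none", 48.27, 11.62),
    ("static", "Add", "excluded", 34.11, 19.23),
    ("static", "Add", "none", 48.54, 21.18),
]

# Speedup > 1 means the quantized model is faster than the original.
speedups = {row[:3]: row[3] / row[4] for row in rows}
for key, s in sorted(speedups.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{key}: {s:.2f}x")
```

At this small input size every configuration is faster than the original, with dynamic quantization of both `Add` and `MatMul` and no node exclusion leading at roughly 6.7x.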

Time metrics for batch size = 1, input length = 64:

| quantization_approach | operators_to_quantize | node_exclusion | | latency_mean (original, ms) | latency_mean (optimized, ms) | | throughput (original, /s) | throughput (optimized, /s) |
| :-------------------: | :-------------------: | :------------------------------------------------------: | :-: | :-------------------------: | :--------------------------: | :-: | :-----------------------: | :------------------------: |
| `dynamic` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 59.92 | 12.60 | \| | 16.73 | 79.40 |
| `dynamic` | `['Add', 'MatMul']` | `[]` | \| | 59.64 | 13.25 | \| | 16.80 | 75.47 |
| `dynamic` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 60.13 | 29.65 | \| | 16.67 | 33.73 |
| `dynamic` | `['Add']` | `[]` | \| | 59.62 | 29.51 | \| | 16.80 | 33.93 |
| `static` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 58.94 | 15.13 | \| | 17.00 | 66.13 |
| `static` | `['Add', 'MatMul']` | `[]` | \| | 60.49 | 18.62 | \| | 16.53 | 53.73 |
| `static` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 43.32 | 28.00 | \| | 23.13 | 35.73 |
| `static` | `['Add']` | `[]` | \| | 44.19 | 32.72 | \| | 22.67 | 30.60 |

Time metrics for batch size = 1, input length = 128:

| quantization_approach | operators_to_quantize | node_exclusion | | latency_mean (original, ms) | latency_mean (optimized, ms) | | throughput (original, /s) | throughput (optimized, /s) |
| :-------------------: | :-------------------: | :------------------------------------------------------: | :-: | :-------------------------: | :--------------------------: | :-: | :-----------------------: | :------------------------: |
| `dynamic` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 73.39 | 26.56 | \| | 13.67 | 37.67 |
| `dynamic` | `['Add', 'MatMul']` | `[]` | \| | 57.64 | 23.42 | \| | 17.40 | 42.73 |
| `dynamic` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 64.04 | 50.14 | \| | 15.67 | 20.00 |
| `dynamic` | `['Add']` | `[]` | \| | 72.81 | 57.05 | \| | 13.80 | 17.53 |
| `static` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 70.57 | 27.59 | \| | 14.20 | 36.27 |
| `static` | `['Add', 'MatMul']` | `[]` | \| | 71.04 | 37.94 | \| | 14.13 | 26.40 |
| `static` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 57.65 | 57.95 | \| | 17.40 | 17.27 |
| `static` | `['Add']` | `[]` | \| | 71.66 | 58.67 | \| | 14.00 | 17.07 |

Time metrics for batch size = 4, input length = 32:

| quantization_approach | operators_to_quantize | node_exclusion | | latency_mean (original, ms) | latency_mean (optimized, ms) | | throughput (original, /s) | throughput (optimized, /s) |
| :-------------------: | :-------------------: | :------------------------------------------------------: | :-: | :-------------------------: | :--------------------------: | :-: | :-----------------------: | :------------------------: |
| `dynamic` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 72.11 | 21.80 | \| | 13.93 | 45.93 |
| `dynamic` | `['Add', 'MatMul']` | `[]` | \| | 73.15 | 20.70 | \| | 13.73 | 48.33 |
| `dynamic` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 72.05 | 53.68 | \| | 13.93 | 18.67 |
| `dynamic` | `['Add']` | `[]` | \| | 55.97 | 53.60 | \| | 17.87 | 18.67 |
| `static` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 70.46 | 24.88 | \| | 14.20 | 40.20 |
| `static` | `['Add', 'MatMul']` | `[]` | \| | 56.57 | 30.90 | \| | 17.73 | 32.40 |
| `static` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 62.38 | 53.64 | \| | 16.07 | 18.67 |
| `static` | `['Add']` | `[]` | \| | 60.19 | 67.29 | \| | 16.67 | 14.87 |

Time metrics for batch size = 4, input length = 64:

| quantization_approach | operators_to_quantize | node_exclusion | | latency_mean (original, ms) | latency_mean (optimized, ms) | | throughput (original, /s) | throughput (optimized, /s) |
| :-------------------: | :-------------------: | :------------------------------------------------------: | :-: | :-------------------------: | :--------------------------: | :-: | :-----------------------: | :------------------------: |
| `dynamic` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 121.20 | 40.12 | \| | 8.27 | 24.93 |
| `dynamic` | `['Add', 'MatMul']` | `[]` | \| | 90.97 | 41.51 | \| | 11.00 | 24.13 |
| `dynamic` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 120.85 | 106.50 | \| | 8.33 | 9.40 |
| `dynamic` | `['Add']` | `[]` | \| | 118.58 | 106.55 | \| | 8.47 | 9.40 |
| `static` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 120.57 | 54.25 | \| | 8.33 | 18.47 |
| `static` | `['Add', 'MatMul']` | `[]` | \| | 104.93 | 57.90 | \| | 9.60 | 17.33 |
| `static` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 90.85 | 110.46 | \| | 11.07 | 9.07 |
| `static` | `['Add']` | `[]` | \| | 120.57 | 103.62 | \| | 8.33 | 9.67 |

Time metrics for batch size = 4, input length = 128:

| quantization_approach | operators_to_quantize | node_exclusion | | latency_mean (original, ms) | latency_mean (optimized, ms) | | throughput (original, /s) | throughput (optimized, /s) |
| :-------------------: | :-------------------: | :------------------------------------------------------: | :-: | :-------------------------: | :--------------------------: | :-: | :-----------------------: | :------------------------: |
| `dynamic` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 172.14 | 94.78 | \| | 5.87 | 10.60 |
| `dynamic` | `['Add', 'MatMul']` | `[]` | \| | 220.38 | 84.18 | \| | 4.60 | 11.93 |
| `dynamic` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 221.22 | 221.37 | \| | 4.53 | 4.53 |
| `dynamic` | `['Add']` | `[]` | \| | 203.90 | 175.16 | \| | 4.93 | 5.73 |
| `static` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 192.63 | 113.82 | \| | 5.20 | 8.80 |
| `static` | `['Add', 'MatMul']` | `[]` | \| | 220.32 | 122.36 | \| | 4.60 | 8.20 |
| `static` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 220.58 | 207.51 | \| | 4.60 | 4.87 |
| `static` | `['Add']` | `[]` | \| | 221.94 | 246.87 | \| | 4.53 | 4.07 |

Time metrics for batch size = 8, input length = 32:

| quantization_approach | operators_to_quantize | node_exclusion | | latency_mean (original, ms) | latency_mean (optimized, ms) | | throughput (original, /s) | throughput (optimized, /s) |
| :-------------------: | :-------------------: | :------------------------------------------------------: | :-: | :-------------------------: | :--------------------------: | :-: | :-----------------------: | :------------------------: |
| `dynamic` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 112.67 | 43.26 | \| | 8.93 | 23.13 |
| `dynamic` | `['Add', 'MatMul']` | `[]` | \| | 95.78 | 40.66 | \| | 10.47 | 24.60 |
| `dynamic` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 117.38 | 104.28 | \| | 8.53 | 9.60 |
| `dynamic` | `['Add']` | `[]` | \| | 89.81 | 91.00 | \| | 11.20 | 11.00 |
| `static` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 89.14 | 52.09 | \| | 11.27 | 19.20 |
| `static` | `['Add', 'MatMul']` | `[]` | \| | 92.77 | 64.21 | \| | 10.80 | 15.60 |
| `static` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 119.10 | 114.43 | \| | 8.40 | 8.80 |
| `static` | `['Add']` | `[]` | \| | 119.28 | 127.79 | \| | 8.40 | 7.87 |

Time metrics for batch size = 8, input length = 64:

| quantization_approach | operators_to_quantize | node_exclusion | | latency_mean (original, ms) | latency_mean (optimized, ms) | | throughput (original, /s) | throughput (optimized, /s) |
| :-------------------: | :-------------------: | :------------------------------------------------------: | :-: | :-------------------------: | :--------------------------: | :-: | :-----------------------: | :------------------------: |
| `dynamic` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 215.03 | 78.03 | \| | 4.67 | 12.87 |
| `dynamic` | `['Add', 'MatMul']` | `[]` | \| | 214.76 | 87.19 | \| | 4.67 | 11.53 |
| `dynamic` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 216.48 | 162.64 | \| | 4.67 | 6.20 |
| `dynamic` | `['Add']` | `[]` | \| | 204.29 | 212.33 | \| | 4.93 | 4.73 |
| `static` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 215.47 | 104.45 | \| | 4.67 | 9.60 |
| `static` | `['Add', 'MatMul']` | `[]` | \| | 209.66 | 106.43 | \| | 4.80 | 9.40 |
| `static` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 166.13 | 220.92 | \| | 6.07 | 4.53 |
| `static` | `['Add']` | `[]` | \| | 214.69 | 209.01 | \| | 4.67 | 4.80 |

Time metrics for batch size = 8, input length = 128:

| quantization_approach | operators_to_quantize | node_exclusion | | latency_mean (original, ms) | latency_mean (optimized, ms) | | throughput (original, /s) | throughput (optimized, /s) |
| :-------------------: | :-------------------: | :------------------------------------------------------: | :-: | :-------------------------: | :--------------------------: | :-: | :-----------------------: | :------------------------: |
| `dynamic` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 407.90 | 151.49 | \| | 2.47 | 6.67 |
| `dynamic` | `['Add', 'MatMul']` | `[]` | \| | 407.34 | 154.55 | \| | 2.47 | 6.53 |
| `dynamic` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 406.51 | 394.85 | \| | 2.47 | 2.60 |
| `dynamic` | `['Add']` | `[]` | \| | 309.53 | 445.24 | \| | 3.27 | 2.27 |
| `static` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 407.54 | 224.46 | \| | 2.47 | 4.47 |
| `static` | `['Add', 'MatMul']` | `[]` | \| | 408.14 | 236.94 | \| | 2.47 | 4.27 |
| `static` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 309.91 | 357.87 | \| | 3.27 | 2.80 |
| `static` | `['Add']` | `[]` | \| | 310.00 | 406.54 | \| | 3.27 | 2.47 |
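Speed alone is not the whole picture: at batch size = 8 and input length = 128, three configurations are slower than the original model, and every static configuration loses at least 5 exact-match points. The sketch below combines the optimized `exact_match` values from the non-time table with the latencies from this table and applies an illustrative selection rule. The thresholds `MIN_SPEEDUP` and `MAX_EM_DROP` are hypothetical and not part of the benchmark.

```python
# Hypothetical selection rule: keep configurations that are at least MIN_SPEEDUP times
# faster and lose at most MAX_EM_DROP exact-match points. Thresholds are illustrative.
ORIGINAL_EM = 82.300
MIN_SPEEDUP = 1.1
MAX_EM_DROP = 2.0

# (approach, operators, node exclusion): (optimized exact_match, original ms, optimized ms)
# exact_match from the "Non-time metrics" table, latencies from the batch 8 / length 128 table.
results = {
    ("dynamic", "Add+MatMul", "excluded"): (80.600, 407.90, 151.49),
    ("dynamic", "Add+MatMul", "none"): (80.600, 407.34, 154.55),
    ("dynamic", "Add", "excluded"): (82.300, 406.51, 394.85),
    ("dynamic", "Add", "none"): (82.300, 309.53, 445.24),
    ("static", "Add+MatMul", "excluded"): (72.900, 407.54, 224.46),
    ("static", "Add+MatMul", "none"): (54.500, 408.14, 236.94),
    ("static", "Add", "excluded"): (76.900, 309.91, 357.87),
    ("static", "Add", "none"): (59.800, 310.00, 406.54),
}

slower = []  # configurations where quantization increased latency
kept = []    # configurations that pass both thresholds
for cfg, (em, orig_ms, opt_ms) in results.items():
    speedup = orig_ms / opt_ms
    if speedup < 1.0:
        slower.append(cfg)
    if speedup >= MIN_SPEEDUP and ORIGINAL_EM - em <= MAX_EM_DROP:
        kept.append((cfg, round(speedup, 2), round(ORIGINAL_EM - em, 2)))

print("slower than the original:", slower)
print("kept:", kept)
```

Under these example thresholds only the two dynamic `['Add', 'MatMul']` configurations qualify, each roughly 2.6x to 2.7x faster for a 1.7-point exact-match drop.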
runs.json ADDED
The diff for this file is too large to render.
 
tensorboard/1657701962.2788515/events.out.tfevents.1657701962.ip-10-0-211-116.ec2.internal.1.1 ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1123cab78cf16d88e67d41db2ae06341c107ea5fcd81181fa403b8e75eae33f1
size 836
tensorboard/1657701962.2803338/events.out.tfevents.1657701962.ip-10-0-211-116.ec2.internal.1.2 ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3f02a12ae65f6c8fae03ff7274bc581dcc37bb19ec326b018eaf3d413a2573e3
size 784
tensorboard/1657701962.281493/events.out.tfevents.1657701962.ip-10-0-211-116.ec2.internal.1.3 ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b654c0d29843719c1edcb5a7adf51451a0cdca7ef771ab10f7f1e4babf8c004b
size 826
tensorboard/1657701962.282597/events.out.tfevents.1657701962.ip-10-0-211-116.ec2.internal.1.4 ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0b85b76c5a3ac508fc93c9a586fb403f9bc656d6cb2d644553a580cd38960ff0
size 774
tensorboard/1657701962.2841494/events.out.tfevents.1657701962.ip-10-0-211-116.ec2.internal.1.5 ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3893e01a6696ebcf71dae9705148bcebdfc7bba9340c76867378d760bcfef4c8
size 835
tensorboard/1657701962.2852757/events.out.tfevents.1657701962.ip-10-0-211-116.ec2.internal.1.6 ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ee7d1b7ed59eac843570415dbcb41487d35e16c222d930e85af2a2580c5b8a0c
size 783
tensorboard/1657701962.2864373/events.out.tfevents.1657701962.ip-10-0-211-116.ec2.internal.1.7 ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:41aba3541919b1e499aca7c53d3868498ffbab7b91e32253625c60f2ea312988
size 825
tensorboard/1657701962.2878876/events.out.tfevents.1657701962.ip-10-0-211-116.ec2.internal.1.8 ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:745f4f21f17290f46a43593afc586e4701f2425a586b75bfcbb605f1129221d7
size 773
tensorboard/events.out.tfevents.1657701962.ip-10-0-211-116.ec2.internal.1.0 ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:db4aa35c2116489e4c7370f6c7df7aac3871eb1104127f8256d2de26d75acc10
size 40
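The TensorBoard event files were committed through Git LFS, so the repository only holds small pointer files: `oid` is the SHA-256 of the real object and `size` is its length in bytes. A minimal parsing sketch; `parse_lfs_pointer` is a hypothetical helper and does not validate every rule of the pointer specification (for example, key ordering).

```python
def parse_lfs_pointer(text):
    """Hypothetical helper: parse a Git LFS pointer file into its fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")  # each line is "<key> <value>"
        fields[key] = value
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        raise ValueError("not a Git LFS pointer")
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algorithm": algorithm,
        "oid": digest,
        "size": int(fields["size"]),  # size of the real object in bytes
    }


# Pointer for tensorboard/events.out.tfevents.1657701962.ip-10-0-211-116.ec2.internal.1.0
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:db4aa35c2116489e4c7370f6c7df7aac3871eb1104127f8256d2de26d75acc10
size 40
"""
info = parse_lfs_pointer(pointer)
print(info["oid_algorithm"], info["size"])
```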