task: token-classification
Backend: sagemaker-training
Backend args: {'instance_type': 'ml.m5.2xlarge', 'supported_instructions': 'avx512'}
Number of evaluation samples: All dataset

Fixed parameters:

model_name_or_path: elastic/distilbert-base-uncased-finetuned-conll03-english
dataset:
- path: conll2003
- eval_split: validation
- data_keys: {'primary': 'tokens'}
- ref_keys: ['ner_tags']
- calibration_split: train
node_exclusion: []
per_channel: False
calibration:
- method: minmax
- num_calibration_samples: 100
framework: onnxruntime
framework_args:
- opset: 11
- optimization_level: 1
aware_training: False

Benchmarked parameters:

quantization_approach: dynamic, static
operators_to_quantize: ['Add', 'MatMul'], ['Add']

Evaluation

Non-time metrics

quantization_approach	operators_to_quantize		precision (original)	precision (optimized)		recall (original)	recall (optimized)		f1 (original)	f1 (optimized)		accuracy (original)	accuracy (optimized)
`dynamic`	`['Add', 'MatMul']`	\|	0.936	0.935	\|	0.944	0.943	\|	0.940	0.939	\|	0.988	0.988
`dynamic`	`['Add']`	\|	0.936	0.936	\|	0.944	0.944	\|	0.940	0.940	\|	0.988	0.988
`static`	`['Add', 'MatMul']`	\|	0.936	0.063	\|	0.944	0.246	\|	0.940	0.100	\|	0.988	0.343
`static`	`['Add']`	\|	0.936	0.050	\|	0.944	0.160	\|	0.940	0.076	\|	0.988	0.311

Time metrics

Time benchmarks were run for 15 seconds per config.

Below, time metrics for batch size = 1, input length = 32.

quantization_approach	operators_to_quantize		latency_mean (original, ms)	latency_mean (optimized, ms)		throughput (original, /s)	throughput (optimized, /s)
`dynamic`	`['Add', 'MatMul']`	\|	46.38	9.96	\|	21.60	100.47
`dynamic`	`['Add']`	\|	36.59	13.98	\|	27.33	71.60
`static`	`['Add', 'MatMul']`	\|	33.84	14.46	\|	29.60	69.20
`static`	`['Add']`	\|	33.23	20.11	\|	30.13	49.73

Below, time metrics for batch size = 1, input length = 64.

quantization_approach	operators_to_quantize		latency_mean (original, ms)	latency_mean (optimized, ms)		throughput (original, /s)	throughput (optimized, /s)
`dynamic`	`['Add', 'MatMul']`	\|	58.92	19.68	\|	17.00	50.87
`dynamic`	`['Add']`	\|	58.59	24.81	\|	17.13	40.33
`static`	`['Add', 'MatMul']`	\|	51.41	29.36	\|	19.47	34.07
`static`	`['Add']`	\|	44.22	38.56	\|	22.67	25.93

Below, time metrics for batch size = 1, input length = 128.

quantization_approach	operators_to_quantize		latency_mean (original, ms)	latency_mean (optimized, ms)		throughput (original, /s)	throughput (optimized, /s)
`dynamic`	`['Add', 'MatMul']`	\|	72.38	36.47	\|	13.87	27.47
`dynamic`	`['Add']`	\|	70.21	46.30	\|	14.27	21.60
`static`	`['Add', 'MatMul']`	\|	70.76	48.24	\|	14.13	20.80
`static`	`['Add']`	\|	72.47	71.10	\|	13.80	14.07

Below, time metrics for batch size = 4, input length = 32.

quantization_approach	operators_to_quantize		latency_mean (original, ms)	latency_mean (optimized, ms)		throughput (original, /s)	throughput (optimized, /s)
`dynamic`	`['Add', 'MatMul']`	\|	69.76	38.50	\|	14.40	26.00
`dynamic`	`['Add']`	\|	56.02	51.32	\|	17.87	19.53
`static`	`['Add', 'MatMul']`	\|	55.05	46.80	\|	18.20	21.40
`static`	`['Add']`	\|	71.03	56.82	\|	14.13	17.67

Below, time metrics for batch size = 4, input length = 64.

quantization_approach	operators_to_quantize		latency_mean (original, ms)	latency_mean (optimized, ms)		throughput (original, /s)	throughput (optimized, /s)
`dynamic`	`['Add', 'MatMul']`	\|	119.91	61.51	\|	8.40	16.27
`dynamic`	`['Add']`	\|	108.43	105.65	\|	9.27	9.47
`static`	`['Add', 'MatMul']`	\|	119.89	86.76	\|	8.40	11.53
`static`	`['Add']`	\|	96.99	102.03	\|	10.33	9.87

Below, time metrics for batch size = 4, input length = 128.

quantization_approach	operators_to_quantize		latency_mean (original, ms)	latency_mean (optimized, ms)		throughput (original, /s)	throughput (optimized, /s)
`dynamic`	`['Add', 'MatMul']`	\|	219.78	123.71	\|	4.60	8.13
`dynamic`	`['Add']`	\|	220.13	187.21	\|	4.60	5.40
`static`	`['Add', 'MatMul']`	\|	186.39	176.99	\|	5.40	5.67
`static`	`['Add']`	\|	219.57	203.71	\|	4.60	4.93

Below, time metrics for batch size = 8, input length = 32.

quantization_approach	operators_to_quantize		latency_mean (original, ms)	latency_mean (optimized, ms)		throughput (original, /s)	throughput (optimized, /s)
`dynamic`	`['Add', 'MatMul']`	\|	118.32	59.22	\|	8.47	16.93
`dynamic`	`['Add']`	\|	116.52	80.17	\|	8.60	12.53
`static`	`['Add', 'MatMul']`	\|	116.59	83.55	\|	8.60	12.00
`static`	`['Add']`	\|	115.81	126.53	\|	8.67	7.93

Below, time metrics for batch size = 8, input length = 64.

quantization_approach	operators_to_quantize		latency_mean (original, ms)	latency_mean (optimized, ms)		throughput (original, /s)	throughput (optimized, /s)
`dynamic`	`['Add', 'MatMul']`	\|	172.71	117.89	\|	5.80	8.53
`dynamic`	`['Add']`	\|	166.05	156.99	\|	6.07	6.40
`static`	`['Add', 'MatMul']`	\|	215.00	148.93	\|	4.67	6.73
`static`	`['Add']`	\|	214.55	200.16	\|	4.67	5.00

Below, time metrics for batch size = 8, input length = 128.

quantization_approach	operators_to_quantize		latency_mean (original, ms)	latency_mean (optimized, ms)		throughput (original, /s)	throughput (optimized, /s)
`dynamic`	`['Add', 'MatMul']`	\|	403.69	307.36	\|	2.53	3.27
`dynamic`	`['Add']`	\|	372.85	317.53	\|	2.73	3.20
`static`	`['Add', 'MatMul']`	\|	352.18	320.85	\|	2.87	3.13
`static`	`['Add']`	\|	403.55	410.17	\|	2.53	2.47

fxmarty
/

20220712-h08m02s04_example

Evaluation

Non-time metrics

Time metrics

Dataset used to train fxmarty/20220712-h08m02s04_example