metadata
pipeline_tag: token-classification
datasets:
- conll2003
metrics:
- precision
- recall
- f1
- accuracy
tags:
- distilbert
task: token-classification
Backend: sagemaker-training
Backend args: {'instance_type': 'ml.g4dn.2xlarge', 'supported_instructions': 'avx512_vnni'}
Number of evaluation samples: 1000
Fixed parameters:
- model_name_or_path: elastic/distilbert-base-uncased-finetuned-conll03-english
- dataset:
  - path: conll2003
  - eval_split: validation
  - data_keys: {'primary': 'tokens'}
  - ref_keys: ['ner_tags']
  - calibration_split: train
- node_exclusion: []
- per_channel: False
- calibration:
  - method: minmax
  - num_calibration_samples: 100
- framework: onnxruntime
- framework_args:
  - opset: 11
  - optimization_level: 1
- aware_training: False
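The calibration settings above (method `minmax`, 100 calibration samples) control how static quantization chooses its activation ranges. The following is a rough, framework-independent sketch of that idea, assuming asymmetric uint8 quantization (the card does not state the quantized data type). The function names are hypothetical, and this is not ONNX Runtime's own implementation.

```python
from typing import Iterable, Sequence, Tuple


def minmax_range(batches: Iterable[Sequence[float]]) -> Tuple[float, float]:
    """Track the global minimum and maximum over all calibration batches."""
    lo, hi = float("inf"), float("-inf")
    for batch in batches:
        lo = min(lo, min(batch))
        hi = max(hi, max(batch))
    return lo, hi


def affine_qparams(lo: float, hi: float, qmin: int = 0, qmax: int = 255) -> Tuple[float, int]:
    """Derive (scale, zero_point) for asymmetric quantization (uint8 range is an assumption)."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    if hi == lo:
        return 1.0, qmin
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    return scale, max(qmin, min(qmax, zero_point))


def quantize(x: float, scale: float, zero_point: int, qmin: int = 0, qmax: int = 255) -> int:
    """Map a float to its integer code, clamping values outside the calibrated range."""
    return max(qmin, min(qmax, int(round(x / scale)) + zero_point))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Map an integer code back to an approximate float."""
    return (q - zero_point) * scale


# Toy activations standing in for the calibration samples.
calibration_batches = [[-1.0, 0.5, 2.0], [-0.25, 3.0, 1.5]]
lo, hi = minmax_range(calibration_batches)
scale, zero_point = affine_qparams(lo, hi)
print(lo, hi, scale, zero_point)
```

With min-max calibration, a single outlier in the calibration data widens the range and therefore coarsens the step size for every other value, and values outside the calibrated range are clamped at inference time.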
Benchmarked parameters:
- quantization_approach: dynamic, static
- operators_to_quantize: ['Add', 'MatMul'], ['Add']
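The two benchmarked parameters combine into four configurations. A minimal sketch of that grid, assuming the benchmark evaluates the full Cartesian product (which is consistent with the four rows reported per table in the Evaluation section); the variable names are illustrative only:

```python
from itertools import product

quantization_approaches = ["dynamic", "static"]
operators_to_quantize = [["Add", "MatMul"], ["Add"]]

# One dictionary per benchmarked configuration, in the same order as the result tables.
configs = [
    {"quantization_approach": approach, "operators_to_quantize": ops}
    for approach, ops in product(quantization_approaches, operators_to_quantize)
]

for cfg in configs:
    print(cfg)
```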
Evaluation
Non-time metrics
quantization_approach | operators_to_quantize | precision (original) | precision (optimized) | recall (original) | recall (optimized) | f1 (original) | f1 (optimized) | accuracy (original) | accuracy (optimized)
---|---|---|---|---|---|---|---|---|---
dynamic | ['Add', 'MatMul'] | 0.937 | 0.937 | 0.953 | 0.953 | 0.945 | 0.945 | 0.988 | 0.988
dynamic | ['Add'] | 0.937 | 0.937 | 0.953 | 0.953 | 0.945 | 0.945 | 0.988 | 0.988
static | ['Add', 'MatMul'] | 0.937 | 0.074 | 0.953 | 0.253 | 0.945 | 0.114 | 0.988 | 0.363
static | ['Add'] | 0.937 | 0.065 | 0.953 | 0.186 | 0.945 | 0.096 | 0.988 | 0.340
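As a quick consistency check on the table above, the sketch below recomputes F1 from the rounded precision and recall values, assuming the reported F1 is the harmonic mean of the reported precision and recall. Because the inputs are already rounded to three decimals, the last digit can differ slightly from the reported figure.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (label, precision, recall, reported F1), copied from the table above.
rows = [
    ("original", 0.937, 0.953, 0.945),
    ("static, ['Add', 'MatMul']", 0.074, 0.253, 0.114),
    ("static, ['Add']", 0.065, 0.186, 0.096),
]

for label, precision, recall, reported in rows:
    recomputed = f1_score(precision, recall)
    print(f"{label}: recomputed F1 = {recomputed:.3f}, reported F1 = {reported:.3f}")
```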
Time metrics
Time benchmarks were run for 3 seconds per config.
Below, time metrics for batch size = 1, input length = 64.
quantization_approach | operators_to_quantize | latency_mean (original, ms) | latency_mean (optimized, ms) | throughput (original, /s) | throughput (optimized, /s)
---|---|---|---|---|---
dynamic | ['Add', 'MatMul'] | 57.64 | 12.30 | 17.67 | 81.33
dynamic | ['Add'] | 43.51 | 29.42 | 23.00 | 34.00
static | ['Add', 'MatMul'] | 43.05 | 21.11 | 23.33 | 47.67
static | ['Add'] | 43.50 | 37.93 | 23.00 | 26.67
Below, time metrics for batch size = 4, input length = 64.
quantization_approach | operators_to_quantize | latency_mean (original, ms) | latency_mean (optimized, ms) | throughput (original, /s) | throughput (optimized, /s)
---|---|---|---|---|---
dynamic | ['Add', 'MatMul'] | 119.50 | 39.92 | 8.67 | 25.33
dynamic | ['Add'] | 119.62 | 107.42 | 8.67 | 9.33
static | ['Add', 'MatMul'] | 120.23 | 56.94 | 8.33 | 17.67
static | ['Add'] | 119.10 | 130.78 | 8.67 | 7.67
Below, time metrics for batch size = 8, input length = 64.
quantization_approach | operators_to_quantize | latency_mean (original, ms) | latency_mean (optimized, ms) | throughput (original, /s) | throughput (optimized, /s)
---|---|---|---|---|---
dynamic | ['Add', 'MatMul'] | 165.84 | 75.45 | 6.33 | 13.33
dynamic | ['Add'] | 214.65 | 211.41 | 4.67 | 5.00
static | ['Add', 'MatMul'] | 166.53 | 129.00 | 6.33 | 8.00
static | ['Add'] | 214.81 | 256.95 | 4.67 | 4.00
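The latency figures above are easier to compare as speedup factors (original latency divided by optimized latency). The sketch below computes them from the values in the three time tables; the tuple layout and helper name are illustrative. Each speedup is computed within its own row, since the measured latency of the original model varies between rows.

```python
# (batch_size, quantization_approach, operators_to_quantize,
#  latency_mean original in ms, latency_mean optimized in ms)
rows = [
    (1, "dynamic", "['Add', 'MatMul']", 57.64, 12.30),
    (1, "dynamic", "['Add']", 43.51, 29.42),
    (1, "static", "['Add', 'MatMul']", 43.05, 21.11),
    (1, "static", "['Add']", 43.50, 37.93),
    (4, "dynamic", "['Add', 'MatMul']", 119.50, 39.92),
    (4, "dynamic", "['Add']", 119.62, 107.42),
    (4, "static", "['Add', 'MatMul']", 120.23, 56.94),
    (4, "static", "['Add']", 119.10, 130.78),
    (8, "dynamic", "['Add', 'MatMul']", 165.84, 75.45),
    (8, "dynamic", "['Add']", 214.65, 211.41),
    (8, "static", "['Add', 'MatMul']", 166.53, 129.00),
    (8, "static", "['Add']", 214.81, 256.95),
]


def speedup(original_ms: float, optimized_ms: float) -> float:
    """Values above 1.0 mean the optimized model is faster."""
    return original_ms / optimized_ms


for batch_size, approach, ops, original_ms, optimized_ms in rows:
    factor = speedup(original_ms, optimized_ms)
    print(f"batch={batch_size} {approach:7s} {ops:18s} speedup = {factor:.2f}x")

# Configuration with the largest speedup, and those that became slower.
best = max(rows, key=lambda r: speedup(r[3], r[4]))
slower = [r for r in rows if speedup(r[3], r[4]) < 1.0]
```

In these measurements, the largest speedup is for dynamic quantization of `['Add', 'MatMul']` at batch size 1, while static quantization of `['Add']` alone is slower than the original model at batch sizes 4 and 8.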