---
title: CER
emoji: 🤗🏃🤗🏃🤗🏃🤗🏃🤗
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
tags:
- evaluate
- metric
license: apache-2.0
description: >-
  Character error rate (CER) is a common metric of the performance of an automatic speech recognition system.

  CER is similar to Word Error Rate (WER), but operates on characters instead of words. Please refer to the docs of WER for further information.

  Character error rate can be computed as:

  CER = (S + D + I) / N = (S + D + I) / (S + D + C)

  where

  S is the number of substitutions,

  D is the number of deletions,

  I is the number of insertions,

  C is the number of correct characters,

  N is the number of characters in the reference (N=S+D+C).

  CER's output is not always a number between 0 and 1, in particular when there is a high number of insertions. This value is often associated with the percentage of characters that were incorrectly predicted. The lower the value, the better the performance of the ASR system, with a CER of 0 being a perfect score.
---

# Metric Card for CER

## Metric description

Character error rate (CER) is a common metric of the performance of an automatic speech recognition (ASR) system. CER is similar to Word Error Rate (WER), but operates on characters instead of words.

Character error rate can be computed as:

`CER = (S + D + I) / N = (S + D + I) / (S + D + C)`

where

`S` is the number of substitutions,

`D` is the number of deletions,

`I` is the number of insertions,

`C` is the number of correct characters,

`N` is the number of characters in the reference (`N=S+D+C`).

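To make the formula concrete, here is a minimal sketch (an illustration only, not the metric's implementation; the helper name is hypothetical) that plugs edit-operation counts into the definition above. The counts correspond to the insertion-heavy example shown later in this card.

```python
# CER = (S + D + I) / N, with N = S + D + C (the number of characters in the reference).
def cer_from_counts(substitutions: int, deletions: int, insertions: int, correct: int) -> float:
    reference_length = substitutions + deletions + correct
    return (substitutions + deletions + insertions) / reference_length

# Reference "Helló" (5 characters) vs. prediction "Helló Világ":
# 0 substitutions, 0 deletions, 6 inserted characters (" Világ"), 5 correct characters.
print(cer_from_counts(substitutions=0, deletions=0, insertions=6, correct=5))  # 1.2
```
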
## How to use

The metric takes two inputs: references (a list of references for each speech input) and predictions (a list of transcriptions to score).

```python
from evaluate import load
cer = load("cer")
cer_score = cer.compute(predictions=predictions, references=references)
```

## Output values

This metric outputs a float representing the character error rate.

```
print(cer_score)
0.34146341463414637
```

The **lower** the CER value, the **better** the performance of the ASR system, with a CER of 0 being a perfect score.

However, CER's output is not always a number between 0 and 1, in particular when there is a high number of insertions (see [Examples](#examples) below).

### Values from popular papers

## Examples

Perfect match between prediction and reference:

```python
# requires: pip install evaluate jiwer
from evaluate import load
cer = load("cer")
predictions = ["hello világ", "jó éjszakát hold"]
references = ["hello világ", "jó éjszakát hold"]
cer_score = cer.compute(predictions=predictions, references=references)
print(cer_score)
0.0
```

Partial match between prediction and reference:

```python
from evaluate import load
cer = load("cer")
predictions = ["ez a jóslat", "van egy másik minta is"]
references = ["ez a hivatkozás", "van még egy"]
cer_score = cer.compute(predictions=predictions, references=references)
print(cer_score)
0.9615384615384616
```

No match between prediction and reference:

```python
from evaluate import load
cer = load("cer")
predictions = ["üdvözlet"]
references = ["jó!"]
cer_score = cer.compute(predictions=predictions, references=references)
print(cer_score)
2.6666666666666665
```

CER above 1 due to insertion errors:

```python
from evaluate import load
cer = load("cer")
predictions = ["Helló Világ"]
references = ["Helló"]
cer_score = cer.compute(predictions=predictions, references=references)
print(cer_score)
1.2
```

## Limitations and bias

In some cases, instead of reporting the raw CER, a normalized CER is reported: the number of mistakes is divided by the sum of the number of edit operations (`I` + `S` + `D`) and `C` (the number of correct characters), which results in CER values that fall within the range of 0–100%.
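
As a rough sketch of that normalization (an illustration only; this is not what `cer.compute` returns, and the helper name is hypothetical), the insertion count is added to the denominator, which keeps the score bounded between 0 and 1:

```python
# Normalized CER = (S + D + I) / (S + D + I + C); bounded to [0, 1] even with many insertions.
def normalized_cer(substitutions: int, deletions: int, insertions: int, correct: int) -> float:
    errors = substitutions + deletions + insertions
    return errors / (errors + correct)

# Same counts as the "CER above 1" example (6 insertions against a 5-character reference):
print(normalized_cer(substitutions=0, deletions=0, insertions=6, correct=5))  # ~0.545 (the raw CER is 1.2)
```
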
## Citation

```bibtex
@inproceedings{morris2004,
  author = {Morris, Andrew and Maier, Viktoria and Green, Phil},
  year = {2004},
  month = {01},
  pages = {},
  title = {From WER and RIL to MER and WIL: improved evaluation measures for connected speech recognition.}
}
```

## References

- [Hugging Face Tasks -- Automatic Speech Recognition](https://huggingface.co/tasks/automatic-speech-recognition)
- [Hugging Face Evaluate](https://github.com/huggingface/evaluate)