Prompt and Text Normalizer for Audio Transcription Word Error Rate Calculation
Hi,
Thank you for releasing the model!
I would like to calculate the word error rate (WER) on an English speech recognition dataset such as LibriSpeech. I'm curious whether there is a recommended prompt and text normalizer for this.
Thank you.
Hi @cwoolee,
Thanks for reaching out. The instruction-tuned (IT) models follow a specific role-based prompt structure and use an instruction-tuned prompt template. A sample prompt structure is given below; for more information about the prompt template, please visit the model card page.
Sample prompt structure:
messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}]
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    }
]
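On the WER part of the question: this thread does not name a recommended text normalizer, so the following is only a minimal sketch of how the metric could be computed once transcripts are available. The normalization rules (lowercasing, stripping punctuation except apostrophes, collapsing whitespace) are an assumption chosen to roughly match LibriSpeech-style references, and the function names `normalize_text` and `word_error_rate` are hypothetical, not part of any model API.

```python
import re
import string


def normalize_text(text: str) -> str:
    """Assumed normalizer: lowercase, drop punctuation (keep apostrophes), collapse spaces."""
    text = text.lower()
    punct = string.punctuation.replace("'", "")
    # Replace each punctuation character with a space so words stay separated.
    text = text.translate(str.maketrans(punct, " " * len(punct)))
    return re.sub(r"\s+", " ", text).strip()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words, after normalization."""
    ref = normalize_text(reference).split()
    hyp = normalize_text(hypothesis).split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("THE CAT SAT ON THE MAT", "The cat sat on the mat."))
```

This scores a single utterance. For a whole dataset, WER is usually reported as total edits divided by total reference words rather than as the mean of per-utterance scores, and the same normalizer should be applied to both references and model outputs.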
Thanks.