This is a small CTC-based Automatic Speech Recognition system for French.
This model is part of our SLU demo available here: https://huggingface.co/spaces/naver/French-SLU-DEMO-Interspeech2024
Please check our blog post available at: TBD
- Training data: 123 hours (84,707 utterances)
- Normalization: Whisper normalization
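For intuition, Whisper-style basic text normalization roughly lowercases, strips punctuation, and collapses whitespace before scoring. A rough sketch of that idea (an approximation only; the actual normalizer in the openai-whisper package handles more cases such as brackets and symbol spelling):

```python
import re
import string

def basic_normalize(text):
    """Rough approximation of Whisper's basic text normalization:
    lowercase, strip punctuation, collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()

print(basic_normalize("Bonjour, le monde !"))  # "bonjour le monde"
```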
Performance
| Dataset | dev WER | dev CER | test WER | test CER |
|---|---|---|---|---|
| speechMASSIVE | 9.2 | 2.6 | 9.6 | 2.9 |
| fleurs102 | 20.0 | 7.0 | 22.0 | 7.7 |
| CommonVoice 17 | 16.0 | 4.9 | 19.0 | 6.5 |
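As a reminder, WER and CER are edit-distance-based error rates over words and characters respectively. A minimal word error rate computation (illustrative only, not the evaluation script used for the table above):

```python
def wer(ref, hyp):
    """Word error rate: (substitutions + insertions + deletions) / reference length,
    computed with a standard dynamic-programming edit distance."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(h) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
    return d[len(r)][len(h)] / len(r)

print(wer("le chat dort", "le chien dort"))  # 1 substitution over 3 words -> 0.333...
```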
Training Parameters
This is a mHuBERT-147 ASR fine-tuned model. The training parameters are available in config.json. We highlight the use of 0.3 for hubert.final_dropout, which we found very helpful for convergence. We also train in fp32, as we found fp16 training to be unstable.
ASR Model Class
We use the mHubertForCTC class for our model, which is nearly identical to the existing HubertForCTC class. The key difference is that we've added a few additional hidden layers at the end of the Transformer stack, just before the lm_head. The code is available in CTC_model.py.
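The structural idea can be sketched as follows. This is a hypothetical illustration only (the class name `ExtraHiddenHead` and the sizes are made up for the example; the real implementation is in CTC_model.py): a few extra feed-forward layers sit between the Transformer output and the CTC projection, with the final dropout of 0.3 mentioned above applied just before `lm_head`.

```python
import torch
import torch.nn as nn

class ExtraHiddenHead(nn.Module):
    """Illustrative head: extra hidden layers + final dropout before the CTC lm_head."""
    def __init__(self, hidden_size=768, vocab_size=32, n_extra_layers=2, final_dropout=0.3):
        super().__init__()
        # A few additional Linear+GELU layers after the Transformer stack.
        self.extra_layers = nn.Sequential(*[
            nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.GELU())
            for _ in range(n_extra_layers)
        ])
        self.dropout = nn.Dropout(final_dropout)  # 0.3, as in config.json
        self.lm_head = nn.Linear(hidden_size, vocab_size)

    def forward(self, hidden_states):
        # hidden_states: (batch, frames, hidden_size) from the Transformer
        x = self.extra_layers(hidden_states)
        x = self.dropout(x)
        return self.lm_head(x)  # per-frame logits for the CTC loss

head = ExtraHiddenHead()
logits = head(torch.randn(1, 50, 768))
print(logits.shape)  # torch.Size([1, 50, 32])
```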
Running Inference
The run_inference.py file illustrates how to load the model for inference (load_asr_model) and how to produce a transcription for a file (run_asr_inference). Please install the versions pinned in the requirements file to avoid incorrect model loading.
Here is a simple example of the inference loop. Note that the input sampling rate must be 16,000 Hz.
```python
from inference_code.run_inference import load_asr_model, run_asr_inference

model, processor = load_asr_model()
prediction = run_asr_inference(model, processor, your_audio_file)
```
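Under the hood, a CTC model emits frame-level logits over the vocabulary plus a blank token; the transcription is obtained by collapsing repeated ids and removing blanks. The processor handles this internally, but a minimal greedy decoding sketch (illustrative, with an assumed blank id of 0) looks like:

```python
def ctc_greedy_decode(frame_ids, blank_id=0):
    """Standard greedy CTC decoding: collapse consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for i in frame_ids:
        if i != prev and i != blank_id:
            out.append(i)
        prev = i
    return out

# Frames [blank, 5, 5, blank, 5, 3, 3] decode to [5, 5, 3]:
# the blank between the two 5s keeps them as separate symbols.
print(ctc_greedy_decode([0, 5, 5, 0, 5, 3, 3]))  # [5, 5, 3]
```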
Base model: utter-project/mHuBERT-147