Vu Minh Chien committed • Commit 80deca2
Parent(s): 8768b23

update result

README.md CHANGED
metrics:
  - name: Test WER
    type: wer
    value: 30.837
  - name: Test CER
    type: cer
    value: 17.849
---
# Wav2Vec2-Large-XLSR-53-Japanese

Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Japanese, using the [Common Voice](https://huggingface.co/datasets/common_voice) dataset and [JSUT](https://sites.google.com/site/shinnosuketakamichi/publication/jsut), the Japanese speech corpus from Saruwatari-lab, University of Tokyo.
When using this model, make sure that your speech input is sampled at 16 kHz.
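Common Voice clips are typically recorded at a higher rate than 16 kHz; the card's code imports `librosa`, whose `librosa.load(path, sr=16_000)` handles the conversion. For intuition only, here is a naive linear-interpolation resampler in plain NumPy (a production resampler such as librosa's also applies an anti-aliasing low-pass filter first):

```python
import numpy as np

def resample_linear(audio: np.ndarray, orig_sr: int, target_sr: int) -> np.ndarray:
    """Naive linear-interpolation resampler (illustration only)."""
    duration = len(audio) / orig_sr
    n_out = int(round(duration * target_sr))
    t_out = np.arange(n_out) / target_sr      # output sample times
    t_in = np.arange(len(audio)) / orig_sr    # input sample times
    return np.interp(t_out, t_in, audio)

# one second of a 440 Hz tone at 48 kHz, downsampled to 16 kHz
tone = np.sin(2 * np.pi * 440 * np.arange(48_000) / 48_000)
print(len(resample_linear(tone, 48_000, 16_000)))  # 16000
```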
## Usage

The model can be used directly (without a language model) as follows:

```python
!pip install mecab-python3
!pip install unidic-lite
!python -m unidic download
import torch
import torchaudio
import librosa

# ... model/processor loading, preprocessing, and the evaluate() function
# are unchanged in this commit and omitted here ...

result = test_dataset.map(evaluate, batched=True, batch_size=8)
print("WER: {:.2f}".format(100 * wer.compute(predictions=result["pred_strings"], references=result["sentence"])))
```
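The snippet above reports WER via a `wer` metric computed in the elided evaluation code. As a rough sketch of what that metric measures (word-level Levenshtein distance divided by the number of reference words; this is not the implementation used for the reported figure):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edits to turn the first i ref words into the first j hyp words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print("WER: {:.2f}".format(100 * word_error_rate("this is a test", "this is test")))  # WER: 25.00
```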
**Test Result**: WER 30.837%, CER 17.849%

## Training

The Common Voice `train` and `validation` splits and the `basic5000` subset of the JSUT corpus were used for training.
The script used for training can be found [here](https://colab.research.google.com/drive/1ZTxoYzgOotUjcyoBf0m8gZj5Kcmu2yGU).
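This commit also adds a Test CER entry to the card metadata. For Japanese, character error rate is often the more telling number, since the WER above depends on how MeCab/unidic segments words. A minimal character-level edit-distance sketch (again, not the metric implementation behind the reported figures):

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences, using O(len(b)) memory."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)

print(cer("こんにちは", "こんにちわ"))  # 0.2
```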