Vu Minh Chien committed • Commit 80deca2
Parent(s): 8768b23

update result

README.md CHANGED
metrics:
  - name: Test WER
    type: wer
    value: 30.837
  - name: Test CER
    type: cer
    value: 17.849
---
# Wav2Vec2-Large-XLSR-53-Japanese

Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Japanese, using the [Common Voice](https://huggingface.co/datasets/common_voice) dataset and [JSUT](https://sites.google.com/site/shinnosuketakamichi/publication/jsut), the Japanese speech corpus from Saruwatari-lab, University of Tokyo.
When using this model, make sure that your speech input is sampled at 16 kHz.
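Common Voice clips are typically recorded at a higher rate than 16 kHz; the card's code imports `librosa`, whose `librosa.load(path, sr=16_000)` handles the conversion. For intuition only, here is a naive linear-interpolation resampler in plain NumPy (a production resampler such as librosa's also applies an anti-aliasing low-pass filter first):

```python
import numpy as np

def resample_linear(audio: np.ndarray, orig_sr: int, target_sr: int) -> np.ndarray:
    """Naive linear-interpolation resampler (illustration only)."""
    duration = len(audio) / orig_sr
    n_out = int(round(duration * target_sr))
    t_out = np.arange(n_out) / target_sr      # output sample times
    t_in = np.arange(len(audio)) / orig_sr    # input sample times
    return np.interp(t_out, t_in, audio)

# one second of a 440 Hz tone at 48 kHz, downsampled to 16 kHz
tone = np.sin(2 * np.pi * 440 * np.arange(48_000) / 48_000)
print(len(resample_linear(tone, 48_000, 16_000)))  # 16000
```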
## Usage

The model can be used directly (without a language model) as follows:

```python
!pip install mecab-python3
!pip install unidic-lite
!python -m unidic download
import torch
import torchaudio
import librosa

# ... model/processor loading, preprocessing, and the evaluate() function
# are unchanged in this commit and omitted here ...

result = test_dataset.map(evaluate, batched=True, batch_size=8)
print("WER: {:.2f}".format(100 * wer.compute(predictions=result["pred_strings"], references=result["sentence"])))
```
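The snippet above reports WER via a `wer` metric computed in the elided evaluation code. As a rough sketch of what that metric measures (word-level Levenshtein distance divided by the number of reference words; this is not the implementation used for the reported figure):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edits to turn the first i ref words into the first j hyp words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print("WER: {:.2f}".format(100 * word_error_rate("this is a test", "this is test")))  # WER: 25.00
```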
**Test Result**: WER 30.837%, CER 17.849%

## Training

The Common Voice `train` and `validation` splits and the `basic5000` subset of the JSUT corpus were used for training.
The script used for training can be found [here](https://colab.research.google.com/drive/1ZTxoYzgOotUjcyoBf0m8gZj5Kcmu2yGU).
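This commit also adds a Test CER entry to the card metadata. For Japanese, character error rate is often the more telling number, since the WER above depends on how MeCab/unidic segments words. A minimal character-level edit-distance sketch (again, not the metric implementation behind the reported figures):

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences, using O(len(b)) memory."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)

print(cer("こんにちは", "こんにちわ"))  # 0.2
```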