anuragshas committed on
Commit
1dc58d5
1 Parent(s): 35e7476

Update README.md

Files changed (1)
  1. README.md +41 -3
README.md CHANGED
@@ -17,12 +17,15 @@ model-index:
  name: Speech Recognition
  dataset:
  type: mozilla-foundation/common_voice_7_0
- name: Common Voice as
+ name: Common Voice 7
  args: as
  metrics:
  - type: wer # Required. Example: wer
- value: 67.679 # Required. Example: 20.90
+ value: 59.767 # Required. Example: 20.90
  name: Test WER # Optional. Example: Test WER
+ - name: Test CER
+ type: cer
+ value: 22.149
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -82,7 +85,42 @@ The following hyperparameters were used during training:

  ### Framework versions

- - Transformers 4.15.0
+ - Transformers 4.16.0
  - Pytorch 1.10.0+cu111
  - Datasets 1.17.0
  - Tokenizers 0.10.3
+
+ #### Evaluation Commands
+ 1. To evaluate on `mozilla-foundation/common_voice_7_0` with split `test`
+
+ ```bash
+ python eval.py --model_id anuragshas/wav2vec2-large-xls-r-300m-as --dataset mozilla-foundation/common_voice_7_0 --config as --split test
+ ```
+
+ ### Inference With LM
+
+ ```python
+ import torch
+ from datasets import load_dataset
+ from transformers import AutoModelForCTC, AutoProcessor
+ import torchaudio.functional as F
+ model_id = "anuragshas/wav2vec2-large-xls-r-300m-as"
+ sample_iter = iter(load_dataset("mozilla-foundation/common_voice_7_0", "as", split="test", streaming=True, use_auth_token=True))
+ sample = next(sample_iter)
+ # Common Voice audio is 48 kHz; the model expects 16 kHz input
+ resampled_audio = F.resample(torch.tensor(sample["audio"]["array"]), 48_000, 16_000).numpy()
+ model = AutoModelForCTC.from_pretrained(model_id)
+ processor = AutoProcessor.from_pretrained(model_id)
+ input_values = processor(resampled_audio, return_tensors="pt").input_values
+ with torch.no_grad():
+     logits = model(input_values).logits
+ transcription = processor.batch_decode(logits.numpy()).text
+ # => "জাহাজত তো তিশকুৰলৈ যাব কিন্তু জহাজিটো আহিপনে"
+ ```
+
+ ### Eval results on Common Voice 7 "test" (WER):
+
+ | Without LM | With LM (run `./eval.py`) |
+ |---|---|
+ | 67.679 | 59.767 |
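
The "Without LM" figure comes from plain greedy CTC decoding: take the argmax token at each frame, merge consecutive repeats, then drop blank tokens. A toy sketch of that collapse step (the ids and blank index here are illustrative, not the model's real vocabulary):

```python
def ctc_collapse(frame_ids, blank_id=0):
    """Collapse a per-frame CTC argmax sequence: merge consecutive
    repeats, then drop blank tokens."""
    out, prev = [], None
    for t in frame_ids:
        if t != prev and t != blank_id:
            out.append(t)
        prev = t
    return out

# A repeated token separated by a blank survives as two distinct tokens:
print(ctc_collapse([0, 7, 7, 0, 7, 5, 5]))  # -> [7, 7, 5]
```

The LM-boosted decode in the diff above replaces this argmax step with a beam search over the logits, rescored by an n-gram language model via the processor's `batch_decode`.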
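
The WER and CER values reported above are normalized edit distances. A minimal, self-contained sketch of how such metrics are computed (this is not the repo's `eval.py`, which presumably uses a library implementation):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (single-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            # deletion, insertion, substitution/match
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (r != h))
    return dp[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

# 1 substitution + 1 deletion over 6 reference words:
print(f"{wer('the cat sat on the mat', 'the cat sit on mat'):.3f}")  # -> 0.333
```

A WER of 59.767 thus means roughly 0.6 word-level edits per reference word on the Common Voice 7 Assamese test split; CER is the same idea at character granularity, which is why it is much lower (22.149).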