SakshiRathi77/whisper-hindi-kagglex

SakshiRathi77 committed on Oct 20, 2023

Commit ce1a421 (1 parent: 42d081b)

Update README.md

Files changed (1)

README.md +87 -5

@@ -1,11 +1,93 @@
 ---
 license: apache-2.0
 datasets:
-- mozilla-foundation/common_voice_13_0
 language:
-- hi
 metrics:
-- wer
-library_name: adapter-transformers
 pipeline_tag: automatic-speech-recognition
----

 ---
 license: apache-2.0
+base_model: openai/whisper-small
+tags:
+  - generated_from_trainer
 datasets:
+  - mozilla-foundation/common_voice_15_0
+  - mozilla-foundation/common_voice_13_0
 language:
+  - hi
 metrics:
+  - cer
+  - wer
+library_name: transformers
 pipeline_tag: automatic-speech-recognition
+model-index:
+  - name: whisper-small-hi-cv
+    results:
+      - task:
+          name: Automatic Speech Recognition
+          type: automatic-speech-recognition
+        dataset:
+          name: Common Voice 15
+          type: mozilla-foundation/common_voice_15_0
+          args: hi
+        metrics:
+          - name: Test WER
+            type: wer
+            value: 13.9913
+          - name: Test CER
+            type: cer
+            value: 5.8844
+      - task:
+          name: Automatic Speech Recognition
+          type: automatic-speech-recognition
+        dataset:
+          name: Common Voice 13
+          type: mozilla-foundation/common_voice_13_0
+          args: hi
+        metrics:
+          - name: Test WER
+            type: wer
+            value: 23.1361
+          - name: Test CER
+            type: cer
+            value: 10.4366
+---
+# whisper-small-hi-cv
+This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the Common Voice 15 dataset.
+It achieves the following results on the evaluation set:
+- Wer: 13.9913
+- Cer: 5.8844
+## Evaluation
+```python
+from datasets import load_dataset, Audio
+from transformers import WhisperForConditionalGeneration, WhisperProcessor
+import torch
+import evaluate  # replaces the deprecated datasets.load_metric
+test_dataset = load_dataset("mozilla-foundation/common_voice_13_0", "hi", split="test")
+wer = evaluate.load("wer")
+cer = evaluate.load("cer")
+processor = WhisperProcessor.from_pretrained("kingabzpro/whisper-small-hi-cv")
+model = WhisperForConditionalGeneration.from_pretrained("kingabzpro/whisper-small-hi-cv").to("cuda")
+test_dataset = test_dataset.cast_column("audio", Audio(sampling_rate=16000))
+def map_to_pred(batch):
+    audio = batch["audio"]
+    input_features = processor(audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt").input_features
+    batch["reference"] = processor.tokenizer._normalize(batch['sentence'])
+    with torch.no_grad():
+        predicted_ids = model.generate(input_features.to("cuda"))[0]
+    transcription = processor.decode(predicted_ids)
+    batch["prediction"] = processor.tokenizer._normalize(transcription)
+    return batch
+result = test_dataset.map(map_to_pred)
+print("WER: {:.4f}".format(100 * wer.compute(predictions=result["prediction"], references=result["reference"])))
+print("CER: {:.4f}".format(100 * cer.compute(predictions=result["prediction"], references=result["reference"])))
+```
+```bash
+WER: 23.1361
+CER: 10.4366
+```
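
The WER and CER figures reported in the README are edit-distance ratios: the number of word-level (or character-level) substitutions, insertions, and deletions needed to turn the prediction into the reference, divided by the length of the reference. The following is a minimal pure-Python sketch of that idea, not the implementation used by the metric library in the README. The helper names `edit_distance` and `error_rate` and the English sample sentences are hypothetical, chosen only for illustration, and the character-level variant here counts spaces as characters, which is a simplifying assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(references, predictions, unit="word"):
    """Corpus-level error rate: total edits / total reference units."""
    split = (lambda s: s.split()) if unit == "word" else list
    edits = sum(edit_distance(split(r), split(p))
                for r, p in zip(references, predictions))
    total = sum(len(split(r)) for r in references)
    return edits / total


# Hypothetical sample pair, for illustration only.
refs = ["the cat sat on the mat"]
hyps = ["the cat sit on mat"]
print("WER: {:.4f}".format(100 * error_rate(refs, hyps, unit="word")))
print("CER: {:.4f}".format(100 * error_rate(refs, hyps, unit="char")))
```

Because the rates are accumulated over the whole corpus before dividing, long utterances weigh more than short ones; averaging per-utterance rates would give a different number.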