base_model:
- openai/whisper-large-v3-turbo
pipeline_tag: automatic-speech-recognition
---

# Model Card for whisper-large-v3-turbo-chinese

<!-- Provide a quick summary of what the model is/does. -->

This model card describes a fine-tuned version of the Whisper-large-v3-turbo model, optimized for Mandarin automatic speech recognition (ASR). The model was fine-tuned on the Common Voice 13.0 dataset using PEFT with LoRA, which keeps training efficient while preserving the performance of the original model. It achieves the following results on the Common Voice 13.0 test split:

- WER without fine-tuning: 77.08
- WER after fine-tuning: 44.93
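
The fine-tuning code itself is not included in this card, but for readers unfamiliar with PEFT/LoRA, the sketch below shows roughly what such a setup looks like with the `peft` library. The hyperparameters (`r`, `lora_alpha`, `target_modules`, dropout) are illustrative assumptions, not the values used to train this checkpoint.

```python
# Minimal LoRA sketch for Whisper fine-tuning; all hyperparameters are assumptions.
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3-turbo")

lora_config = LoraConfig(
    r=32,                                 # assumed adapter rank
    lora_alpha=64,                        # assumed scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections commonly targeted in Whisper
    lora_dropout=0.05,
    bias="none",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the small LoRA adapters are updated during training
```
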
## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset


device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "sandy1990418/whisper-large-v3-turbo-chinese"

# Load the fine-tuned checkpoint and move it to the available device.
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

# Wrap everything in an ASR pipeline for convenient inference.
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

# Quick smoke test on a sample audio clip.
dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation")
sample = dataset[0]["audio"]

result = pipe(sample)
print(result["text"])
```
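
The sample above is an English LibriSpeech clip used only as a smoke test. For Mandarin audio you would typically pass your own file and, if needed, pin the decoding language. The file name and `generate_kwargs` below are illustrative assumptions, not part of the original card.

```python
# Hypothetical usage on a local Mandarin recording; the file path is a placeholder.
result = pipe(
    "my_mandarin_clip.wav",
    generate_kwargs={"language": "zh", "task": "transcribe"},  # force Chinese transcription
    return_timestamps=True,  # return segment timestamps (commonly enabled for longer clips)
)
print(result["text"])
```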
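
To sanity-check the reported numbers, WER can be estimated with the `evaluate` library. The Common Voice language config (`zh-TW`) and the subset size below are assumptions, not a record of how the card's figures were produced; note also that whitespace-based WER is sensitive to how Chinese text is segmented, so character-level scoring (CER) is often used for Mandarin.

```python
# Hypothetical WER estimate reusing the `pipe` object defined above.
# The "zh-TW" config and 100-example subset are assumptions, not the actual eval setup.
import evaluate
from datasets import Audio, load_dataset

wer_metric = evaluate.load("wer")

common_voice = load_dataset("mozilla-foundation/common_voice_13_0", "zh-TW", split="test")
common_voice = common_voice.cast_column("audio", Audio(sampling_rate=16_000))

predictions, references = [], []
for example in common_voice.select(range(100)):  # small subset for a quick estimate
    output = pipe(example["audio"], generate_kwargs={"language": "zh"})
    predictions.append(output["text"])
    references.append(example["sentence"])

wer = 100 * wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.2f}")
```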