base_model:
- openai/whisper-large-v3-turbo
pipeline_tag: automatic-speech-recognition
---

# Model Card for whisper-large-v3-turbo-chinese

<!-- Provide a quick summary of what the model is/does. -->

This model card describes a fine-tuned version of the Whisper-large-v3-turbo model, optimized for Mandarin automatic speech recognition (ASR). The model was fine-tuned on the Common Voice 13.0 dataset using PEFT with LoRA, which keeps training efficient while preserving the performance of the original model. It achieves the following results on the Common Voice 13.0 test split:

- WER without fine-tuning: 77.08
- WER after fine-tuning: 44.93
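
The fine-tuning code itself is not included in this card, but for readers unfamiliar with PEFT/LoRA, the sketch below shows roughly what such a setup looks like with the `peft` library. The hyperparameters (`r`, `lora_alpha`, `target_modules`, dropout) are illustrative assumptions, not the values used to train this checkpoint.

```python
# Minimal LoRA sketch for Whisper fine-tuning; all hyperparameters are assumptions.
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3-turbo")

lora_config = LoraConfig(
    r=32,                                 # assumed adapter rank
    lora_alpha=64,                        # assumed scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections commonly targeted in Whisper
    lora_dropout=0.05,
    bias="none",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the small LoRA adapters are updated during training
```
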
## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset


device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "sandy1990418/whisper-large-v3-turbo-chinese"

# Load the fine-tuned checkpoint and move it to the available device.
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

# Wrap everything in an ASR pipeline for convenient inference.
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

# Quick smoke test on a sample audio clip.
dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation")
sample = dataset[0]["audio"]

result = pipe(sample)
print(result["text"])
```
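
The sample above is an English LibriSpeech clip used only as a smoke test. For Mandarin audio you would typically pass your own file and, if needed, pin the decoding language. The file name and `generate_kwargs` below are illustrative assumptions, not part of the original card.

```python
# Hypothetical usage on a local Mandarin recording; the file path is a placeholder.
result = pipe(
    "my_mandarin_clip.wav",
    generate_kwargs={"language": "zh", "task": "transcribe"},  # force Chinese transcription
    return_timestamps=True,  # return segment timestamps (commonly enabled for longer clips)
)
print(result["text"])
```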
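
To sanity-check the reported numbers, WER can be estimated with the `evaluate` library. The Common Voice language config (`zh-TW`) and the subset size below are assumptions, not a record of how the card's figures were produced; note also that whitespace-based WER is sensitive to how Chinese text is segmented, so character-level scoring (CER) is often used for Mandarin.

```python
# Hypothetical WER estimate reusing the `pipe` object defined above.
# The "zh-TW" config and 100-example subset are assumptions, not the actual eval setup.
import evaluate
from datasets import Audio, load_dataset

wer_metric = evaluate.load("wer")

common_voice = load_dataset("mozilla-foundation/common_voice_13_0", "zh-TW", split="test")
common_voice = common_voice.cast_column("audio", Audio(sampling_rate=16_000))

predictions, references = [], []
for example in common_voice.select(range(100)):  # small subset for a quick estimate
    output = pipe(example["audio"], generate_kwargs={"language": "zh"})
    predictions.append(output["text"])
    references.append(example["sentence"])

wer = 100 * wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.2f}")
```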