alvanlii/whisper-largev2-cantonese-peft-lora

Automatic Speech Recognition


alvanlii committed on Mar 8, 2023

Commit 34e29e3 (1 parent: 309b69d)

Update model usage instructions

Files changed (1)

README.md +21 -8


@@ -22,7 +22,7 @@ model-index:
     metrics:
     - name: Normalized CER
       type: cer
-      value: 10.11
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
@@ -31,6 +31,19 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) on the Common Voice 11.0 dataset. This is trained with PEFT LoRA+BNB INT8.
 ## Training and evaluation data
 For training, three datasets were used:
 - Common Voice 11 Canto Train Set
@@ -53,10 +66,10 @@ For training, three datasets were used:
 | Training Loss | Epoch | Step | Validation Loss | Normalized CER    |
 |:-------------:|:-----:|:----:|:---------------:|:------:|
-| 0.4610        | 0.55  | 2000 | 0.3106          | 13.08 |
-| 0.3441        | 1.11  | 4000 | 0.2875          | 11.79 |
-| 0.3466        | 1.66  | 6000 | 0.2820          | 11.44 |
-| 0.2539        | 2.22  | 8000 | 0.2777          | 10.59 |
-| 0.2312        | 2.77  | 10000 | 0.2822          | 10.60 |
-| 0.1639        | 3.32  | 12000 | 0.2859          | 10.17 |
-| 0.1569        | 3.88  | 14000 | 0.2866          | 10

     metrics:
     - name: Normalized CER
       type: cer
+      value: <TBA>
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) on the Common Voice 11.0 dataset. This is trained with PEFT LoRA+BNB INT8.
+To use the model, run the following code. Inference should fit in less than 16 GB of VRAM.
+```
+from peft import PeftModel, PeftConfig
+from transformers import WhisperForConditionalGeneration
+peft_model_id = "alvanlii/whisper-largev2-cantonese-peft-lora"
+# Read the adapter config to find the base model the adapter was trained on
+peft_config = PeftConfig.from_pretrained(peft_model_id)
+# Load the base Whisper model in 8-bit to reduce memory use
+model = WhisperForConditionalGeneration.from_pretrained(
+    peft_config.base_model_name_or_path, load_in_8bit=True, device_map="auto"
+)
+# Attach the LoRA adapter weights to the base model
+model = PeftModel.from_pretrained(model, peft_model_id)
+```
 ## Training and evaluation data
 For training, three datasets were used:
 - Common Voice 11 Canto Train Set
 | Training Loss | Epoch | Step | Validation Loss | Normalized CER    |
 |:-------------:|:-----:|:----:|:---------------:|:------:|
+| <TBA>        | 0.55  | 2000 | <TBA>          | <TBA> |
+| <TBA>        | 1.11  | 4000 | <TBA>          | <TBA> |
+| <TBA>        | 1.66  | 6000 | <TBA>          | <TBA> |
+| <TBA>        | 2.22  | 8000 | <TBA>          | <TBA> |
+| <TBA>        | 2.77  | 10000 | <TBA>          | <TBA> |
+| <TBA>        | 3.32  | 12000 | <TBA>          | <TBA> |
+| <TBA>        | 3.88  | 14000 | <TBA>          | <TBA> |
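The metric column in the table above is labelled "Normalized CER", but the README does not say how the normalization is done. As a rough illustration only, the sketch below computes a character error rate as a percentage after stripping whitespace and punctuation. The normalization rule, the percentage scale, and the helper names (`normalize`, `edit_distance`, `normalized_cer`) are all assumptions made for this example, not the author's evaluation code.

```python
import unicodedata


def normalize(text: str) -> str:
    """Drop whitespace, punctuation and symbols (an assumed normalization)."""
    kept = []
    # NFKC folds full-width punctuation such as "，" into its ASCII form first.
    for ch in unicodedata.normalize("NFKC", text):
        if ch.isspace() or unicodedata.category(ch).startswith(("P", "S")):
            continue
        kept.append(ch)
    return "".join(kept)


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance using a rolling row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_cer(reference: str, hypothesis: str) -> float:
    """CER in percent, computed on normalized strings."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference is empty after normalization")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character out of five reference characters.
    print(normalized_cer("我哋去食飯", "我地去食飯"))
```

Under these assumptions, a transcript that differs from the reference only in punctuation scores 0, and each wrong, missing, or extra character adds to the error count before dividing by the reference length.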