txya900619 committed on
Commit
d9e956d
·
verified ·
1 Parent(s): 918a995

Update README.md

Files changed (1)
  1. README.md +56 -0
README.md CHANGED
@@ -1,3 +1,59 @@
  ---
  license: mit
+ language:
+ - hak
+ pipeline_tag: automatic-speech-recognition
  ---
+ # Model Card for whisper-large-v3-taiwanese-hakka
+
+ <!-- Provide a quick summary of what the model is/does. -->
+ This model is a version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) fine-tuned on Taiwanese Hakka. During training, the ID of each dialect was supplied as a prompt, as an experiment to test whether adding dialect prompts gives better results when fine-tuning Whisper on multiple dialects.
+
+ ## Dialects and IDs
+ - 四縣 (Sixian): htia_sixian
+ - 海陸 (Hailu): htia_hailu
+ - 大埔 (Dapu): htia_dapu
+ - 饒平 (Raoping): htia_raoping
+ - 詔安 (Zhaoan): htia_zhaoan
+ - 南四縣 (Southern Sixian): htia_nansixian
+
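The dialect list above can be kept in code as a small lookup table, so that a mistyped dialect fails early instead of silently sending an unknown prompt to the model. This is only a sketch: the names `DIALECT_IDS` and `dialect_prompt` are hypothetical helpers, not part of the model or of `transformers`; the IDs themselves are copied from the list.

```python
# Dialect names and their prompt IDs, copied from the list above.
DIALECT_IDS = {
    "四縣": "htia_sixian",
    "海陸": "htia_hailu",
    "大埔": "htia_dapu",
    "饒平": "htia_raoping",
    "詔安": "htia_zhaoan",
    "南四縣": "htia_nansixian",
}


def dialect_prompt(dialect: str) -> str:
    """Return the prompt ID for a dialect name, or raise a readable error."""
    try:
        return DIALECT_IDS[dialect]
    except KeyError:
        known = ", ".join(DIALECT_IDS)
        raise ValueError(
            f"Unknown dialect {dialect!r}; expected one of: {known}"
        ) from None


if __name__ == "__main__":
    print(dialect_prompt("海陸"))
```

The returned string is what would be assigned to `dialect_id` before building the prompt.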
+ ### Training process
+ The model was trained with the following hyperparameters:
+
+ - Batch size: 32
+ - Epochs: 3
+ - Warmup Steps: 50
+ - Total Steps: 42549
+ - Learning rate: 7e-5
+ - Data augmentation: No
+
+
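As a rough sanity check, the hyperparameters above can be related to each other with a little arithmetic. The sketch below rests on assumptions the model card does not state: that "Total Steps" counts optimizer steps over all three epochs, that 32 is the effective batch size (no gradient accumulation or multi-device scaling on top of it), and that the last batch of an epoch may be partial. The derived numbers are estimates, not reported figures.

```python
# Values copied from the hyperparameter list above.
batch_size = 32
epochs = 3
warmup_steps = 50
total_steps = 42549

# Assumption: total_steps covers all epochs, so each epoch has the same step count.
steps_per_epoch, remainder = divmod(total_steps, epochs)

# Assumption: one step consumes one batch of up to `batch_size` examples,
# and only the final batch of an epoch may be smaller.
max_examples = steps_per_epoch * batch_size
min_examples = (steps_per_epoch - 1) * batch_size + 1

# Share of training spent in warmup.
warmup_fraction = warmup_steps / total_steps

if __name__ == "__main__":
    print(f"steps per epoch: {steps_per_epoch} (remainder {remainder})")
    print(f"implied examples per epoch: {min_examples} to {max_examples}")
    print(f"warmup share of training: {warmup_fraction:.2%}")
```

Under these assumptions the step count divides evenly into 14183 steps per epoch, and warmup covers only a small fraction of the run.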
+ ### How to use
+
+ ```python
+ import torch
+ from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
+ device = "cuda:0" if torch.cuda.is_available() else "cpu"
+ torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
+ model_id = "formospeech/whisper-large-v3-taiwanese-hakka"
+ dialect_id = "htia_sixian"  # any dialect ID from the list above
+ model = AutoModelForSpeechSeq2Seq.from_pretrained(
+     model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
+ )
+ model.to(device)
+ processor = AutoProcessor.from_pretrained(model_id)
+ pipe = pipeline(
+     "automatic-speech-recognition",
+     model=model,
+     tokenizer=processor.tokenizer,
+     feature_extractor=processor.feature_extractor,
+     max_new_tokens=128,
+     chunk_length_s=30,
+     batch_size=16,
+     torch_dtype=torch_dtype,
+     device=device,
+ )
+ generate_kwargs = {"language": "Chinese", "prompt_ids": torch.from_numpy(processor.get_prompt_ids(dialect_id)).to(device)}
+ transcription = pipe("path/to/my_audio.wav", generate_kwargs=generate_kwargs)
+ print(transcription["text"].replace(f" {dialect_id}", ""))  # the pipeline returns a dict; drop the dialect prompt from the text
+ ```
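The final clean-up step can be factored into a small function. The speech-recognition pipeline returns a dict whose `"text"` field holds the transcription, and the usage example removes the dialect ID from that text. The sketch below mirrors that replacement and additionally trims surrounding whitespace; `strip_dialect_prompt` is a hypothetical helper name, and the sample dict stands in for real pipeline output.

```python
def strip_dialect_prompt(result: dict, dialect_id: str) -> str:
    """Remove the dialect prompt from a pipeline result and trim whitespace.

    `result` is expected to look like the pipeline output: {"text": "..."}.
    """
    text = result["text"]
    # Same replacement as in the usage example: the ID preceded by a space.
    return text.replace(f" {dialect_id}", "").strip()


if __name__ == "__main__":
    # Placeholder output, not a real transcription.
    sample = {"text": " htia_sixian hello"}
    print(strip_dialect_prompt(sample, "htia_sixian"))
```

If the prompt does not appear in the text, the function returns the text unchanged apart from the trimming.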