---
license: cc-by-4.0
pipeline_tag: automatic-speech-recognition
---
# Model Card for whisper-large-v3-formosan-iso-prompt

This model is an early version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) fine-tuned on Taiwanese indigenous (Formosan) languages. During training, the ID of each dialect was used as the prompt.
Note: we use Indonesian (`id`) as the Whisper language ID.
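
In Whisper, a text prompt goes after a `<|startofprev|>` token. The start-of-transcript, language and task tokens follow it. The sketch below shows where the dialect ID and the Indonesian language token would sit in that layout, at the string level only. It assumes fine-tuning used the standard Whisper prompt format, which this card does not spell out. The function name `decoder_prompt_layout` is hypothetical. The real model consumes token IDs, produced for example by `processor.get_prompt_ids`. That method adds a leading space to the prompt text.

```python
def decoder_prompt_layout(dialect_id: str, language: str = "id") -> list[str]:
    """Return the assumed decoder-prompt pieces as readable strings.

    Illustrative only: the model actually consumes token IDs, and the exact
    layout used during fine-tuning is an assumption, not confirmed here.
    """
    return [
        "<|startofprev|>",         # marks the start of the text prompt
        " " + dialect_id.strip(),  # dialect ID with a leading space
        "<|startoftranscript|>",
        f"<|{language}|>",         # Indonesian token stands in for the Formosan language
        "<|transcribe|>",
        "<|notimestamps|>",        # present when timestamps are not requested
    ]


print(decoder_prompt_layout("ami"))
# ['<|startofprev|>', ' ami', '<|startoftranscript|>', '<|id|>', '<|transcribe|>', '<|notimestamps|>']
```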

## Dialects and IDs
- Amis (阿美語): ami
- Seediq (賽德克語): sdq
- Truku (太魯閣語): trv
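
Only the three IDs listed above were used as prompts during training. The sketch below resolves a listed dialect name, or one of the IDs, to the prompt ID before you call the pipeline, and raises an error for anything else. The `DIALECT_IDS` table and the `dialect_id_for` helper are hypothetical and are not part of the model repository. The English names are common romanizations.

```python
# Hypothetical lookup table: dialect names from this card -> prompt IDs.
DIALECT_IDS = {
    "阿美語": "ami", "Amis": "ami",
    "賽德克語": "sdq", "Seediq": "sdq",
    "太魯閣語": "trv", "Truku": "trv",
}


def dialect_id_for(name_or_id: str) -> str:
    """Accept a listed dialect name or one of its IDs and return the ID."""
    if name_or_id in DIALECT_IDS:
        return DIALECT_IDS[name_or_id]
    if name_or_id in DIALECT_IDS.values():
        return name_or_id
    raise ValueError(
        f"unsupported dialect {name_or_id!r}; expected one of "
        f"{sorted(set(DIALECT_IDS.values()))}"
    )


print(dialect_id_for("阿美語"))  # ami
print(dialect_id_for("trv"))     # trv
```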

## Training process
The model was trained with the following hyperparameters:

- Batch size: 32
- Epochs: 4
- Warmup steps: 1170
- Total steps: 11700
- Learning rate: 7e-5
- Data augmentation: No
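
These values fit together. 11,700 total steps over 4 epochs gives 2,925 optimizer steps per epoch, and 1,170 warmup steps is 10% of training. The sketch below reproduces that arithmetic. It also computes the warmup learning rate, assuming a linear ramp to the peak rate; the card does not state the schedule shape. The samples-per-epoch estimate assumes 32 is the effective batch size per optimizer step, which the card does not confirm. `warmup_lr` is a hypothetical helper.

```python
# Hyperparameters as listed in this model card.
BATCH_SIZE = 32
EPOCHS = 4
WARMUP_STEPS = 1170
TOTAL_STEPS = 11700
PEAK_LR = 7e-5

steps_per_epoch = TOTAL_STEPS // EPOCHS    # 2925
warmup_ratio = WARMUP_STEPS / TOTAL_STEPS  # 0.1
# Assumption: 32 is the effective batch size per optimizer step.
approx_samples_per_epoch = steps_per_epoch * BATCH_SIZE  # 93600


def warmup_lr(step: int) -> float:
    """Learning rate during warmup, assuming a linear ramp from 0 to PEAK_LR.

    The schedule after warmup is not given in the card, so only steps
    0..WARMUP_STEPS are covered.
    """
    if not 0 <= step <= WARMUP_STEPS:
        raise ValueError("step is outside the warmup phase")
    return PEAK_LR * (step / WARMUP_STEPS)


print(steps_per_epoch, warmup_ratio, approx_samples_per_epoch)  # 2925 0.1 93600
print(warmup_lr(585))  # halfway through warmup: 3.5e-05
```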


## How to use

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

# Use the GPU in half precision when available; otherwise the CPU in float32.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "formospeech/whisper-large-v3-formosan-iso-prompt"
dialect_id = "ami"  # one of: ami, sdq, trv

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)
processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=16,
    torch_dtype=torch_dtype,
    device=device,
)

# Decode with the Indonesian language token and the dialect ID as the prompt.
generate_kwargs = {
    "language": "id",
    "prompt_ids": torch.from_numpy(processor.get_prompt_ids(dialect_id)).to(device),
}
transcription = pipe("path/to/my_audio.wav", generate_kwargs=generate_kwargs)

# The pipeline returns a dict; remove the dialect prompt from the decoded text.
print(transcription["text"].replace(f" {dialect_id}", ""))
```