Inference on fine-tuned whisper-large-v3 is not working, but works on the pre-trained models and fine-tuned whisper-medium
Hello,
I'm using this function for inference:
import pandas as pd
import torch
from datasets import Dataset
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

def eval(model_name, input_file):
    test_dataset = Dataset.from_pandas(pd.read_excel(input_file).head(3))
    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
    model = AutoModelForSpeechSeq2Seq.from_pretrained(
        model_name, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
    )
    model.to(device)
    processor = AutoProcessor.from_pretrained(model_name)
    pipe = pipeline(
        "automatic-speech-recognition",
        model=model,
        tokenizer=processor.tokenizer,
        feature_extractor=processor.feature_extractor,
        torch_dtype=torch_dtype,
        device=device,
    )
    for row in test_dataset:
        audio_path = row['Path']
        result = pipe(audio_path, generate_kwargs={"language": "english"})
        print(result["text"])
When I use:
- model_name ="openai/whisper-medium"
I get this output: "Okay, so what was your motivation to join the xxx study? Like, apart from what they told you that you should join?"
- model_name =".../fine_tune/wmau_none_P16/2_3200" # fine-tuned medium
I get this output: "okay so what was your motivation to join the athletes study like it apart from what they told you that you should join"
- model_name ="openai/whisper-large-v3"
I get this output: "Okay. So what was your motivation to join the xxx study? Like apart from what they told you that you should join?"
- model_name = ".../fine_tune/wl3au_none_P16/2_128000" # fine-tuned largev3
I get this output: "so what was your motivation to join the the the the the the"
This is the same trend I observe across my entire test set: the output generated by the fine-tuned Whisper large-v3 is too short. Even though each audio file is under 30 seconds, the generated text is only a few words. The same code works fine for the fine-tuned medium model as well as for both pre-trained models. Any thoughts on why this is happening?
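For context, since the failure mode looks like token repetition followed by early stopping, one thing I have been considering is constraining decoding through generate_kwargs. This is only a diagnostic sketch, not a fix: repetition_penalty and no_repeat_ngram_size are standard transformers generation parameters, but the specific values below are guesses.

    # Drop-in replacement for the pipe(...) call in the loop above.
    # The extra generate_kwargs are an experiment to see whether the
    # "the the the ..." repetition is a decoding issue or a model issue.
    result = pipe(
        audio_path,
        generate_kwargs={
            "language": "english",
            "repetition_penalty": 1.2,   # value chosen arbitrarily
            "no_repeat_ngram_size": 3,   # forbid repeating any 3-gram
        },
    )

If the output is still truncated with these constraints, that would suggest the problem lies with the fine-tuned checkpoint itself (e.g. its saved generation settings) rather than with decoding.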