
Issues when processing the audio chunks

#2
by TamarBukris - opened

I keep getting an error when using the processor to get the inputs:
input_features = padded_inputs.get("input_features").transpose(2, 0, 1)
ValueError: axes don't match array

This is the code I'm using:

import torchaudio
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

model_id = "ivrit-ai/whisper-large-v2-tuned"
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

audio_file_path = "audio.wav"

speech_array, sampling_rate = torchaudio.load(audio_file_path)

if sampling_rate != processor.feature_extractor.sampling_rate:
    resampler = torchaudio.transforms.Resample(
        orig_freq=sampling_rate,
        new_freq=processor.feature_extractor.sampling_rate,
    )
    speech_array = resampler(speech_array)

try:
    inputs = processor(speech_array, sampling_rate=processor.feature_extractor.sampling_rate, return_tensors="pt", padding=True)
    print("Input processed successfully. Structure:")
except Exception as e:
    print("Error during processing:", e)

Has anyone encountered this?

I encountered this error too! Have you gotten a solution?

ivrit.ai org

The best way is to use faster-whisper.
I will update the documentation.
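
Something along these lines (a rough sketch, not a snippet from the docs; faster-whisper loads CTranslate2-converted checkpoints, so the model path and the language="he" hint below are assumptions you may need to adjust):

from faster_whisper import WhisperModel

# Assumed model path: faster-whisper needs a CTranslate2-converted checkpoint
# (a local directory or a Hub repo in that format), not the transformers weights.
model = WhisperModel("ivrit-ai/whisper-large-v2-tuned", device="cpu", compute_type="int8")

# transcribe() reads the file, resamples it and chunks it internally,
# so no manual torchaudio preprocessing is needed.
segments, info = model.transcribe("audio.wav", language="he", beam_size=5)

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")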

benderrodriguez changed discussion status to closed
