Issues when processing the audio chunks
I keep on getting an error when using the processor to get the inputs:
input_features = padded_inputs.get("input_features").transpose(2, 0, 1)
ValueError: axes don't match array
This is the code I'm using:
import torchaudio
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

model_id = "ivrit-ai/whisper-large-v2-tuned"
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

audio_file_path = "audio.wav"
speech_array, sampling_rate = torchaudio.load(audio_file_path)

if sampling_rate != processor.feature_extractor.sampling_rate:
    resampler = torchaudio.transforms.Resample(orig_freq=sampling_rate,
                                               new_freq=processor.feature_extractor.sampling_rate)
    speech_array = resampler(speech_array)

try:
    inputs = processor(speech_array, sampling_rate=processor.feature_extractor.sampling_rate, return_tensors="pt", padding=True)
    print("Input processed successfully. Structure:")
except Exception as e:
    print("Error during processing:", e)
Has anyone encountered this?
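A likely cause, offered as an assumption rather than a confirmed diagnosis: torchaudio.load returns a 2-D tensor shaped (channels, samples), while the Whisper feature extractor expects each clip as a 1-D array. The extra channel axis leaks into the extracted features, so the transpose(2, 0, 1) call sees the wrong number of dimensions. A minimal sketch of a workaround is to downmix to a 1-D float32 NumPy array before calling the processor. The helper name to_mono_1d is hypothetical, not part of transformers or torchaudio.

```python
import numpy as np


def to_mono_1d(waveform) -> np.ndarray:
    """Collapse a (channels, samples) waveform into a 1-D float32 array.

    Hypothetical helper: torchaudio.load returns (channels, samples), but the
    Whisper feature extractor expects one 1-D array per audio clip.
    """
    waveform = np.asarray(waveform, dtype=np.float32)
    if waveform.ndim == 1:
        # Already mono and 1-D: nothing to do.
        return waveform
    if waveform.ndim == 2:
        # Average across the channel axis (axis 0) to downmix to mono.
        return waveform.mean(axis=0)
    raise ValueError(f"expected 1-D or 2-D audio, got shape {waveform.shape}")
```

With the variables from the question, the call would become processor(to_mono_1d(speech_array.numpy()), sampling_rate=processor.feature_extractor.sampling_rate, return_tensors="pt"). Dropping padding=True is deliberate: as far as I know the Whisper feature extractor defaults to padding="max_length", which pads or truncates to the 30-second window the model expects. Averaging the channels is one choice; taking a single channel (speech_array[0]) would also give a 1-D input.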
I encountered this error too! Have you gotten a solution?
The best way is to use faster-whisper.
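For the faster-whisper route, a minimal sketch follows. faster-whisper runs Whisper through CTranslate2, so a Transformers checkpoint such as ivrit-ai/whisper-large-v2-tuned would first need converting (CTranslate2 ships a ct2-transformers-converter tool for this). In the sketch, model_path is a placeholder for that converted directory, and the function name transcribe_file and the Hebrew language code "he" are assumptions for illustration. faster-whisper decodes and resamples the audio file itself and works through long recordings window by window, which sidesteps both the tensor-shape issue and manual chunking.

```python
def transcribe_file(audio_path: str, model_path: str, language: str = "he") -> str:
    """Hypothetical helper: transcribe one audio file with faster-whisper."""
    # Imported lazily so this module loads even without faster-whisper installed.
    from faster_whisper import WhisperModel

    # model_path is a placeholder for a CTranslate2-converted Whisper model.
    model = WhisperModel(model_path, device="auto", compute_type="default")
    # transcribe() returns a lazy generator of segments plus info about the audio.
    segments, _info = model.transcribe(audio_path, language=language, beam_size=5)
    # Decoding actually happens here, as the generator is consumed.
    return " ".join(segment.text.strip() for segment in segments)
```

For example: print(transcribe_file("audio.wav", "path/to/converted-model")).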
I will update the documentation.