
Issues when processing the audio chunks

#2
by TamarBukris - opened

I keep getting an error when using the processor to get the inputs:
input_features = padded_inputs.get("input_features").transpose(2, 0, 1)
ValueError: axes don't match array

This is the code I'm using:

import torchaudio
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

model_id = "ivrit-ai/whisper-large-v2-tuned"
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

audio_file_path = "audio.wav"

speech_array, sampling_rate = torchaudio.load(audio_file_path)

if sampling_rate != processor.feature_extractor.sampling_rate:
    resampler = torchaudio.transforms.Resample(
        orig_freq=sampling_rate,
        new_freq=processor.feature_extractor.sampling_rate,
    )
    speech_array = resampler(speech_array)

try:
    inputs = processor(speech_array, sampling_rate=processor.feature_extractor.sampling_rate, return_tensors="pt", padding=True)
    print("Input processed successfully. Structure:")
except Exception as e:
    print("Error during processing:", e)

Has anyone encountered this?

I encountered this error too! Have you gotten a solution?

ivrit.ai org

The best way is to use faster-whisper.
I will update the documentation.
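
Something along these lines (a rough sketch, not a snippet from the docs; faster-whisper loads CTranslate2-converted checkpoints, so the model path and the language="he" hint below are assumptions you may need to adjust):

from faster_whisper import WhisperModel

# Assumed model path: faster-whisper needs a CTranslate2-converted checkpoint
# (a local directory or a Hub repo in that format), not the transformers weights.
model = WhisperModel("ivrit-ai/whisper-large-v2-tuned", device="cpu", compute_type="int8")

# transcribe() reads the file, resamples it and chunks it internally,
# so no manual torchaudio preprocessing is needed.
segments, info = model.transcribe("audio.wav", language="he", beam_size=5)

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")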

benderrodriguez changed discussion status to closed
