Only 30 seconds of my audio get transcribed

#112
by Ganz00 - opened

I'm using this:

import torch
import torchaudio

# Function to transcribe the audio
def transcribe_whisper(audio_path):
    # Load the audio
    speech_array, sampling_rate = torchaudio.load(audio_path)

    # Whisper expects 16 kHz input; resample if needed
    if sampling_rate != 16000:
        speech_array = torchaudio.functional.resample(speech_array, sampling_rate, 16000)
        sampling_rate = 16000

    # Preprocess the audio inputs
    inputs = processor(speech_array.squeeze(), sampling_rate=sampling_rate, return_tensors="pt")

    # Generate the transcription
    with torch.no_grad():
        predicted_ids = model.generate(**inputs, max_length=4096)

    # Decode the ids into text
    transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]

    return transcription

The audio is 90 seconds long, but only 30 seconds (sometimes only 10) get transcribed.

Hi, use the pipeline object as described on the model card. It will automatically split your long audio into 30-second chunks and merge the transcriptions afterwards.
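
For context, the 30-second limit comes from the Whisper feature extractor: it always pads or truncates the waveform to a fixed 30 s window before computing the mel spectrogram, so anything beyond that is silently dropped. A minimal check (assuming the same speech_array and processor as in your snippet, with 16 kHz audio):

# The feature tensor has a fixed length regardless of audio duration:
# 30 s x 100 frames/s = 3000 mel frames (128 mel bins for large-v3)
inputs = processor(speech_array.squeeze(), sampling_rate=16000, return_tensors="pt")
print(inputs.input_features.shape)  # torch.Size([1, 128, 3000])

The chunked pipeline setup from the model card: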

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "openai/whisper-large-v3"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)
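
Running the pipeline on the file is then a single call (audio_path here stands for the same file path as in your snippet; since return_timestamps=True, the result dict also contains per-chunk timestamps):

result = pipe(audio_path)
print(result["text"])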

Yes indeed, I did it and it worked.

Thanks for the reply!

Ganz00 changed discussion status to closed
