Only 30 seconds of my audio get transcribed
#112
by
Ganz00
- opened
I'm using this:

# Function to transcribe the audio
# (assumes `model` and `processor` are already loaded, and that
# `torch` and `torchaudio` are imported)
def transcribe_whisper(audio_path):
    # Load the audio
    speech_array, sampling_rate = torchaudio.load(audio_path)
    # Preprocess the audio inputs
    inputs = processor(speech_array.squeeze(), sampling_rate=sampling_rate, return_tensors="pt")
    # Generate the transcription
    with torch.no_grad():
        predicted_ids = model.generate(**inputs, max_length=4096)
    # Decode the ids into text
    transcription = processor.batch_decode(predicted_ids)[0]
    return transcription
The audio is 90 seconds long, but only 30 seconds (sometimes only 10) get transcribed.
Hi, use the pipeline object as described on the model card. It will automatically split your long audio into 30-second chunks and merge the results afterwards.
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "openai/whisper-large-v3"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)
Yes indeed, I did it and it worked.
Thanks for the reply!
Ganz00 changed discussion status to closed