Only 30 seconds of my audio get transcribed

#112
by Ganz00 - opened

I'm using this:

import torch
import torchaudio

# Function to transcribe the audio
def transcribe_whisper(audio_path):
    # Load the audio
    speech_array, sampling_rate = torchaudio.load(audio_path)

    # Whisper expects 16 kHz input; resample if needed
    if sampling_rate != 16000:
        speech_array = torchaudio.functional.resample(speech_array, sampling_rate, 16000)
        sampling_rate = 16000

    # Preprocess the audio inputs
    inputs = processor(speech_array.squeeze(), sampling_rate=sampling_rate, return_tensors="pt")

    # Generate the transcription
    with torch.no_grad():
        predicted_ids = model.generate(**inputs, max_length=4096)

    # Decode the ids into text
    transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]

    return transcription

The audio is 90 seconds long, but only 30 seconds (sometimes only 10) get transcribed.

Hi, use the pipeline object as described on the model card. It will automatically split your long audio into 30-second chunks and merge the transcriptions afterwards.
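
For context, the 30-second limit comes from the Whisper feature extractor: it always pads or truncates the waveform to a fixed 30 s window before computing the mel spectrogram, so anything beyond that is silently dropped. A minimal check (assuming the same speech_array and processor as in your snippet, with 16 kHz audio):

# The feature tensor has a fixed length regardless of audio duration:
# 30 s x 100 frames/s = 3000 mel frames (128 mel bins for large-v3)
inputs = processor(speech_array.squeeze(), sampling_rate=16000, return_tensors="pt")
print(inputs.input_features.shape)  # torch.Size([1, 128, 3000])

The chunked pipeline setup from the model card: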

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "openai/whisper-large-v3"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)
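
Running the pipeline on the file is then a single call (audio_path here stands for the same file path as in your snippet; since return_timestamps=True, the result dict also contains per-chunk timestamps):

result = pipe(audio_path)
print(result["text"])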

Yes indeed, I did it and it worked.

Thanks for the reply!

Ganz00 changed discussion status to closed
