Large v3 gets stuck

#5
by Ne0nd0g - opened

I used whisper-tiny.en.llamafile on an audio file that is about 2 hours long, and it did a good job of transcribing the audio to text. When I use whisper-large-v3.llamafile on the same file, the program seems to get stuck: starting at 26 minutes in, it repeats the exact same text until the end of the recording. In my case, that text was repeated for an hour and a half. The tiny version doesn't do that. I also noticed the large model doesn't show [BLANK AUDIO], but the tiny version does. I tried using the -m flag with weights from ggerganov/whisper.cpp for large v3, and the results were no different.

Running on Windows 11:

system_info: n_threads = 12 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0

I noticed that the tiny model doesn't get stuck in the same places that the large model does, but it does still get stuck. There's a big chunk of [BLANK AUDIO] where people are actually talking.

I tried the medium model, and it turns out each model gets stuck at a different point in time. Maybe this is an issue with Whisper itself and not with this whisperfile.

Long audio is not well supported by Whisper. I suggest using ffmpeg to chunk the audio into segments of 5 minutes or less.
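A rough sketch of that chunking step, in case it helps. It assumes ffmpeg is on your PATH and uses ffmpeg's segment muxer. The function names, the `chunk_%03d.wav` naming pattern, and the choice to re-encode to 16 kHz mono 16-bit WAV (the format whisper.cpp expects) are my own assumptions, not something the whisperfile docs prescribe.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_split_cmd(src, out_dir, segment_seconds=300):
    """Build an ffmpeg command that splits `src` into fixed-length WAV chunks.

    The output naming pattern (chunk_000.wav, chunk_001.wav, ...) is a
    hypothetical convention chosen for this example.
    """
    pattern = str(Path(out_dir) / "chunk_%03d.wav")
    return [
        "ffmpeg",
        "-i", str(src),
        "-f", "segment",                         # segment muxer: one file per time slice
        "-segment_time", str(segment_seconds),   # target chunk length in seconds
        "-ar", "16000",                          # resample to 16 kHz
        "-ac", "1",                              # downmix to mono
        "-c:a", "pcm_s16le",                     # 16-bit PCM WAV
        pattern,
    ]


def chunk_offset_seconds(index, segment_seconds=300):
    """Nominal start time of chunk `index` in the original recording.

    Add this to each chunk's transcript timestamps to map them back
    onto the full file.
    """
    return index * segment_seconds


def split_audio(src, out_dir, segment_seconds=300):
    """Run ffmpeg and return the chunk paths in order."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_split_cmd(src, out_dir, segment_seconds), check=True)
    return sorted(Path(out_dir).glob("chunk_*.wav"))
```

You would then run the llamafile on each chunk in turn and shift the timestamps by `chunk_offset_seconds(i)`. One caveat: fixed-length cuts can split a word at a chunk boundary, so a few words near each cut may be transcribed badly.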
