ivrit-ai/whisper-large-v2-tuned · Can't get the example working

Oct 26, 2024

I'm trying to run the most basic example but can't get it working:

```python
import torch
from transformers import pipeline
from datasets import load_dataset

device = "cuda:0" if torch.cuda.is_available() else "cpu"

pipe = pipeline(
    "automatic-speech-recognition",
    model="ivrit-ai/whisper-large-v2-tuned",
    chunk_length_s=30,
    device=device,
)

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = ds[0]["audio"]

prediction = pipe(sample.copy(), batch_size=8)["text"]

# we can also return timestamps for the predictions
prediction = pipe(sample.copy(), batch_size=8, return_timestamps=True)["chunks"]
```


Error / warnings:

```
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.66it/s]
C:\Users\snire\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\transformers\models\whisper\generation_whisper.py:509: FutureWarning: The input name `inputs` is deprecated. Please make sure to use `input_features` instead.
  warnings.warn(
Due to a bug fix in https://github.com/huggingface/transformers/pull/28687 transcription using a multilingual Whisper will default to language detection followed by transcription instead of translation to English.This might be a breaking change for your use case. If you want to instead always translate your audio to English, make sure to pass `language='en'`.
Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.43.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
```
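The log above consists of warnings, and the post does not show which step actually fails. One thing worth trying is a call that sets the language and task explicitly through the pipeline's `generate_kwargs`, so Whisper does not fall back to the automatic language detection described in the log. The sketch below assumes a recent `transformers` release and that this is a Hebrew fine-tune (ivrit.ai), so an English LibriSpeech clip is presumably not a meaningful test input. `make_generate_kwargs` and `transcribe` are hypothetical helper names, not part of any library. The dict is copied before the call because, as far as I know, the ASR pipeline pops keys such as `"array"` and `"sampling_rate"` from a dict input, which is also why the original example uses `sample.copy()`.

```python
def make_generate_kwargs(language: str = "he", task: str = "transcribe") -> dict:
    """Hypothetical helper: generation options for a Whisper ASR pipeline call."""
    # Fixing the language up front skips Whisper's automatic language detection.
    return {"language": language, "task": task}


def transcribe(audio: dict, model_id: str = "ivrit-ai/whisper-large-v2-tuned") -> str:
    """Hypothetical helper: transcribe a `datasets`-style audio dict."""
    # Heavy imports are local so this module can be imported without them.
    import torch
    from transformers import pipeline

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    pipe = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,
        device=device,
    )
    # Pass a copy: the pipeline removes keys from a dict input while preprocessing.
    result = pipe(audio.copy(), batch_size=8, generate_kwargs=make_generate_kwargs())
    return result["text"]
```

Usage would be `transcribe(ds[0]["audio"])` with a Hebrew recording in place of the LibriSpeech sample. This does not address the `attention_mask` warning, which comes from inside the Whisper generation code.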

benderrodriguez

ivrit.ai org Oct 27, 2024

Will check.
However, this is an old model; I suggest using https://huggingface.co/ivrit-ai/faster-whisper-v2-d4 which is our top-of-the-line model.
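A minimal sketch of running that model with the `faster-whisper` package, assuming a CUDA GPU (on CPU, `device="cpu"` with `compute_type="int8"` is the usual choice) and that the repo id can be passed straight to `WhisperModel`. `format_srt_time` and `transcribe_file` are hypothetical helpers; the decoding settings (`beam_size=5`, `vad_filter=True`) are illustrative, not recommendations from ivrit.ai.

```python
def format_srt_time(seconds: float) -> str:
    """Hypothetical helper: format seconds as an SRT-style HH:MM:SS,mmm timestamp."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def transcribe_file(path: str, model_id: str = "ivrit-ai/faster-whisper-v2-d4") -> list:
    """Hypothetical helper: return one timestamped line per segment."""
    # Imported lazily so the formatter above works without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_id, device="cuda", compute_type="float16")
    segments, _info = model.transcribe(path, language="he", beam_size=5, vad_filter=True)
    # `segments` is a generator: decoding happens while it is iterated here.
    return [
        f"[{format_srt_time(seg.start)} -> {format_srt_time(seg.end)}] {seg.text}"
        for seg in segments
    ]
```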

Mbellish

Dec 10, 2024

What are the best practices for transcribing a couple of hours of audio with this model?
How should I use it for that? Should I split the audio?
Applying some faster-whisper 1.1.0 tweaks still doesn't help: it transcribes fast, but the quality is far from the desired result.
Is there a detailed guide?

Thanks
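On the splitting question: Whisper works on 30-second windows, and both the `transformers` pipeline (`chunk_length_s`) and faster-whisper handle long inputs internally, so manual splitting is mainly useful when you want to control it yourself (for example to cap memory or process pieces in parallel). A minimal pure-Python sketch of fixed-length windows with overlap, assuming mono audio at 16 kHz; `chunk_bounds` and its defaults are hypothetical, not taken from ivrit.ai's guide.

```python
def chunk_bounds(num_samples: int, sampling_rate: int = 16_000,
                 chunk_s: float = 30.0, overlap_s: float = 5.0) -> list:
    """Hypothetical helper: (start, end) sample indices of overlapping windows."""
    chunk = int(chunk_s * sampling_rate)
    step = chunk - int(overlap_s * sampling_rate)
    if step <= 0:
        raise ValueError("overlap must be shorter than the chunk length")
    bounds = []
    start = 0
    while start < num_samples:
        end = min(start + chunk, num_samples)
        bounds.append((start, end))
        if end == num_samples:  # the last window reaches the end of the audio
            break
        start += step
    return bounds
```

Each window would then be sliced out (for example `audio[start:end]` on a NumPy array) and transcribed separately. Merging or de-duplicating the text from the overlapping seconds at each boundary is left out of this sketch and is usually the harder part.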

yanirmr

ivrit.ai org Dec 10, 2024

Hi @Mbellish,
This is an old model; please see the newer model and its guide:
https://huggingface.co/ivrit-ai/faster-whisper-v2-d4