Having trouble deploying locally

#4
by 5aharsh - opened

The realtime transcription works great in Spaces, but I am having trouble getting it to run locally on my Ubuntu 22.04 machine.

Here are my issues:

  1. In the microphone block, Gradio tells me that `time_limit=45` and `stream_every=2` do not exist. The only workaround I found is replacing that piece of code with `every=2`.

  2. The transcription accuracy is extremely low; it is nothing like the accuracy shown in the Spaces app.

  3. Here are the errors/warnings that I get while running it locally:

  • `/opt/conda/envs/transcriptor/lib/python3.9/site-packages/transformers/models/whisper/generation_whisper.py:496: FutureWarning:` The input name `inputs` is deprecated. Please make sure to use `input_features` instead.

  • Due to a bug fix in https://github.com/huggingface/transformers/pull/28687, transcription using a multilingual Whisper will default to language detection followed by transcription instead of translation to English. This might be a breaking change for your use case. If you want to instead always translate your audio to English, make sure to pass `language='en'`.

  • Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.43.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.

  • The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.

  • You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset.
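In case it is useful for comparing with the Space's behavior: the low accuracy in issue 2 is consistent with each short chunk being transcribed in isolation once the streaming arguments are replaced with `every=2`. Below is a minimal, hypothetical sketch (not the Space's actual code) of a streaming callback in the style of the Gradio audio-streaming guide, where audio is accumulated across chunks before being handed to the pipeline; all names here are my own.

```python
import numpy as np

def accumulate(stream, new_chunk):
    """Accumulate streamed microphone audio so the model sees full context.

    `stream` is the buffer built up so far (None on the first call);
    `new_chunk` is what Gradio delivers: a (sample_rate, ndarray) tuple.
    """
    sr, chunk = new_chunk
    chunk = chunk.astype(np.float32)
    if chunk.ndim > 1:            # downmix stereo to mono
        chunk = chunk.mean(axis=1)
    peak = np.abs(chunk).max()
    if peak > 0:                  # normalize to [-1, 1], as Whisper expects
        chunk = chunk / peak
    stream = chunk if stream is None else np.concatenate([stream, chunk])
    # In the real app the full buffer would then be transcribed, e.g.:
    # text = pipe({"sampling_rate": sr, "raw": stream})["text"]
    return stream
```

Transcribing the growing buffer rather than each 2-second slice is what keeps the accuracy close to what the Space shows.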

Here is a list of the packages installed in my Conda environment, in case that is the issue:

Owner

@5aharsh use `gradio` version `5.0.0b4`
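For anyone else hitting this, the reply above maps to the following install command (assuming a pip-based environment; the exact prerelease tag is the one given in the reply):

```shell
# Pin the Gradio 5 prerelease; time_limit / stream_every are not
# recognized by the Gradio 4.x that an unpinned install may resolve to.
pip install gradio==5.0.0b4
```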
