Having trouble deploying locally
The realtime transcription works great in the Space, but I am having trouble getting it to run locally on my Ubuntu 22.04 machine.
Here are my issues:
In the microphone block, it tells me that time_limit=45, stream_every=2 don't exist. The only workaround that I found is replacing this piece of code with every=2
The transcription accuracy is extremely low; it is nothing like the accuracy of the Spaces app.
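For the first issue, a quick way to narrow things down is to check which package versions are installed locally and whether a given callable accepts the keyword arguments in question. The sketch below is only an illustration: old_stream and new_stream are hypothetical stand-ins for an event listener before and after those arguments were added, not Gradio's actual signatures, and the guess that the local Gradio is older than the one the Space runs is an assumption to verify against the Space's own configuration.

```python
import inspect
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def missing_kwargs(func, names):
    """Return the keyword names from `names` that `func` does not accept."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter accepts any keyword, so nothing is missing.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return [name for name in names if name not in params]


# Hypothetical stand-ins, NOT the real Gradio signatures.
def old_stream(fn, inputs=None, outputs=None, every=None):
    pass


def new_stream(fn, inputs=None, outputs=None, stream_every=0.5, time_limit=None):
    pass


wanted = ["time_limit", "stream_every"]
print("gradio:", installed_version("gradio"))
print("transformers:", installed_version("transformers"))
print("old listener is missing:", missing_kwargs(old_stream, wanted))
print("new listener is missing:", missing_kwargs(new_stream, wanted))
```

If the printed versions differ from what the Space pins, matching them locally is probably the first thing to try before changing the code.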
Here are the errors/warnings that I get while running it locally:
/opt/conda/envs/transcriptor/lib/python3.9/site-packages/transformers/models/whisper/generation_whisper.py:496: FutureWarning: The input name inputs is deprecated. Please make sure to use input_features instead.
Due to a bug fix in https://github.com/huggingface/transformers/pull/28687 transcription using a multilingual Whisper will default to language detection followed by transcription instead of translation to English. This might be a breaking change for your use case. If you want to instead always translate your audio to English, make sure to pass language='en'.
Passing a tuple of past_key_values is deprecated and will be removed in Transformers v4.43.0. You should pass an instance of EncoderDecoderCache instead, e.g. past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values).
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset.
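The second warning above concerns language detection, so one thing worth trying is to pin the language and task explicitly instead of leaving them to auto-detection. This is a minimal sketch, assuming the app uses a Transformers automatic-speech-recognition pipeline. The transcribe helper, the model name in the comment, and the sample.wav file are all hypothetical and may not match what the Space actually does, and there is no guarantee that this alone explains the accuracy gap.

```python
def transcribe(asr, audio, language="en", task="transcribe"):
    """Run an ASR pipeline with the language and task set explicitly.

    `asr` is any callable that behaves like a Transformers
    automatic-speech-recognition pipeline: it takes the audio plus a
    `generate_kwargs` dict and returns a dict with a "text" key.
    """
    result = asr(audio, generate_kwargs={"language": language, "task": task})
    return result["text"]


# Assumed setup, left commented out because it downloads a model.
# The model name and audio file are placeholders for illustration:
#
# from transformers import pipeline
# asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
# print(transcribe(asr, "sample.wav"))
```

Passing the pipeline in as an argument keeps the helper independent of how the model is loaded, so the same call can be tried against whichever checkpoint the Space uses.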
Here is a list of the packages installed in my Conda environment, in case that is the issue: