
This is facebook/wav2vec2-large-960h-lv60-self enhanced with a Wikipedia language model.

The dataset used is wikipedia/20200501.en. All articles were used. The text was cleaned of references, external links, and all text inside parentheses. It has 8092546 words.
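The cleaning script itself is not published here, so the following is only a minimal sketch of what such a step could look like. The function name `clean_article` and all three regular expressions are assumptions made for illustration; in particular, "references" is interpreted as bracketed citation markers such as [12], which may differ from what was actually removed.

```python
import re

# Hypothetical patterns; these are assumptions, not the author's actual rules.
_URL = re.compile(r"https?://\S+")      # external links
_REF = re.compile(r"\[\d+\]")           # bracketed citation markers, e.g. [12]
_PAREN = re.compile(r"\s*\([^()]*\)")   # one innermost parenthesized group


def clean_article(text: str) -> str:
    """Strip links, citation markers, and parenthesized text from a string."""
    text = _URL.sub("", text)
    text = _REF.sub("", text)
    # Remove innermost groups repeatedly so nested parentheses vanish too.
    previous = None
    while previous != text:
        previous = text
        text = _PAREN.sub("", text)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


sample = "Paris (French: Paris (city)) is the capital[1] of France."
print(clean_article(sample))
```

Removing the innermost group in a loop is a simple way to handle nesting without writing a parser; unbalanced parentheses are left untouched.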

The language model was built using KenLM. It is a 5-gram model in which all singleton n-grams of order 3 and higher were pruned. It was built with:

kenlm/build/bin/lmplz -o 5 -S 120G --vocab_estimate 8092546 --text text.txt --arpa text.arpa --prune 0 0 1

Suggested usage:

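from transformers import pipeline

# Load the model together with its Wikipedia language model.
pipe = pipeline("automatic-speech-recognition", model="gxbag/wav2vec2-large-960h-lv60-self-with-wikipedia-lm")
# Transcribe in 30-second chunks with a 6 s left stride and a 3 s right stride.
output = pipe("/path/to/audio.wav", chunk_length_s=30, stride_length_s=(6, 3))
print(output["text"])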

Note that in the version of transformers that was current when this model was released, using striding in the pipeline chops off the final portion of the audio, in this case the last 3 seconds. As a workaround, append 3 seconds of silence to the end of the audio. This problem has been fixed in the GitHub version of transformers.
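The silence workaround can be sketched as follows. This is a minimal illustration that assumes the audio is already loaded as a mono NumPy waveform; the helper name `pad_with_silence` is hypothetical, and a synthetic tone stands in for a real recording.

```python
import numpy as np


def pad_with_silence(audio: np.ndarray, sampling_rate: int, seconds: float = 3.0) -> np.ndarray:
    """Return a copy of a mono waveform with `seconds` of zeros appended."""
    n_pad = int(round(seconds * sampling_rate))
    silence = np.zeros(n_pad, dtype=audio.dtype)
    return np.concatenate([audio, silence])


# A synthetic one-second 440 Hz tone at 16 kHz stands in for real audio.
sampling_rate = 16_000
t = np.arange(sampling_rate) / sampling_rate
audio = np.sin(2 * np.pi * 440 * t).astype(np.float32)

padded = pad_with_silence(audio, sampling_rate, seconds=3.0)
print(len(audio), len(padded))  # 16000 64000
```

The padded array can then be passed to `pipe` in place of the file path, provided the waveform is sampled at the rate the model expects (16 kHz for wav2vec2). The padding length should match the right stride, which is 3 seconds in the suggested usage.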
