
wav2vec2-xlsr-53-ft-cy-en-withlm

An acoustic encoder model for Welsh and English speech recognition, accompanied by an n-gram language model. The acoustic model is fine-tuned from facebook/wav2vec2-large-xlsr-53 using transcribed spontaneous speech from techiaith/banc-trawsgrifiadau-bangor (v24.01) and Welsh and English speech data derived from version 16.1 of the Common Voice dataset (techiaith/commonvoice_16_1_en_cy).

The accompanying language model is a single KenLM n-gram model trained on a balanced collection of Welsh and English texts from OSCAR, which avoids the need for separate language-specific models and language detection during CTC decoding.
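
For reference, a processor of this kind can be assembled by attaching a KenLM model to the CTC vocabulary with pyctcdecode. The sketch below is illustrative only and is not necessarily how this repository was built; the "lm/cy_en_oscar.arpa" path is a placeholder.

from pyctcdecode import build_ctcdecoder
from transformers import Wav2Vec2Processor, Wav2Vec2ProcessorWithLM

# Start from the plain processor (feature extractor + CTC tokenizer).
processor = Wav2Vec2Processor.from_pretrained("techiaith/wav2vec2-xlsr-53-ft-cy-en-withlm")

# Sort the vocabulary by token id so the labels line up with the CTC output dimension.
vocab = processor.tokenizer.get_vocab()
labels = [token for token, _ in sorted(vocab.items(), key=lambda item: item[1])]

# Attach a single bilingual KenLM model; the .arpa path below is a placeholder.
decoder = build_ctcdecoder(labels, kenlm_model_path="lm/cy_en_oscar.arpa")

processor_with_lm = Wav2Vec2ProcessorWithLM(
    feature_extractor=processor.feature_extractor,
    tokenizer=processor.tokenizer,
    decoder=decoder,
)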

Usage

The wav2vec2-xlsr-53-ft-cy-en-withlm model can be used directly as follows:

import torch
import librosa

from transformers import Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM

processor = Wav2Vec2ProcessorWithLM.from_pretrained("techiaith/wav2vec2-xlsr-53-ft-cy-en-withlm")
model = Wav2Vec2ForCTC.from_pretrained("techiaith/wav2vec2-xlsr-53-ft-cy-en-withlm")

# Load the audio and resample it to the 16 kHz rate the model expects.
audio, rate = librosa.load(<path/to/audio_file>, sr=16000)

inputs = processor(audio, sampling_rate=16_000, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits

# Beam-search CTC decoding using the bundled KenLM language model.
print("Prediction: ", processor.batch_decode(logits.numpy(), beam_width=10).text[0].strip())
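
To see what the language model contributes, the beam-search result above can be compared with plain greedy CTC decoding, which ignores the n-gram model entirely. A minimal sketch reusing the logits computed above:

# Greedy (argmax) CTC decoding that bypasses the KenLM beam search, for comparison.
predicted_ids = torch.argmax(logits, dim=-1)
greedy_text = processor.tokenizer.batch_decode(predicted_ids)[0]
print("Greedy prediction: ", greedy_text.strip())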

Usage with a pipeline is even simpler:

from transformers import pipeline

transcriber = pipeline("automatic-speech-recognition", model="techiaith/wav2vec2-xlsr-53-ft-cy-en-withlm")

def transcribe(audio):
    return transcriber(audio)["text"]

transcribe(<path/or/url/to/any/audiofile>)
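
For longer recordings, the same pipeline can transcribe in chunks; chunk_length_s and stride_length_s are standard pipeline arguments, and the values below are only illustrative choices.

from transformers import pipeline

transcriber = pipeline(
    "automatic-speech-recognition",
    model="techiaith/wav2vec2-xlsr-53-ft-cy-en-withlm",
    chunk_length_s=30,   # illustrative chunk size in seconds
    stride_length_s=5,   # illustrative overlap between chunks
)

print(transcriber(<path/or/url/to/any/audiofile>)["text"])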