Getting Started
About
Code
Features
Benchmarking
Acknowledgments
Cite this Model

Getting Started

Details on the model, its performance, and more are available on arXiv.

Clone the model

The Reverb ASR model v1 is stored in this model repository.

Install inference requirements

See our inference code at https://github.com/revdotcom/reverb/tree/main/asr

About

Rev’s Reverb ASR was trained on 200,000 hours of English speech, all expertly transcribed by humans, which makes it the largest corpus of human-transcribed audio ever used to train an open-source model. The quality of this data has produced the world’s most accurate English automatic speech recognition (ASR) system, using an efficient model architecture that can be run on either CPU or GPU. Additionally, Reverb ASR provides user control over the level of verbatimicity of the output transcript, making it suitable both for clean, readable transcription and for use cases like audio editing that require transcription of every spoken word, including hesitations and re-wordings. Users can specify fully verbatim, fully non-verbatim, or anywhere in between for their transcription output.

Code

The folder wenet is a fork of the WeNet repository, with some modifications made for Rev-specific architecture.

The folder wer_evaluation contains instructions and code for running different benchmark utilities. These scripts are not specific to the Reverb architecture.

Features

Transcription Style Options

Reverb ASR was trained to produce transcriptions in either a verbatim style, in which every word is transcribed as spoken, or a non-verbatim style, in which disfluencies may be removed from the transcript.

Users can specify Reverb ASR's output style with the verbatimicity parameter. A value of 1 corresponds to a verbatim transcript and a value of 0 corresponds to a non-verbatim transcript. Values between 0 and 1 are accepted and may correspond to a semi-non-verbatim style. See our demo here to test the verbatimicity parameter with your own audio.

Decoding Options

Reverb ASR uses the joint CTC/attention architecture described here and here, and supports multiple modes of decoding. Users can pass one or more decoding modes to recognize_wav.py, and a separate output directory will be created for each decoding mode.

Decoding options are:

attention
ctc_greedy_search
ctc_prefix_beam_search
attention_rescoring
joint_decoding
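To give a feel for the simplest of these modes, here is a minimal, generic sketch of CTC greedy decoding: take the highest-scoring token in each frame, merge consecutive repeats, then drop blanks. It is an illustration of the general technique, not the code behind ctc_greedy_search in the wenet folder; the function name, the toy scores, and the choice of 0 as the blank id are all assumptions made for the example.

```python
from typing import List, Sequence


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]], blank_id: int = 0
) -> List[int]:
    """Pick the best token per frame, merge consecutive repeats, drop blanks."""
    # Step 1: best token id for every frame (the "best path").
    best_path = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]

    # Step 2: collapse repeats and remove blanks.
    decoded: List[int] = []
    previous = None
    for token in best_path:
        if token != previous and token != blank_id:
            decoded.append(token)
        previous = token
    return decoded


# Toy input: 6 frames over a 3-symbol vocabulary (id 0 is the assumed blank).
scores = [
    [0.1, 0.8, 0.1],    # token 1
    [0.2, 0.7, 0.1],    # token 1 again, merged with the previous frame
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.8, 0.1],    # token 1, kept because a blank separates it
    [0.1, 0.1, 0.8],    # token 2
    [0.6, 0.2, 0.2],    # blank
]
print(ctc_greedy_decode(scores))  # [1, 1, 2]
```

The other modes search over more than one hypothesis or combine CTC and attention scores, which generally costs more compute than this single pass.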

Usage

python wenet/bin/recognize_wav.py --config model.yaml \
    --checkpoint model.pt \
    --audio hello_world.wav \
    --modes ctc_prefix_beam_search attention_rescoring \
    --gpu 0 \
    --verbatimicity 1.0

Or check out our demo on HuggingFace.
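The recognize_wav.py invocation above can also be assembled from Python, for example to compare several verbatimicity levels on the same file. The sketch below only builds the argument list, using the flags and mode names shown in this document; the helper name build_recognize_command and its defaults are hypothetical, and the range check simply mirrors the statement that values between 0 and 1 are accepted.

```python
from typing import List, Sequence

# Decoding modes listed in this model card.
DECODING_MODES = {
    "attention",
    "ctc_greedy_search",
    "ctc_prefix_beam_search",
    "attention_rescoring",
    "joint_decoding",
}


def build_recognize_command(
    audio: str,
    modes: Sequence[str],
    verbatimicity: float = 1.0,
    config: str = "model.yaml",
    checkpoint: str = "model.pt",
    gpu: int = 0,
) -> List[str]:
    """Return the argument list for wenet/bin/recognize_wav.py (hypothetical helper)."""
    unknown = [mode for mode in modes if mode not in DECODING_MODES]
    if not modes or unknown:
        raise ValueError(f"expected one or more known decoding modes, got {list(modes)}")
    if not 0.0 <= verbatimicity <= 1.0:
        raise ValueError("verbatimicity must be between 0 and 1")
    return [
        "python", "wenet/bin/recognize_wav.py",
        "--config", config,
        "--checkpoint", checkpoint,
        "--audio", audio,
        "--modes", *modes,
        "--gpu", str(gpu),
        "--verbatimicity", str(verbatimicity),
    ]


# Print one command per verbatimicity level: non-verbatim, in between, verbatim.
for level in (0.0, 0.5, 1.0):
    command = build_recognize_command(
        "hello_world.wav", ["attention_rescoring"], verbatimicity=level
    )
    print(" ".join(command))
```

Each returned list can be handed to subprocess.run(command, check=True) from the directory that contains the wenet folder.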

Benchmarking

See the wer_evaluation folder of https://github.com/revdotcom/reverb/tree/main/asr for details and results.
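For readers unfamiliar with the metric, word error rate (WER) is the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The following is a minimal sketch of that definition using a standard edit-distance recurrence; it is not the code in wer_evaluation, the function name is hypothetical, and it assumes whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref_words)


# One of six reference words is missing from the hypothesis.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.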

Cite this Model

If you use this model, please cite it as follows:

@misc{bhandari2024reverbopensourceasrdiarization,
      title={Reverb: Open-Source ASR and Diarization from Rev}, 
      author={Nishchal Bhandari and Danny Chen and Miguel Ángel del Río Fernández and Natalie Delworth and Jennifer Drexler Fox and Migüel Jetté and Quinten McNamara and Corey Miller and Ondřej Novotný and Ján Profant and Nan Qin and Martin Ratajczak and Jean-Philippe Robichaud},
      year={2024},
      eprint={2410.03930},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.03930}, 
}

Acknowledgments

Special thanks to the WeNet team for their work and for making it available under an open-source license.

License

See LICENSE for details.
