---
tags:
- pyannote
- pyannote-audio
- pyannote-audio-model
- audio
- voice
- speech
- voice-activity-detection
- speech-to-noise ratio
- snr
- room acoustics
- c50
datasets:
- LibriSpeech
- AudioSet
- EchoThief
- MIT-Acoustical-Reverberation-Scene
license: openrail
extra_gated_prompt: "The collected information will help acquire a better knowledge of this model's user base and help its maintainers apply for grants to improve it further."
extra_gated_fields:
  Company/university: text
  Website: text
  I plan to use this model for (task, type of audio data, etc): text
---

# 🎙️🥁🚨🔊 Brouhaha

![Sample Brouhaha predictions](brouhaha.gif)

**Joint voice activity detection, speech-to-noise ratio, and C50 room acoustics estimation**

[TL;DR](https://twitter.com/LavechinMarvin/status/1585645131251605504) | [Paper](https://arxiv.org/abs/2210.13248) | [Code](https://github.com/marianne-m/brouhaha-vad) | [And Now for Something Completely Different](https://www.youtube.com/watch?v=8ZyOAS22Moo)

## Installation

This model relies on [pyannote.audio](https://github.com/pyannote/pyannote-audio) and [brouhaha-vad](https://github.com/marianne-m/brouhaha-vad).

```bash
pip install pyannote-audio
pip install https://github.com/marianne-m/brouhaha-vad/archive/main.zip
```

## Usage

```python
# 1. visit hf.co/pyannote/brouhaha and accept user conditions
# 2. visit hf.co/settings/tokens to create an access token
# 3. instantiate pretrained model
from pyannote.audio import Model
model = Model.from_pretrained("pyannote/brouhaha",
                              use_auth_token="ACCESS_TOKEN_GOES_HERE")

# apply model
from pyannote.audio import Inference
inference = Inference(model)
output = inference("audio.wav")

# iterate over each frame: voice activity detection (probability),
# speech-to-noise ratio (dB), and C50 room acoustics (dB)
for frame, (vad, snr, c50) in output:
    t = frame.middle
    print(f"{t:8.3f} vad={100*vad:.0f}% snr={snr:.0f} c50={c50:.0f}")

# ...
# 12.952 vad=100% snr=51 c50=17
# 12.968 vad=100% snr=52 c50=17
# 12.985 vad=100% snr=53 c50=17
# ...
```

## Citation

```bibtex
@article{lavechin2022brouhaha,
  Title   = {{Brouhaha: multi-task training for voice activity detection, speech-to-noise ratio, and C50 room acoustics estimation}},
  Author  = {Marvin Lavechin and Marianne Métais and Hadrien Titeux and Alodie Boissonnet and Jade Copet and Morgane Rivière and Elika Bergelson and Alejandrina Cristia and Emmanuel Dupoux and Hervé Bredin},
  Year    = {2022},
  Journal = {arXiv preprint arXiv:2210.13248}
}
```

```bibtex
@inproceedings{Bredin2020,
  Title     = {{pyannote.audio: neural building blocks for speaker diarization}},
  Author    = {{Bredin}, Herv{\'e} and {Yin}, Ruiqing and {Coria}, Juan Manuel and {Gelly}, Gregory and {Korshunov}, Pavel and {Lavechin}, Marvin and {Fustes}, Diego and {Titeux}, Hadrien and {Bouaziz}, Wassim and {Gill}, Marie-Philippe},
  Booktitle = {ICASSP 2020, IEEE International Conference on Acoustics, Speech, and Signal Processing},
  Address   = {Barcelona, Spain},
  Month     = {May},
  Year      = {2020}
}
```
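
## Going further

The Usage snippet above prints one line per frame. If you want file-level statistics instead, a minimal post-processing sketch is shown below. It assumes `output` is the `pyannote.core.SlidingWindowFeature` returned by `Inference`, whose `.data` array holds one `(vad, snr, c50)` triplet per frame; the 0.5 VAD threshold is an arbitrary illustrative choice, not a tuned value.

```python
# Minimal post-processing sketch (not part of the brouhaha API): summarize
# the frame-level predictions over an entire file.
import numpy as np

from pyannote.audio import Inference, Model

model = Model.from_pretrained("pyannote/brouhaha",
                              use_auth_token="ACCESS_TOKEN_GOES_HERE")
inference = Inference(model)
output = inference("audio.wav")

scores = np.asarray(output.data)   # shape: (num_frames, 3) = (vad, snr, c50)
speech = scores[:, 0] > 0.5        # assumed VAD threshold, for illustration only

print(f"speech ratio: {100 * speech.mean():.1f}%")
if speech.any():
    print(f"mean SNR over speech frames: {scores[speech, 1].mean():.1f} dB")
    print(f"mean C50 over speech frames: {scores[speech, 2].mean():.1f} dB")
```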
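
You can also turn the VAD channel into start/end speech regions by merging consecutive speech frames, reusing the same frame iteration as the Usage snippet. The `speech_regions` helper below is hypothetical (pyannote.audio ships its own binarization utilities; this sketch stays in plain Python for clarity), and its threshold is again an illustrative assumption.

```python
# Hypothetical helper: merge consecutive frames whose VAD score exceeds a
# threshold into (start, end) speech regions, in seconds.
def speech_regions(output, threshold=0.5):
    regions, start, end = [], None, None
    for frame, (vad, snr, c50) in output:
        if vad > threshold:
            if start is None:
                start = frame.middle        # open a new region
            end = frame.middle              # extend the current region
        elif start is not None:
            regions.append((start, end))    # close the region
            start = None
    if start is not None:                   # file ends during speech
        regions.append((start, end))
    return regions

for begin, finish in speech_regions(output):
    print(f"speech from {begin:8.3f}s to {finish:8.3f}s")
```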