---
tags:
- mms
language:
- ab
- af
- ak
- am
- ar
- as
- av
- ay
- az
- ba
- bm
- be
- bn
- bi
- bo
- sh
- br
- bg
- ca
- cs
- ce
- cv
- ku
- cy
- da
- de
- dv
- dz
- el
- en
- eo
- et
- eu
- ee
- fo
- fa
- fj
- fi
- fr
- fy
- ff
- ga
- gl
- gn
- gu
- zh
- ht
- ha
- he
- hi
- hu
- hy
- ig
- ia
- ms
- is
- it
- jv
- ja
- kn
- ka
- kk
- kr
- km
- ki
- rw
- ky
- ko
- kv
- lo
- la
- lv
- ln
- lt
- lb
- lg
- mh
- ml
- mr
- mk
- mg
- mt
- mn
- mi
- my
- nl
- 'no'
- ne
- ny
- oc
- om
- or
- os
- pa
- pl
- pt
- ps
- qu
- ro
- rn
- ru
- sg
- sk
- sl
- sm
- sn
- sd
- so
- es
- sq
- su
- sv
- sw
- ta
- tt
- te
- tg
- tl
- th
- ti
- ts
- tr
- uk
- vi
- wo
- xh
- yo
- zu
- za
license: cc-by-nc-4.0
datasets:
- google/fleurs
metrics:
- wer
---

# Massively Multilingual Speech (MMS) - 300m

Facebook's MMS model with 300 million parameters.

MMS is Facebook AI's massively multilingual pretrained model for speech.
It is pretrained with [Wav2Vec2's self-supervised training objective](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) on about 500,000 hours of speech data in over 1,400 languages.
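The contrastive part of that objective can be illustrated with a short sketch. At each masked time step, the Transformer output has to pick out the true quantized latent from a set of distractors, with every candidate scored by cosine similarity divided by a temperature. This pure-Python version is only an illustration, not the fairseq implementation. It handles a single time step and omits the codebook diversity term. The temperature of 0.1 is an assumed default, and the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(
    context: Sequence[float],
    positive: Sequence[float],
    distractors: Sequence[Sequence[float]],
    temperature: float = 0.1,  # assumed default, for illustration only
) -> float:
    """Contrastive term for one masked time step (illustrative sketch).

    `context` is the Transformer output at a masked position, `positive` is the
    true quantized latent for that position, and `distractors` are quantized
    latents taken from other masked positions.
    """
    candidates = [positive, *distractors]
    logits = [cosine_similarity(context, q) / temperature for q in candidates]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    # Negative log-probability assigned to the true latent (index 0).
    return log_denominator - logits[0]
```

If the context vector already matches its target and is orthogonal to the distractor, the loss is close to zero. Swapping the target and the distractor raises it to roughly 10 at this temperature.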

When using the model, make sure that your speech input is sampled at 16 kHz.
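Recordings at another rate (44.1 kHz and 48 kHz are common) need to be resampled first. The sketch below uses plain linear interpolation to show what resampling does. It applies no anti-aliasing filter, so downsampling can introduce artifacts, and the helper name is hypothetical. For real pipelines, a filtered resampler such as `torchaudio.functional.resample` or `librosa.resample` is the safer choice.

```python
from typing import List, Sequence

TARGET_SAMPLING_RATE = 16_000  # MMS expects 16 kHz input


def resample_linear(
    samples: Sequence[float], orig_rate: int, target_rate: int = TARGET_SAMPLING_RATE
) -> List[float]:
    """Naively resample a mono waveform using linear interpolation.

    Illustration only: no low-pass filter is applied before downsampling.
    """
    if orig_rate == target_rate or not samples:
        return list(samples)
    n_out = round(len(samples) * target_rate / orig_rate)
    out: List[float] = []
    for i in range(n_out):
        # Position of output sample i on the input time axis, in input samples.
        pos = i * orig_rate / target_rate
        left = min(int(pos), len(samples) - 1)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        # Weighted average of the two neighbouring input samples.
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out
```

For example, one second of 48 kHz audio (48,000 samples) comes out as 16,000 samples.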

Note: This model should be fine-tuned on a downstream task, such as automatic speech recognition (ASR), translation, or classification. See the [How to finetune](#how-to-finetune) section or [this blog post](https://huggingface.co/blog/fine-tune-xlsr-wav2vec2) for more information about ASR.
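For ASR, the fine-tuning recipe in the linked blog post attaches a CTC head and trains it over a character vocabulary built from the training transcripts. Here is a minimal sketch of that vocabulary step. It assumes the transcripts are already normalized (for example, punctuation removed), and `build_ctc_vocab` is a hypothetical helper, not a Transformers function.

```python
from typing import Dict, Iterable, Set


def build_ctc_vocab(transcripts: Iterable[str]) -> Dict[str, int]:
    """Build a character-level vocabulary for CTC fine-tuning (sketch).

    Every distinct lower-cased character becomes a token. The space is replaced
    by a visible word delimiter "|". "[UNK]" and "[PAD]" are then appended, and
    "[PAD]" also serves as the CTC blank token.
    """
    chars: Set[str] = set()
    for text in transcripts:
        chars.update(text.lower())
    vocab = {ch: idx for idx, ch in enumerate(sorted(chars))}
    if " " in vocab:
        # Keep the space's index but give it a printable symbol.
        vocab["|"] = vocab.pop(" ")
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)
    return vocab
```

You can save a dictionary like this as JSON and pass it to `Wav2Vec2CTCTokenizer`, together with `unk_token="[UNK]"`, `pad_token="[PAD]"` and `word_delimiter_token="|"`.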

## Table of Contents

- [How to finetune](#how-to-finetune)
- [Model details](#model-details)
- [Additional links](#additional-links)

## How to finetune

Coming soon...
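A fine-tuned ASR model's output still has to be turned into text. This sketch assumes a CTC head, as used in the linked blog post and in the MMS ASR checkpoints listed under Additional links. With a CTC head, the per-frame scores are commonly decoded greedily in four steps: take the best token for each frame, merge consecutive repeats, drop the blank token, and map the word delimiter back to a space. The function name and the `"|"` delimiter default are assumptions. In Transformers, the CTC tokenizer's `batch_decode`, applied to argmax ids, performs similar grouping.

```python
from typing import Dict, List, Optional, Sequence


def greedy_ctc_decode(
    frame_logits: Sequence[Sequence[float]],
    id_to_token: Dict[int, str],
    blank_id: int,
    word_delimiter: str = "|",
) -> str:
    """Greedy (best-path) CTC decoding of per-frame scores (sketch)."""
    # 1. Pick the highest-scoring token id for every audio frame.
    best_ids = [max(range(len(row)), key=lambda j: row[j]) for row in frame_logits]
    tokens: List[str] = []
    previous: Optional[int] = None
    for idx in best_ids:
        # 2. Merge consecutive repeats and 3. skip blank frames.
        if idx != previous and idx != blank_id:
            tokens.append(id_to_token[idx])
        previous = idx
    # 4. Turn the word delimiter back into spaces.
    return "".join(tokens).replace(word_delimiter, " ").strip()
```

A blank frame between two identical tokens keeps both of them. This is how CTC represents genuinely doubled letters.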

## Model details

- Developed by: Vineel Pratap et al.
- Model type: Multilingual pretrained speech model (Wav2Vec2), intended to be fine-tuned for tasks such as automatic speech recognition
- Language(s): 1000+ languages
- License: CC-BY-NC 4.0
- Number of parameters: 300 million
- Cite as:

      @article{pratap2023mms,
        title={Scaling Speech Technology to 1,000+ Languages},
        author={Vineel Pratap and Andros Tjandra and Bowen Shi and Paden Tomasello and Arun Babu and Sayani Kundu and Ali Elkahky and Zhaoheng Ni and Apoorv Vyas and Maryam Fazel-Zarandi and Alexei Baevski and Yossi Adi and Xiaohui Zhang and Wei-Ning Hsu and Alexis Conneau and Michael Auli},
        journal={arXiv},
        year={2023}
      }
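As a rough check on the 300 million figure, you can count most of the parameters in the Transformer encoder alone. The configuration below (24 layers, hidden size 1024, feed-forward size 4096) is an assumption modeled on the XLS-R 300M architecture. This card does not state these values, so consult the checkpoint's `config.json` for the authoritative numbers.

```python
def transformer_encoder_params(num_layers: int, hidden_size: int, ffn_size: int) -> int:
    """Approximate parameter count of a stack of Transformer encoder blocks.

    Counts the weights and biases of the attention projections, the
    feed-forward network and the two LayerNorms in each block. The
    convolutional feature encoder and positional convolution are left out.
    """
    d = hidden_size
    attention = 4 * (d * d + d)  # query, key, value and output projections
    feed_forward = (d * ffn_size + ffn_size) + (ffn_size * d + d)
    layer_norms = 2 * (2 * d)  # scale and bias for each of the two LayerNorms
    return num_layers * (attention + feed_forward + layer_norms)


# Assumed XLS-R-300M-style configuration (not stated in this model card).
estimate = transformer_encoder_params(num_layers=24, hidden_size=1024, ffn_size=4096)
print(f"{estimate:,}")  # 302,309,376
```

Under these assumptions, the encoder blocks alone account for roughly 302 million parameters. The convolutional feature encoder and positional convolution, which the sketch leaves out, add a further, smaller share.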

## Additional links

- [Blog post]( )
- [Transformers documentation](https://huggingface.co/docs/transformers/main/en/model_doc/mms)
- [Paper](https://arxiv.org/abs/2305.13516)
- [GitHub Repository](https://github.com/facebookresearch/fairseq/tree/main/examples/mms#asr)
- [Other MMS checkpoints](https://huggingface.co/models?other=mms)
- MMS ASR fine-tuned checkpoints:
  - [facebook/mms-1b-all](https://huggingface.co/facebook/mms-1b-all)
  - [facebook/mms-1b-l1107](https://huggingface.co/facebook/mms-1b-l1107)
  - [facebook/mms-1b-fl102](https://huggingface.co/facebook/mms-1b-fl102)
- [Official Space](https://huggingface.co/spaces/facebook/MMS)