---
tags:
- mms
language:
- ab
- af
- ak
- am
- ar
- as
- av
- ay
- az
- ba
- bm
- be
- bn
- bi
- bo
- sh
- br
- bg
- ca
- cs
- ce
- cv
- ku
- cy
- da
- de
- dv
- dz
- el
- en
- eo
- et
- eu
- ee
- fo
- fa
- fj
- fi
- fr
- fy
- ff
- ga
- gl
- gn
- gu
- zh
- ht
- ha
- he
- hi
- sh
- hu
- hy
- ig
- ia
- ms
- is
- it
- jv
- ja
- kn
- ka
- kk
- kr
- km
- ki
- rw
- ky
- ko
- kv
- lo
- la
- lv
- ln
- lt
- lb
- lg
- mh
- ml
- mr
- ms
- mk
- mg
- mt
- mn
- mi
- my
- zh
- nl
- 'no'
- 'no'
- ne
- ny
- oc
- om
- or
- os
- pa
- pl
- pt
- ms
- ps
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- qu
- ro
- rn
- ru
- sg
- sk
- sl
- sm
- sn
- sd
- so
- es
- sq
- su
- sv
- sw
- ta
- tt
- te
- tg
- tl
- th
- ti
- ts
- tr
- uk
- ms
- vi
- wo
- xh
- ms
- yo
- ms
- zu
- za
license: cc-by-nc-4.0
datasets:
- google/fleurs
metrics:
- acc
---
# Massively Multilingual Speech (MMS) - Finetuned LID
This checkpoint is a model fine-tuned for speech language identification (LID) and is part of Facebook's [Massively Multilingual Speech project](https://research.facebook.com/publications/scaling-speech-technology-to-1000-languages/).
This checkpoint is based on the [Wav2Vec2 architecture](https://huggingface.co/docs/transformers/model_doc/wav2vec2) and maps raw audio input to a probability distribution over 1024 output classes (each class representing a language).
The checkpoint consists of **1 billion parameters** and has been fine-tuned from [facebook/mms-1b](https://huggingface.co/facebook/mms-1b) on 1024 languages.
## Table Of Contents
- [Example](#example)
- [Supported Languages](#supported-languages)
- [Model details](#model-details)
- [Additional links](#additional-links)
## Example
This MMS checkpoint can be used with [Transformers](https://github.com/huggingface/transformers) to identify
the spoken language of an audio clip. It can recognize the [following 1024 languages](#supported-languages).
Let's look at a simple example.
First, we install transformers and some other libraries:
```
pip install torch accelerate torchaudio datasets
pip install --upgrade transformers
```
**Note**: To use MMS you need at least `transformers >= 4.30` installed. If version `4.30`
is not yet available [on PyPI](https://pypi.org/project/transformers/), install `transformers` from
source:
```
pip install git+https://github.com/huggingface/transformers.git
```
Next, we load a couple of audio samples via `datasets`. Make sure that the audio data is sampled at 16,000 Hz (16 kHz).
```py
from datasets import load_dataset, Audio
# English
stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "en", split="test", streaming=True)
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
en_sample = next(iter(stream_data))["audio"]["array"]
# Arabic
stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "ar", split="test", streaming=True)
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
ar_sample = next(iter(stream_data))["audio"]["array"]
```
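If your audio comes from local files rather than from `datasets`, it can help to check the sampling rate before running the model. The following is a minimal sketch that uses only Python's standard `wave` module. The helper names `wav_sample_rate` and `needs_resampling` and the demo file name are hypothetical, and the check only covers uncompressed WAV files.

```python
import os
import tempfile
import wave

EXPECTED_RATE = 16_000  # Hz, the sampling rate this checkpoint expects


def wav_sample_rate(path):
    """Return the sampling rate (in Hz) stored in a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, expected_rate=EXPECTED_RATE):
    """Return True if the file's sampling rate differs from the expected one."""
    return wav_sample_rate(path) != expected_rate


# Write one second of 8 kHz mono silence to a temporary file for the demo.
demo_path = os.path.join(tempfile.gettempdir(), "mms_lid_demo.wav")
with wave.open(demo_path, "wb") as wav_file:
    wav_file.setnchannels(1)
    wav_file.setsampwidth(2)  # 16-bit samples
    wav_file.setframerate(8_000)
    wav_file.writeframes(b"\x00\x00" * 8_000)

print(wav_sample_rate(demo_path), needs_resampling(demo_path))
```

Files for which `needs_resampling` returns `True` should be resampled to 16 kHz first, for example with the `Audio(sampling_rate=16000)` cast shown above.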
Next, we load the model and processor:
```py
from transformers import Wav2Vec2ForSequenceClassification, AutoFeatureExtractor
import torch
model_id = "facebook/mms-lid-1024"
processor = AutoFeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2ForSequenceClassification.from_pretrained(model_id)
```
Now we process the audio data and pass it to the model to classify it into a language, just as we usually do for Wav2Vec2 audio classification models such as [ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition](https://huggingface.co/harshit345/xlsr-wav2vec-speech-emotion-recognition):
```py
# English
inputs = processor(en_sample, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs).logits
lang_id = torch.argmax(outputs, dim=-1)[0].item()
detected_lang = model.config.id2label[lang_id]
# 'eng'
# Arabic
inputs = processor(ar_sample, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs).logits
lang_id = torch.argmax(outputs, dim=-1)[0].item()
detected_lang = model.config.id2label[lang_id]
# 'ara'
```
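The example above keeps only the single best class. Since the model produces a score for each of its 1024 classes, it can be useful to look at the few most probable languages and their probabilities. Below is a minimal sketch in plain Python, assuming the logits have been converted to a list (for instance with `outputs[0].tolist()`). The helper name `top_k_languages` and the toy values are hypothetical stand-ins, not real model outputs.

```python
import math


def top_k_languages(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs from raw logits."""
    # Numerically stable softmax: subtract the largest logit before exponentiating.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Toy stand-ins for `outputs[0].tolist()` and `model.config.id2label`.
toy_logits = [2.0, 0.5, -1.0, 0.1]
toy_id2label = {0: "eng", 1: "ara", 2: "fra", 3: "deu"}

for label, prob in top_k_languages(toy_logits, toy_id2label, k=2):
    print(f"{label}: {prob:.3f}")
```

With the real model, the call would look like `top_k_languages(outputs[0].tolist(), model.config.id2label)`.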
To see all the supported languages of a checkpoint, you can print out the language ids as follows:
```py
model.config.id2label.values()
```
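The labels are ISO 639-3 codes, so two-letter codes such as `en` will not match any class. As a small sketch of how one might check a list of codes against the checkpoint, the hypothetical helper `unsupported_codes` below takes any mapping shaped like `model.config.id2label`; the toy mapping is an illustrative stand-in for the real one.

```python
def unsupported_codes(codes, id2label):
    """Return the entries of `codes` that the checkpoint has no class for."""
    supported = set(id2label.values())
    return [code for code in codes if code not in supported]


# Toy stand-in for `model.config.id2label` (the real mapping has 1024 entries).
toy_id2label = {0: "ara", 1: "cmn", 2: "eng"}

print(unsupported_codes(["eng", "en", "cmn"], toy_id2label))
```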
For more details about the architecture, please have a look at [the official docs](https://huggingface.co/docs/transformers/main/en/model_doc/mms).
## Supported Languages
This model supports 1024 languages. Click the following to toggle all supported languages of this checkpoint, listed by [ISO 639-3 code](https://en.wikipedia.org/wiki/ISO_639-3).
You can find more details about the languages and their ISO 639-3 codes in the [MMS Language Coverage Overview](https://dl.fbaipublicfiles.com/mms/misc/language_coverage_mms.html).
<details>
<summary>Click to toggle</summary>
- ara
- cmn
- eng
- spa
- fra
- mlg
- swe
- por
- vie
- ful
- sun
- asm
- ben
- zlm
- kor
- ind
- hin
- tuk
- urd
- aze
- slv
- mon
- hau
- tel
- swh
- bod
- rus
- tur
- heb
- mar
- som
- tgl
- tat
- tha
- cat
- ron
- mal
- bel
- pol
- yor
- nld
- bul
- hat
- afr
- isl
- amh
- tam
- hun
- hrv
- lit
- cym
- fas
- mkd
- ell
- bos
- deu
- sqi
- jav
- kmr
- nob
- uzb
- snd
- lat
- nya
- grn
- mya
- orm
- lin
- hye
- yue
- pan
- jpn
- kaz
- npi
- kik
- kat
- guj
- kan
- tgk
- ukr
- ces
- lav
- bak
- khm
- cak
- fao
- glg
- ltz
- xog
- lao
- mlt
- sin
- aka
- sna
- che
- mam
- ita
- quc
- aiw
- srp
- mri
- tuv
- nno
- pus
- eus
- kbp
- gur
- ory
- lug
- crh
- bre
- luo
- nhx
- slk
- ewe
- xsm
- fin
- rif
- dan
- saq
- yid
- yao
- mos
- quh
- hne
- xon
- new
- dtp
- quy
- est
- ddn
- dyu
- ttq
- bam
- pse
- uig
- sck
- ngl
- tso
- mup
- dga
- seh
- lis
- wal
- ctg
- mip
- bfz
- bxk
- ceb
- kru
- war
- khg
- bbc
- thl
- nzi
- vmw
- mzi
- ycl
- zne
- sid
- asa
- tpi
- bmq
- box
- zpu
- gof
- nym
- cla
- bgq
- bfy
- hlb
- qxl
- teo
- fon
- sda
- kfx
- bfa
- mag
- tzh
- pil
- maj
- maa
- kdt
- ksb
- lns
- btd
- rej
- pap
- ayr
- any
- mnk
- adx
- gud
- krc
- onb
- xal
- ctd
- nxq
- ava
- blt
- lbw
- hyw
- udm
- zar
- tzo
- kpv
- san
- xnj
- kek
- chv
- kcg
- kri
- ati
- bgw
- mxt
- ybb
- btx
- dgi
- nhy
- dnj
- zpz
- yba
- lon
- smo
- men
- ium
- mgd
- taq
- nga
- nsu
- zaj
- tly
- prk
- zpt
- akb
- mhr
- mxb
- nuj
- obo
- kir
- bom
- run
- zpg
- hwc
- mnw
- ubl
- kin
- xtm
- hnj
- mpm
- rkt
- miy
- luc
- mih
- kne
- mib
- flr
- myv
- xmm
- knk
- iba
- gux
- pis
- zmz
- ses
- dav
- lif
- qxr
- dig
- kdj
- wsg
- tir
- gbm
- mai
- zpc
- kus
- nyy
- mim
- nan
- nyn
- gog
- ngu
- tbz
- hoc
- nyf
- sus
- guk
- gwr
- yaz
- bcc
- sbd
- spp
- hak
- grt
- kno
- oss
- suk
- spy
- nij
- lsm
- kaa
- bem
- rmy
- kqn
- nim
- ztq
- nus
- bib
- xtd
- ach
- mil
- keo
- mpg
- gjn
- zaq
- kdh
- dug
- sah
- awa
- kff
- dip
- rim
- nhe
- pcm
- kde
- tem
- quz
- mfq
- las
- bba
- kbr
- taj
- dyo
- zao
- lom
- shk
- dik
- dgo
- zpo
- fij
- bgc
- xnr
- bud
- kac
- laj
- mev
- maw
- quw
- kao
- dag
- ktb
- lhu
- zab
- mgh
- shn
- otq
- lob
- pbb
- oci
- zyb
- bsq
- mhi
- dzo
- zas
- guc
- alz
- ctu
- wol
- guw
- mnb
- nia
- zaw
- mxv
- bci
- sba
- kab
- dwr
- nnb
- ilo
- mfe
- srx
- ruf
- srn
- zad
- xpe
- pce
- ahk
- bcl
- myk
- haw
- mad
- ljp
- bky
- gmv
- nag
- nav
- nyo
- kxm
- nod
- sag
- zpl
- sas
- myx
- sgw
- old
- irk
- acf
- mak
- kfy
- zai
- mie
- zpm
- zpi
- ote
- jam
- kpz
- lgg
- lia
- nhi
- mzm
- bdq
- xtn
- mey
- mjl
- sgj
- kdi
- kxc
- miz
- adh
- tap
- hay
- kss
- pam
- gor
- heh
- nhw
- ziw
- gej
- yua
- itv
- shi
- qvw
- mrw
- hil
- mbt
- pag
- vmy
- lwo
- cce
- kum
- klu
- ann
- mbb
- npl
- zca
- pww
- toc
- ace
- mio
- izz
- kam
- zaa
- krj
- bts
- eza
- zty
- hns
- kki
- min
- led
- alw
- tll
- rng
- pko
- toi
- iqw
- ncj
- toh
- umb
- mog
- hno
- wob
- gxx
- hig
- nyu
- kby
- ban
- syl
- bxg
- nse
- xho
- zae
- mkw
- nch
- ibg
- mas
- qvz
- bum
- bgd
- mww
- epo
- tzm
- zul
- bcq
- lrc
- xdy
- tyv
- ibo
- loz
- mza
- abk
- azz
- guz
- arn
- ksw
- lus
- tos
- gvr
- top
- ckb
- mer
- pov
- lun
- rhg
- knc
- sfw
- bev
- tum
- lag
- nso
- bho
- ndc
- maf
- gkp
- bax
- awn
- ijc
- qug
- lub
- srr
- mni
- zza
- ige
- dje
- mkn
- bft
- tiv
- otn
- kck
- kqs
- gle
- lua
- pdt
- swk
- mgw
- ebu
- ada
- lic
- skr
- gaa
- mfa
- vmk
- mcn
- bto
- lol
- bwr
- unr
- dzg
- hdy
- kea
- bhi
- glk
- mua
- ast
- nup
- sat
- ktu
- bhb
- zpq
- coh
- bkm
- gya
- sgc
- dks
- ncl
- tui
- emk
- urh
- ego
- ogo
- tsc
- idu
- igb
- ijn
- njz
- ngb
- tod
- jra
- mrt
- zav
- tke
- its
- ady
- bzw
- kng
- kmb
- lue
- jmx
- tsn
- bin
- ble
- gom
- ven
- sef
- sco
- her
- iso
- trp
- glv
- haq
- toq
- okr
- kha
- wof
- rmn
- sot
- kaj
- bbj
- sou
- mjt
- trd
- gno
- mwn
- igl
- rag
- eyo
- div
- efi
- nde
- mfv
- mix
- rki
- kjg
- fan
- khw
- wci
- bjn
- pmy
- bqi
- ina
- hni
- mjx
- kuj
- aoz
- the
- tog
- tet
- nuz
- ajg
- ccp
- mau
- ymm
- fmu
- tcz
- xmc
- nyk
- ztg
- knx
- snk
- zac
- esg
- srb
- thq
- pht
- wes
- rah
- pnb
- ssy
- zpv
- kpo
- phr
- atd
- eto
- xta
- mxx
- mui
- uki
- tkt
- mgp
- xsq
- enq
- nnh
- qxp
- zam
- bug
- bxr
- maq
- tdt
- khb
- mrr
- kas
- zgb
- kmw
- lir
- vah
- dar
- ssw
- hmd
- jab
- iii
- peg
- shr
- brx
- rwr
- bmb
- kmc
- mji
- dib
- pcc
- nbe
- mrd
- ish
- kai
- yom
- zyn
- hea
- ewo
- bas
- hms
- twh
- kfq
- thr
- xtl
- wbr
- bfb
- wtm
- mjc
- blk
- lot
- dhd
- swv
- wbm
- zzj
- kge
- mgm
- niq
- zpj
- bwx
- bde
- mtr
- gju
- kjp
- mbz
- haz
- lpo
- yig
- qud
- shy
- gjk
- ztp
- nbl
- aii
- kun
- say
- mde
- sjp
- bns
- brh
- ywq
- msi
- anr
- mrg
- mjg
- tan
- tsg
- tcy
- kbl
- mdr
- mks
- noe
- tyz
- zpa
- ahr
- aar
- wuu
- khr
- kbd
- kex
- bca
- nku
- pwr
- hsn
- ort
- ott
- swi
- kua
- tdd
- msm
- bgp
- nbm
- mxy
- abs
- zlj
- ebo
- lea
- dub
- sce
- xkb
- vav
- bra
- ssb
- sss
- nhp
- kad
- kvx
- lch
- tts
- zyj
- kxp
- lmn
- qvi
- lez
- scl
- cqd
- ayb
- xbr
- nqg
- dcc
- cjk
- bfr
- zyg
- mse
- gru
- mdv
- bew
- wti
- arg
- dso
- zdj
- pll
- mig
- qxs
- bol
- drs
- anp
- chw
- bej
- vmc
- otx
- xty
- bjj
- vmz
- ibb
- gby
- twx
- tig
- thz
- tku
- hmz
- pbm
- mfn
- nut
- cyo
- mjw
- cjm
- tlp
- naq
- rnd
- stj
- sym
- jax
- btg
- tdg
- sng
- nlv
- kvr
- pch
- fvr
- mxs
- wni
- mlq
- kfr
- mdj
- osi
- nhn
- ukw
- tji
- qvj
- nih
- bcy
- hbb
- zpx
- hoj
- cpx
- ogc
- cdo
- bgn
- bfs
- vmx
- tvn
- ior
- mxa
- btm
- anc
- jit
- mfb
- mls
- ets
- goa
- bet
- ikw
- pem
- trf
- daq
- max
- rad
- njo
- bnx
- mxl
- mbi
- nba
- zpn
- zts
- mut
- hnd
- mta
- hav
- hac
- ryu
- abr
- yer
- cld
- zag
- ndo
- sop
- vmm
- gcf
- chr
- cbk
- sbk
- bhp
- odk
- mbd
- nap
- gbr
- mii
- czh
- xti
- vls
- gdx
- sxw
- zaf
- wem
- mqh
- ank
- yaf
- vmp
- otm
- sdh
- anw
- src
- mne
- wss
- meh
- kzc
- tma
- ttj
- ots
- ilp
- zpr
- saz
- ogb
- akl
- nhg
- pbv
- rcf
- cgg
- mku
- bez
- mwe
- mtb
- gul
- ifm
- mdh
- scn
- lki
- xmf
- sgd
- aba
- cos
- luz
- zpy
- stv
- kjt
- mbf
- kmz
- nds
- mtq
- tkq
- aee
- knn
- mbs
- mnp
- ema
- bar
- unx
- plk
- psi
- mzn
- cja
- sro
- mdw
- ndh
- vmj
- zpw
- kfu
- bgx
- gsw
- fry
- zpe
- zpd
- bta
- psh
- zat
</details>
## Model details
- **Developed by:** Vineel Pratap et al.
- **Model type:** Multi-Lingual speech language identification (LID) model
- **Language(s):** 1024 languages, see [supported languages](#supported-languages)
- **License:** CC-BY-NC 4.0 license
- **Num parameters**: 1 billion
- **Audio sampling rate**: 16,000 Hz
- **Cite as:**
@article{pratap2023mms,
title={Scaling Speech Technology to 1,000+ Languages},
author={Vineel Pratap and Andros Tjandra and Bowen Shi and Paden Tomasello and Arun Babu and Sayani Kundu and Ali Elkahky and Zhaoheng Ni and Apoorv Vyas and Maryam Fazel-Zarandi and Alexei Baevski and Yossi Adi and Xiaohui Zhang and Wei-Ning Hsu and Alexis Conneau and Michael Auli},
journal={arXiv},
year={2023}
}
## Additional Links
- [Blog post](https://ai.facebook.com/blog/multilingual-model-speech-recognition/)
- [Transformers documentation](https://huggingface.co/docs/transformers/main/en/model_doc/mms)
- [Paper](https://arxiv.org/abs/2305.13516)
- [GitHub Repository](https://github.com/facebookresearch/fairseq/tree/main/examples/mms#asr)
- [Other **MMS** checkpoints](https://huggingface.co/models?other=mms)
- MMS base checkpoints:
  - [facebook/mms-1b](https://huggingface.co/facebook/mms-1b)
  - [facebook/mms-300m](https://huggingface.co/facebook/mms-300m)
- [Official Space](https://huggingface.co/spaces/facebook/MMS)