freds0 committed
Commit 28d3371 · 1 Parent(s): 566b5dd

Updating model files

Files changed (1): README.md +5 −47
README.md CHANGED
@@ -1,25 +1,10 @@
 ---
 license: cc-by-nc-4.0
 language:
-- en
-- de
-- es
-- fr
+- pt
 library_name: nemo
 datasets:
-- librispeech_asr
-- fisher_corpus
-- Switchboard-1
-- WSJ-0
-- WSJ-1
-- National-Singapore-Corpus-Part-1
-- National-Singapore-Corpus-Part-6
-- vctk
-- voxpopuli
-- europarl
-- multilingual_librispeech
-- mozilla-foundation/common_voice_8_0
-- MLCommons/peoples_speech
+- tagerela
 thumbnail: null
 tags:
 - automatic-speech-recognition
@@ -406,7 +391,7 @@ The model outputs the transcribed/translated text corresponding to the input aud
 
 ## Training
 
-Canary-1B is trained using the NVIDIA NeMo toolkit [4] for 150k steps with dynamic bucketing and a batch duration of 360s per GPU on 128 NVIDIA A100 80GB GPUs.
+Canary-1B is trained using the NVIDIA NeMo toolkit [4] for 400k steps on 1 NVIDIA B200 196GB GPU.
 The model can be trained using this [example script](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/speech_multitask/speech_to_text_aed.py) and [base config](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/conf/speech_multitask/fast-conformer_aed.yaml).
 
 The tokenizers for these models were built using the text transcripts of the train set with this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).
@@ -418,35 +403,8 @@ The Canary-1B model is trained on a total of 85k hrs of speech data. It consists
 
 The constituents of public data are as follows.
 
-#### English (25.5k hours)
-- Librispeech 960 hours
-- Fisher Corpus
-- Switchboard-1 Dataset
-- WSJ-0 and WSJ-1
-- National Speech Corpus (Part 1, Part 6)
-- VCTK
-- VoxPopuli (EN)
-- Europarl-ASR (EN)
-- Multilingual Librispeech (MLS EN) - 2,000 hour subset
-- Mozilla Common Voice (v7.0)
-- People's Speech - 12,000 hour subset
-- Mozilla Common Voice (v11.0) - 1,474 hour subset
-
-#### German (2.5k hours)
-- Mozilla Common Voice (v12.0) - 800 hour subset
-- Multilingual Librispeech (MLS DE) - 1,500 hour subset
-- VoxPopuli (DE) - 200 hr subset
-
-#### Spanish (1.4k hours)
-- Mozilla Common Voice (v12.0) - 395 hour subset
-- Multilingual Librispeech (MLS ES) - 780 hour subset
-- VoxPopuli (ES) - 108 hour subset
-- Fisher - 141 hour subset
-
-#### French (1.8k hours)
-- Mozilla Common Voice (v12.0) - 708 hour subset
-- Multilingual Librispeech (MLS FR) - 926 hour subset
-- VoxPopuli (FR) - 165 hour subset
+#### Portuguese (8.9k hours)
+- Tagarela
 
 
 ## Performance