Updating model files
README.md CHANGED
@@ -1,25 +1,10 @@
 ---
 license: cc-by-nc-4.0
 language:
-- en
-- de
-- es
-- fr
+- pt
 library_name: nemo
 datasets:
-- fisher_corpus
-- Switchboard-1
-- WSJ-0
-- WSJ-1
-- National-Singapore-Corpus-Part-1
-- National-Singapore-Corpus-Part-6
-- vctk
-- voxpopuli
-- europarl
-- multilingual_librispeech
-- mozilla-foundation/common_voice_8_0
-- MLCommons/peoples_speech
+- tagerela
 thumbnail: null
 tags:
 - automatic-speech-recognition
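A quick way to sanity-check the edited front matter before pushing is to parse it locally. This is a minimal sketch, assuming the updated README.md sits in the working directory; the expected values ("pt", "tagerela") come from the hunk above, everything else is illustrative.

```python
# Sketch: parse the model card's YAML front matter and confirm the new
# metadata values from the hunk above ("pt" language, "tagerela" dataset).
import yaml  # PyYAML

with open("README.md", encoding="utf-8") as f:
    text = f.read()

# The front matter sits between the first two "---" markers.
_, front_matter, _body = text.split("---", 2)
meta = yaml.safe_load(front_matter)

assert "pt" in meta["language"], "expected Portuguese in the language list"
assert "tagerela" in meta["datasets"], "expected the new dataset entry"
print("language:", meta["language"], "| datasets:", meta["datasets"])
```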
@@ -406,7 +391,7 @@ The model outputs the transcribed/translated text corresponding to the input aud
 
 ## Training
 
-Canary-1B is trained using the NVIDIA NeMo toolkit [4] for
+Canary-1B is trained using the NVIDIA NeMo toolkit [4] for 400k steps on 1 NVIDIA B200 196GB GPU.
 The model can be trained using this [example script](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/speech_multitask/speech_to_text_aed.py) and [base config](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/conf/speech_multitask/fast-conformer_aed.yaml).
 
 The tokenizers for these models were built using the text transcripts of the train set with this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).
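For context on the training hunk above: the linked example script is a Hydra-configured NeMo entry point, so a run is typically launched from a NeMo checkout with config overrides on the command line. The sketch below is illustrative only; the override keys, data paths, and tokenizer directory are assumptions rather than part of this commit, so check fast-conformer_aed.yaml for the exact field names.

```python
# Sketch: launch the NeMo example script with Hydra-style overrides.
# Paths and override keys below are placeholders/assumptions.
import subprocess

cmd = [
    "python", "examples/asr/speech_multitask/speech_to_text_aed.py",
    "--config-path=../conf/speech_multitask",
    "--config-name=fast-conformer_aed.yaml",
    # Hypothetical overrides for a Portuguese run on a single GPU:
    "model.train_ds.manifest_filepath=/data/tagarela/train_manifest.json",
    "model.validation_ds.manifest_filepath=/data/tagarela/dev_manifest.json",
    "model.tokenizer.dir=/data/tokenizers/canary_pt",
    "trainer.devices=1",
    "trainer.max_steps=400000",  # mirrors the 400k steps stated above
]
subprocess.run(cmd, check=True)  # run from the root of a NeMo checkout
```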
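Similarly, the tokenizer script mentioned above is normally driven from the command line over the training manifests. A minimal sketch follows, with argument names and values to be treated as assumptions (run the script with --help to confirm).

```python
# Sketch: build a SentencePiece tokenizer from training transcripts with
# NeMo's process_asr_text_tokenizer.py. Arguments are illustrative.
import subprocess

subprocess.run(
    [
        "python", "scripts/tokenizers/process_asr_text_tokenizer.py",
        "--manifest=/data/tagarela/train_manifest.json",  # placeholder path
        "--data_root=/data/tokenizers/canary_pt",         # output directory
        "--vocab_size=1024",
        "--tokenizer=spe",
        "--spe_type=bpe",
    ],
    check=True,  # raise if the tokenizer build fails
)
```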
@@ -418,35 +403,8 @@ The Canary-1B model is trained on a total of 85k hrs of speech data. It consists
 
 The constituents of public data are as follows.
 
-#### English
-- Fisher Corpus
-- Switchboard-1 Dataset
-- WSJ-0 and WSJ-1
-- National Speech Corpus (Part 1, Part 6)
-- VCTK
-- VoxPopuli (EN)
-- Europarl-ASR (EN)
-- Multilingual Librispeech (MLS EN) - 2,000 hour subset
-- Mozilla Common Voice (v7.0)
-- People's Speech - 12,000 hour subset
-- Mozilla Common Voice (v11.0) - 1,474 hour subset
-
-#### German (2.5k hours)
-- Mozilla Common Voice (v12.0) - 800 hour subset
-- Multilingual Librispeech (MLS DE) - 1,500 hour subset
-- VoxPopuli (DE) - 200 hr subset
-
-#### Spanish (1.4k hours)
-- Mozilla Common Voice (v12.0) - 395 hour subset
-- Multilingual Librispeech (MLS ES) - 780 hour subset
-- VoxPopuli (ES) - 108 hour subset
-- Fisher - 141 hour subset
-
-#### French (1.8k hours)
-- Mozilla Common Voice (v12.0) - 708 hour subset
-- Multilingual Librispeech (MLS FR) - 926 hour subset
-- VoxPopuli (FR) - 165 hour subset
+#### Portuguese (8.9k hours)
+- Tagarela
 
 
 ## Performance