freds0 committed
Commit 28d3371 · 1 Parent(s): 566b5dd

Updating model files

Files changed (1): README.md +5 −47
README.md CHANGED
@@ -1,25 +1,10 @@
 ---
 license: cc-by-nc-4.0
 language:
-- en
-- de
-- es
-- fr
+- pt
 library_name: nemo
 datasets:
-- librispeech_asr
-- fisher_corpus
-- Switchboard-1
-- WSJ-0
-- WSJ-1
-- National-Singapore-Corpus-Part-1
-- National-Singapore-Corpus-Part-6
-- vctk
-- voxpopuli
-- europarl
-- multilingual_librispeech
-- mozilla-foundation/common_voice_8_0
-- MLCommons/peoples_speech
+- tagerela
 thumbnail: null
 tags:
 - automatic-speech-recognition
@@ -406,7 +391,7 @@ The model outputs the transcribed/translated text corresponding to the input aud
 
 ## Training
 
-Canary-1B is trained using the NVIDIA NeMo toolkit [4] for 150k steps with dynamic bucketing and a batch duration of 360s per GPU on 128 NVIDIA A100 80GB GPUs.
+Canary-1B is trained using the NVIDIA NeMo toolkit [4] for 400k steps on 1 NVIDIA B200 196GB GPU.
 The model can be trained using this [example script](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/speech_multitask/speech_to_text_aed.py) and [base config](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/conf/speech_multitask/fast-conformer_aed.yaml).
 
 The tokenizers for these models were built using the text transcripts of the train set with this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).
@@ -418,35 +403,8 @@ The Canary-1B model is trained on a total of 85k hrs of speech data. It consists
 
 The constituents of public data are as follows.
 
-#### English (25.5k hours)
-- Librispeech 960 hours
-- Fisher Corpus
-- Switchboard-1 Dataset
-- WSJ-0 and WSJ-1
-- National Speech Corpus (Part 1, Part 6)
-- VCTK
-- VoxPopuli (EN)
-- Europarl-ASR (EN)
-- Multilingual Librispeech (MLS EN) - 2,000 hour subset
-- Mozilla Common Voice (v7.0)
-- People's Speech - 12,000 hour subset
-- Mozilla Common Voice (v11.0) - 1,474 hour subset
-
-#### German (2.5k hours)
-- Mozilla Common Voice (v12.0) - 800 hour subset
-- Multilingual Librispeech (MLS DE) - 1,500 hour subset
-- VoxPopuli (DE) - 200 hr subset
-
-#### Spanish (1.4k hours)
-- Mozilla Common Voice (v12.0) - 395 hour subset
-- Multilingual Librispeech (MLS ES) - 780 hour subset
-- VoxPopuli (ES) - 108 hour subset
-- Fisher - 141 hour subset
-
-#### French (1.8k hours)
-- Mozilla Common Voice (v12.0) - 708 hour subset
-- Multilingual Librispeech (MLS FR) - 926 hour subset
-- VoxPopuli (FR) - 165 hour subset
+#### Portuguese (8.9k hours)
+- Tagarela
 
 
 ## Performance