Some grammar
README.md

## Explainability

- High-Level Application and Domain: Automatic Speech Recognition
- Describe how this model works: The model transcribes audio input into text for the Armenian language (see the usage sketch after this list).
- Verified to have met prescribed quality standards: Yes
- Performance Metrics: Word Error Rate (WER), Character Error Rate (CER), Real-Time Factor
- Potential Known Risks: Transcripts may not be 100% accurate. Accuracy varies based on the characteristics of input audio (Domain, Use Case, Accent, Noise, Speech Type, Context of speech, etc.).
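
As a rough illustration of the transcription workflow described above, the sketch below loads a NeMo ASR checkpoint and transcribes a local audio file. The checkpoint identifier and audio path are placeholders rather than values taken from this card, and the exact return type of `transcribe()` varies across NeMo versions.

```python
# Minimal transcription sketch with NVIDIA NeMo (assumes nemo_toolkit[asr] is installed).
import nemo.collections.asr as nemo_asr

MODEL_NAME = "nvidia/<armenian-asr-checkpoint>"  # placeholder -- use the identifier from this card
asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name=MODEL_NAME)

# If the checkpoint is a hybrid Transducer/CTC model, the decoder can usually be switched
# before inference, e.g. asr_model.change_decoding_strategy(decoder_type="ctc").
transcripts = asr_model.transcribe(["sample_armenian.wav"])  # 16 kHz mono WAV
print(transcripts[0])  # plain text or a Hypothesis object, depending on the NeMo version
```
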
**Test Hardware:** A6000 GPU

The performance of Automatic Speech Recognition models is measured using Word Error Rate (WER) and Character Error Rate (CER). Since this model is trained on data from multiple domains, it generally performs well across a broad range of audio.
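
For context, WER is the word-level edit distance (substitutions, deletions, and insertions) between the model's hypothesis and the reference transcript, divided by the number of reference words; CER is the same ratio computed over characters. Below is a small self-contained sketch of that computation, not code taken from this repository:

```python
# Word error rate: word-level edit distance divided by the reference length.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j          # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# CER is the same computation over characters instead of words.
print(word_error_rate("the cat sat", "the cat sit on"))  # 2 errors / 3 words = 0.67
```
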
The following tables summarize the performance of the available models in this collection with the Transducer and CTC decoders. Performance is reported in terms of Word Error Rate (WER%) and Inverse Real-Time Factor (RTFx) with greedy decoding on test sets.

- Transducer

|**NeMo Version**|**Tokenizer**|**Vocabulary Size**|**MCV test WER**|**MCV test RTFx**|**FLEURS test WER**|**FLEURS test RTFx**|
|----------|-------------|-------------------|----------------|----------------|----------------|----------------|
| 2.0.0 | SentencePiece Unigram | 1024 | 9.90 | 1535.45 | 12.32 | 1144.34 |

- CTC

|**NeMo Version**|**Tokenizer**|**Vocabulary Size**|**MCV test WER**|**MCV test RTFx**|**FLEURS test WER**|**FLEURS test RTFx**|
|----------|-------------|-------------------|----------------|----------------|----------------|----------------|
| 2.0.0 | SentencePiece Unigram | 1024 | 11.19 | 1891.04 | 13.23 | 1565.59 |

These are greedy WER numbers without an external LM.
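
As a point of reference for the RTFx columns above, the inverse real-time factor is simply the amount of audio processed per unit of wall-clock time; the numbers below are purely illustrative, not measurements from this card:

```python
# Inverse real-time factor: seconds of audio transcribed per second of wall-clock compute.
# RTFx > 1 means faster than real time.
audio_seconds = 3600.0       # e.g. one hour of test audio
wall_clock_seconds = 2.3     # time taken to transcribe it
rtfx = audio_seconds / wall_clock_seconds
print(f"RTFx = {rtfx:.1f}")  # about 1565x real time
```
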
- Non-streaming ASR model
- Model outputs text in Armenian
- Output text requires Inverse Text Normalization (see the sketch after this list)
- Model is noise-sensitive
- Model is not applicable for life-critical applications.
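
To make the inverse text normalization note above concrete, the sketch below shows the idea with NeMo's text-processing package: spoken-form tokens in raw ASR output are rewritten into written form (digits, dates, times). An English grammar is used here only to illustrate the concept; availability of an Armenian (`hy`) grammar in `nemo_text_processing` is an assumption to verify, not something stated on this card.

```python
# Inverse text normalization (ITN) demo with nemo_text_processing.
# Raw ASR output is spoken-form text; ITN rewrites it into written form.
from nemo_text_processing.inverse_text_normalization.inverse_normalize import InverseNormalizer

itn = InverseNormalizer(lang="en")  # swap in "hy" only if an Armenian grammar is available
raw = "the meeting starts at ten thirty on january fifth twenty twenty five"
print(itn.inverse_normalize(raw, verbose=False))
# e.g. "the meeting starts at 10:30 on january 5 2025" (exact form depends on the grammar version)
```
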
### Access Restrictions:

The Principle of Least Privilege (PoLP) is applied, limiting access for dataset generation and model development.

## NVIDIA Riva: Deployment

[NVIDIA Riva](https://developer.nvidia.com/riva) is an accelerated speech AI SDK deployable on-prem, in all clouds, multi-cloud, hybrid, on edge, and embedded.
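
As a sketch of what offline recognition against a Riva endpoint can look like with the `nvidia-riva-client` Python package: the server address, language code, and audio file below are assumptions, and the snippet presumes this model has already been deployed to a running Riva server.

```python
# Hypothetical offline-recognition call against a Riva server that already serves this model.
import riva.client

auth = riva.client.Auth(uri="localhost:50051")   # assumed server address
asr = riva.client.ASRService(auth)

config = riva.client.RecognitionConfig(
    encoding=riva.client.AudioEncoding.LINEAR_PCM,  # must match the audio file's format
    language_code="hy-AM",                          # assumed code for Armenian
    max_alternatives=1,
    enable_automatic_punctuation=False,
)

with open("sample_armenian.wav", "rb") as f:
    audio_bytes = f.read()

response = asr.offline_recognize(audio_bytes, config)
print(response.results[0].alternatives[0].transcript)
```
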
Additionally, Riva provides: