Some grammar
README.md

## Explainability

- High-Level Application and Domain: Automatic Speech Recognition
- Describe how this model works: The model transcribes audio input into text for the Armenian language (see the usage sketch after this list).
- Verified to have met prescribed quality standards: Yes
- Performance Metrics: Word Error Rate (WER), Character Error Rate (CER), Real-Time Factor
- Potential Known Risks: Transcripts may not be 100% accurate. Accuracy varies based on the characteristics of input audio (Domain, Use Case, Accent, Noise, Speech Type, Context of speech, etc.).
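
As a rough illustration of the transcription workflow described above, the sketch below loads a NeMo ASR checkpoint and transcribes a local audio file. The checkpoint identifier and audio path are placeholders rather than values taken from this card, and the exact return type of `transcribe()` varies across NeMo versions.

```python
# Minimal transcription sketch with NVIDIA NeMo (assumes nemo_toolkit[asr] is installed).
import nemo.collections.asr as nemo_asr

MODEL_NAME = "nvidia/<armenian-asr-checkpoint>"  # placeholder -- use the identifier from this card
asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name=MODEL_NAME)

# If the checkpoint is a hybrid Transducer/CTC model, the decoder can usually be switched
# before inference, e.g. asr_model.change_decoding_strategy(decoder_type="ctc").
transcripts = asr_model.transcribe(["sample_armenian.wav"])  # 16 kHz mono WAV
print(transcripts[0])  # plain text or a Hypothesis object, depending on the NeMo version
```
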
**Test Hardware:** A6000 GPU

The performance of Automatic Speech Recognition models is measured using Word Error Rate (WER) and Character Error Rate (CER). Since this model is trained on data from multiple domains, it generally performs well across a broad range of audio.
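
For context, WER is the word-level edit distance (substitutions, deletions, and insertions) between the model's hypothesis and the reference transcript, divided by the number of reference words; CER is the same ratio computed over characters. Below is a small self-contained sketch of that computation, not code taken from this repository:

```python
# Word error rate: word-level edit distance divided by the reference length.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j          # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# CER is the same computation over characters instead of words.
print(word_error_rate("the cat sat", "the cat sit on"))  # 2 errors / 3 words = 0.67
```
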
The following tables summarize the performance of the available models in this collection with the Transducer and CTC decoders. Performance is reported in terms of Word Error Rate (WER%) and Inverse Real-Time Factor (RTFx) with greedy decoding on test sets.

- Transducer

|**NeMo Version**|**Tokenizer**|**Vocabulary Size**|**MCV test WER**|**MCV test RTFx**|**FLEURS test WER**|**FLEURS test RTFx**|
|----------|-------------|-------------------|----------------|----------------|----------------|----------------|
| 2.0.0 | SentencePiece Unigram | 1024 | 9.90 | 1535.45 | 12.32 | 1144.34 |

- CTC

|**NeMo Version**|**Tokenizer**|**Vocabulary Size**|**MCV test WER**|**MCV test RTFx**|**FLEURS test WER**|**FLEURS test RTFx**|
|----------|-------------|-------------------|----------------|----------------|----------------|----------------|
| 2.0.0 | SentencePiece Unigram | 1024 | 11.19 | 1891.04 | 13.23 | 1565.59 |

These are greedy WER numbers without an external LM.
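
As a point of reference for the RTFx columns above, the inverse real-time factor is simply the amount of audio processed per unit of wall-clock time; the numbers below are purely illustrative, not measurements from this card:

```python
# Inverse real-time factor: seconds of audio transcribed per second of wall-clock compute.
# RTFx > 1 means faster than real time.
audio_seconds = 3600.0       # e.g. one hour of test audio
wall_clock_seconds = 2.3     # time taken to transcribe it
rtfx = audio_seconds / wall_clock_seconds
print(f"RTFx = {rtfx:.1f}")  # about 1565x real time
```
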
- Non-streaming ASR model
- Model outputs text in Armenian
- Output text requires Inverse Text Normalization (see the sketch after this list)
- Model is noise-sensitive
- Model is not applicable for life-critical applications.
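
To make the inverse text normalization note above concrete, the sketch below shows the idea with NeMo's text-processing package: spoken-form tokens in raw ASR output are rewritten into written form (digits, dates, times). An English grammar is used here only to illustrate the concept; availability of an Armenian (`hy`) grammar in `nemo_text_processing` is an assumption to verify, not something stated on this card.

```python
# Inverse text normalization (ITN) demo with nemo_text_processing.
# Raw ASR output is spoken-form text; ITN rewrites it into written form.
from nemo_text_processing.inverse_text_normalization.inverse_normalize import InverseNormalizer

itn = InverseNormalizer(lang="en")  # swap in "hy" only if an Armenian grammar is available
raw = "the meeting starts at ten thirty on january fifth twenty twenty five"
print(itn.inverse_normalize(raw, verbose=False))
# e.g. "the meeting starts at 10:30 on january 5 2025" (exact form depends on the grammar version)
```
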
### Access Restrictions:

The Principle of Least Privilege (PoLP) is applied, limiting access for dataset generation and model development.

## NVIDIA Riva: Deployment

[NVIDIA Riva](https://developer.nvidia.com/riva) is an accelerated speech AI SDK deployable on-prem, in all clouds, multi-cloud, hybrid, on edge, and embedded.
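
As a sketch of what offline recognition against a Riva endpoint can look like with the `nvidia-riva-client` Python package: the server address, language code, and audio file below are assumptions, and the snippet presumes this model has already been deployed to a running Riva server.

```python
# Hypothetical offline-recognition call against a Riva server that already serves this model.
import riva.client

auth = riva.client.Auth(uri="localhost:50051")   # assumed server address
asr = riva.client.ASRService(auth)

config = riva.client.RecognitionConfig(
    encoding=riva.client.AudioEncoding.LINEAR_PCM,  # must match the audio file's format
    language_code="hy-AM",                          # assumed code for Armenian
    max_alternatives=1,
    enable_automatic_punctuation=False,
)

with open("sample_armenian.wav", "rb") as f:
    audio_bytes = f.read()

response = asr.offline_recognize(audio_bytes, config)
print(response.results[0].alternatives[0].transcript)
```
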
Additionally, Riva provides: