Update README.md
README.md
CHANGED
@@ -6,10 +6,12 @@ tags: [automatic-speech-recognition, contrastive-learning, synthetic-data-filter
 # Model Card for Finetuned Version of Whisper-Small
 
 This model was trained on a subset of the synthetically generated data that later on was filtered to increase the performance of Whisper Model.
-The approach involves aligning representations of synthetic audio and corresponding text transcripts to identify and remove low-quality samples, improving the overall training data quality
+The approach involves aligning representations of synthetic audio and corresponding text transcripts to identify and remove low-quality samples, improving the overall training data quality.
+---------------------------------------------------------------------------------------------------------------------------------------
+
 In this Specific Model we used 82,32% of synthetic data generated by SeamllesMT4LargeV2, the rest was removed by the filtering model.
 The training set also contained, the CommonVoice Dataset, Multilibri Speach, and Bracarense (Fully Portuguese Dialect)
-
+--------------------------------------------
 
 
 ## Model Details