my-north-ai/whisper-small-pt

Tags: Automatic Speech Recognition, contrastive-learning, synthetic-data-filtering, Inference Endpoints


yuriyvnv committed on Jul 17

Commit b09cb25 (1 parent: 6adf032)

Update README.md

Files changed (1)

README.md +4 -2

README.md CHANGED

@@ -6,10 +6,12 @@ tags: [automatic-speech-recognition, contrastive-learning, synthetic-data-filter
 # Model Card for Finetuned Version of Whisper-Small
 This model was trained on a subset of the synthetically generated data that later on was filtered to increase the performance of Whisper Model.
-The approach involves aligning representations of synthetic audio and corresponding text transcripts to identify and remove low-quality samples, improving the overall training data quality
+The approach involves aligning representations of synthetic audio and corresponding text transcripts to identify and remove low-quality samples, improving the overall training data quality.
+---------------------------------------------------------------------------------------------------------------------------------------
 In this Specific Model we used 82,32% of synthetic data generated by SeamllesMT4LargeV2, the rest was removed by the filtering model.
 The training set also contained, the CommonVoice Dataset, Multilibri Speach, and Bracarense (Fully Portuguese Dialect)
+--------------------------------------------
 ## Model Details
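The README describes the filtering step only at a high level: representations of synthetic audio and their text transcripts are aligned, and poorly aligned pairs are removed. What follows is a minimal sketch of that idea under stated assumptions. It is not the filtering model used for this checkpoint. It assumes each synthetic sample already has an audio embedding and a text embedding of the same dimension, produced by some hypothetical paired encoders. It scores each pair by cosine similarity and keeps the highest-scoring fraction. The helper names (`l2_normalize`, `alignment_scores`, `filter_by_alignment`) and the `keep_fraction` parameter are hypothetical. Setting `keep_fraction=0.8232` would mirror the 82.32% retention stated in the README, but the README does not say whether the real filter ranked samples or applied a fixed threshold.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so that dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)  # guard against division by zero


def alignment_scores(audio_emb: np.ndarray, text_emb: np.ndarray) -> np.ndarray:
    """Cosine similarity between row i of audio_emb and row i of text_emb (its own transcript)."""
    a = l2_normalize(audio_emb)
    t = l2_normalize(text_emb)
    return np.sum(a * t, axis=1)


def filter_by_alignment(audio_emb: np.ndarray, text_emb: np.ndarray, keep_fraction: float) -> np.ndarray:
    """Return sorted indices of the best-aligned pairs, keeping round(keep_fraction * N) of them."""
    scores = alignment_scores(audio_emb, text_emb)
    n_keep = int(round(keep_fraction * len(scores)))
    # argsort is ascending, so reverse it to rank the best-aligned pairs first
    ranked = np.argsort(scores)[::-1]
    return np.sort(ranked[:n_keep])  # restore original dataset order


# Toy demo: random vectors stand in for real encoder outputs (hypothetical data).
rng = np.random.default_rng(0)
text = rng.normal(size=(10, 16))
audio = text + 0.1 * rng.normal(size=(10, 16))  # well-aligned audio/text pairs
audio[[2, 7]] = rng.normal(size=(2, 16))  # simulate two low-quality synthetic samples
kept = filter_by_alignment(audio, text, keep_fraction=0.8)
print(kept)
```

Ranking by score makes the retained fraction exact. A fixed similarity threshold would instead let the retained fraction vary with the quality of each batch of synthetic data.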