Automatic Speech Recognition
Transformers
Safetensors
Portuguese
whisper
contrastive-learning
synthetic-data-filtering
Inference Endpoints
yuriyvnv committed on
Commit
b09cb25
1 Parent(s): 6adf032

Update README.md

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -6,10 +6,12 @@ tags: [automatic-speech-recognition, contrastive-learning, synthetic-data-filter
  # Model Card for Finetuned Version of Whisper-Small
 
  This model was trained on a subset of the synthetically generated data that later on was filtered to increase the performance of Whisper Model.
- The approach involves aligning representations of synthetic audio and corresponding text transcripts to identify and remove low-quality samples, improving the overall training data quality
+ The approach involves aligning representations of synthetic audio and corresponding text transcripts to identify and remove low-quality samples, improving the overall training data quality.
+ ---------------------------------------------------------------------------------------------------------------------------------------
+
  In this Specific Model we used 82,32% of synthetic data generated by SeamllesMT4LargeV2, the rest was removed by the filtering model.
  The training set also contained, the CommonVoice Dataset, Multilibri Speach, and Bracarense (Fully Portuguese Dialect)
-
+ --------------------------------------------
 
 
  ## Model Details