Antoine-caubriere committed on
Commit
abb4b8a
1 Parent(s): 58e65f1

Update README.md

Files changed (1)
1. README.md +8 -6
README.md CHANGED
@@ -15,11 +15,6 @@ It is a balanced version in gender and languages representation compared to the
 
  - Languages: Bambara (bam), Dyula (dyu), French (fra), Fula (ful), Fulfulde (ffm), Fulfulde (fuh), Gulmancema (gux), Hausa (hau), Kinyarwanda (kin), Kituba (ktu), Lingala (lin), Luba-Lulua (lua), Mossi (mos), Maninkakan (mwk), Sango (sag), Songhai (son), Swahili (swc), Swahili (swh), Tamasheq (taq), Wolof (wol), Zarma (dje).
 
-## ASR fine-tuning
-The SpeechBrain toolkit (Ravanelli et al., 2021) is used to fine-tune the model.
-Fine-tuning is done for each language using the FLEURS dataset [2].
-The pretrained model (SSA-HuBERT-base-5k) is considered as a speech encoder and is fully fine-tuned with two 1024 linear layers and a softmax output at the top.
-
 ## License
 This model is released under the CC-by-NC 4.0 conditions.
 
 
@@ -52,10 +47,17 @@ Please cite our paper when using SSA-HuBERT-base-5k model:
 }
 ```
 
-## Results
+## ASR fine-tuning
+The SpeechBrain toolkit (Ravanelli et al., 2021) is used to fine-tune the model.
+Fine-tuning is done for each language using the FLEURS dataset [2].
+The pretrained model (SSA-HuBERT-base-5k) is considered as a speech encoder and is fully fine-tuned with two 1024 linear layers and a softmax output at the top.
+
+
+### Results
 The following results are obtained in a greedy mode (no language model rescoring).
 Character error rates (CERs) and Word error rates (WERs) are given in the table below, on the 20 languages of the SSA subpart of the FLEURS dataset.
 
+
 | **Languages** | **CER** | **WER** |
 |:--------------------------------|:--------|:--------|
 | **Afrikaans** | 23.8 | 68.3 |
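
The "ASR fine-tuning" paragraph moved by this commit describes a concrete head: the pretrained encoder, fully fine-tuned, topped by two 1024-unit linear layers and a softmax output. A minimal sketch of that wiring in plain PyTorch follows; it is an illustrative assumption, not the authors' SpeechBrain recipe, and `SketchASRModel`, `encoder_dim`, and `vocab_size` are hypothetical names.

```python
# Minimal sketch of the head described in the README paragraph above,
# assuming plain PyTorch (the authors actually use SpeechBrain; this is
# not their recipe). All names here are hypothetical.
import torch
import torch.nn as nn


class SketchASRModel(nn.Module):
    def __init__(self, encoder: nn.Module, encoder_dim: int, vocab_size: int):
        super().__init__()
        self.encoder = encoder                   # pretrained speech encoder,
                                                 # fully fine-tuned (not frozen)
        self.fc1 = nn.Linear(encoder_dim, 1024)  # first 1024-unit linear layer
        self.fc2 = nn.Linear(1024, 1024)         # second 1024-unit linear layer
        self.out = nn.Linear(1024, vocab_size)   # projection to the output
                                                 # units (assumption)

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        feats = self.encoder(wav)                # (batch, frames, encoder_dim)
        x = torch.relu(self.fc1(feats))
        x = torch.relu(self.fc2(x))
        # "A softmax output at the top"; log-softmax is shown because it pairs
        # with a CTC-style loss, which the README does not specify (assumption).
        return torch.log_softmax(self.out(x), dim=-1)
```

Since fine-tuning is done per language, one such model would be trained for each FLEURS language under this reading.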
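The restored Results text reports CER and WER from greedy decoding with no language-model rescoring. Both metrics have standard definitions (Levenshtein edit distance normalized by reference length); the sketch below shows those definitions only, with no claim that it matches the authors' actual scoring script.

```python
# Hedged sketch: CER/WER as edit distance over the reference length, the
# standard definitions behind tables like the one above. Illustrative only.
def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance between two token sequences (one-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, start=1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,        # deletion
                dp[j - 1] + 1,    # insertion
                prev + (r != h),  # substitution (free if tokens match)
            )
    return dp[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())


def cer(ref: str, hyp: str) -> float:
    """Character error rate: char-level edit distance / reference length."""
    return edit_distance(list(ref), list(hyp)) / len(ref)


if __name__ == "__main__":
    # Hypothetical strings, for illustration only.
    print(round(wer("mo ni ce", "mo ni se"), 2))  # 0.33 (1 of 3 words wrong)
    print(round(cer("mo ni ce", "mo ni se"), 2))  # 0.12 (1 of 8 characters)
```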