Update README.md
README.md
CHANGED
@@ -1,21 +1,64 @@
|
|
1 |
---
|
2 |
license: apache-2.0
|
3 |
---
|
4 |
-
#
|
5 |
-
This repository provides all the necessary tools to perform automatic speech recognition from an end-to-end system pretrained on Tunisian arabic dialect. This model utilizes a code_switching approach and can process
|
6 |
## Performance
|
7 |
-
|
8 |
-
|
9 |
|-----------------|---------|---------|
|
10 |
-
|
11 |
## Pipeline
|
12 |
The architecture comprises three components:
|
13 |
* French ASR pretrained with wav2vec2 on French corpora
|
14 |
* English ASR pretrained with wav2vec2 on English corpora
|
15 |
* Custom Tunisian ASR pretrained using wav2vec on a Tunisian Arabic corpus
|
16 |
All three models will process the audio data. Subsequently, the resulting posteriorgrams will be combined and utilized as input for the Mixer, which will produce the final posteriorgrams.
|
17 |
-
|
18 |
-
|
19 |
-
|
20 |
```
|
21 |
|
|
|
1 |
---
|
2 |
license: apache-2.0
|
3 |
---
|
4 |
+
# Overview
|
5 |
+
This repository provides all the necessary tools to perform automatic speech recognition with an end-to-end system pretrained on code-switched Tunisian Arabic dialect. The model uses a code-switching approach and can process English, French, and Tunisian Arabic.
|
6 |
## Performance
|
7 |
+
The performance of the model on our released TunSwitch CS dataset is summarized below:
|
8 |
+
|
9 |
+
| Dataset         | WER (%) | CER (%) |
|
10 |
|-----------------|---------|---------|
|
11 |
+
| TunSwitch CS    | 29.47   | 12.44   |
|
12 |
+
|
13 |
+
|
14 |
+
More details about the test sets and the conditions leading to this performance are available in the paper.
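For readers unfamiliar with the two metrics in the table, here is a minimal sketch of how WER and CER are commonly computed from an edit distance. The function names are hypothetical and this is not the evaluation script behind the reported numbers; any text normalization applied before scoring is not described here and is left out.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of tokens."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent, with whitespace tokenization (an assumption)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent, counting every character including spaces."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Both rates divide the number of edits by the length of the reference, so they can exceed 100% when the hypothesis is much longer than the reference.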
|
15 |
## Pipeline
|
16 |
The architecture comprises three components:
|
17 |
* French ASR pretrained with wav2vec2 on French corpora
|
18 |
* English ASR pretrained with wav2vec2 on English corpora
|
19 |
* Custom Tunisian ASR pretrained using wav2vec on a Tunisian Arabic corpus
|
20 |
All three models will process the audio data. Subsequently, the resulting posteriorgrams will be combined and utilized as input for the Mixer, which will produce the final posteriorgrams.
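The data flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the README does not specify how the posteriorgrams are combined or what the Mixer looks like, so the frame-wise concatenation and the single linear layer with softmax used here are placeholders, and all function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_posteriorgrams(french, english, tunisian):
    """Concatenate the three per-frame posterior vectors (assumed combination)."""
    if not (len(french) == len(english) == len(tunisian)):
        raise ValueError("all posteriorgrams must have the same number of frames")
    return [f + e + t for f, e, t in zip(french, english, tunisian)]


def mixer(frames, weights, bias):
    """Placeholder Mixer: one linear layer followed by softmax, applied per frame.

    weights has one row per output symbol and one column per combined input value.
    """
    output = []
    for frame in frames:
        logits = [sum(w * x for w, x in zip(row, frame)) + b
                  for row, b in zip(weights, bias)]
        output.append(softmax(logits))
    return output


# Toy example: one frame, two symbols per model, two output symbols.
combined = combine_posteriorgrams([[0.9, 0.1]], [[0.5, 0.5]], [[0.2, 0.8]])
final = mixer(combined, weights=[[0.0] * 6, [0.0] * 6], bias=[0.0, 0.0])
```

With all-zero weights the placeholder Mixer returns a uniform distribution for each frame; a trained Mixer would learn weights that favor whichever model is reliable for the language being spoken.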
|
21 |
+
|
22 |
+
## Dataset
|
23 |
+
Part of the audio and text data used to train and test the model (the part we collected) has been released to encourage and support research within the community. The dataset is available [here](https://zenodo.org/record/8370566). This Zenodo record contains labeled and unlabeled Tunisian Arabic audio data, along with textual data for language modelling.
|
24 |
+
The folder also contains a 4-gram language model trained with KenLM on data released within the Zenodo record. The .arpa file is called "outdomain.arpa".
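As a quick sanity check on the language model file, the sketch below reads the `\data\` header that every ARPA file starts with and returns the number of n-grams per order. The helper name is hypothetical, and the code relies only on the standard ARPA text layout, not on KenLM itself.

```python
def read_arpa_counts(lines):
    """Return {order: count} parsed from the data header of an ARPA file."""
    counts = {}
    in_header = False
    for raw in lines:
        line = raw.strip()
        if line == "\\data\\":
            in_header = True
        elif in_header and line.startswith("ngram "):
            order, count = line[len("ngram "):].split("=")
            counts[int(order)] = int(count)
        elif in_header and line.startswith("\\"):
            break  # first n-gram section (e.g. \1-grams:) reached, header is over
    return counts


# Usage, assuming outdomain.arpa has been downloaded to the working directory:
#     with open("outdomain.arpa", encoding="utf-8") as f:
#         counts = read_arpa_counts(f)
#     print(max(counts))  # highest n-gram order stored in the file
```

For a 4-gram model the highest order in the returned dictionary should be 4.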
|
25 |
+
|
26 |
+
|
27 |
+
## Team
|
28 |
+
|
29 |
+
Here are the team members who have contributed to this project:
|
30 |
+
|
31 |
+
* [Salah Zaiem](https://fr.linkedin.com/in/salah-zaiem)
|
32 |
+
* [Ahmed Amine Ben Abdallah](https://www.linkedin.com/in/aabenz/)
|
33 |
+
* [Ata Kaboudi](https://www.linkedin.com/in/ata-kaboudi-63365b1a8)
|
34 |
+
* [Amir Kanoun](https://tn.linkedin.com/in/ahmed-amir-kanoun)
|
35 |
+
|
36 |
+
## Paper
|
37 |
+
More in-depth details and insights are available in a released preprint. Please find the paper [here](https://arxiv.org/abs/2309.11327).
|
38 |
+
If you use or refer to this model, please cite:
|
39 |
+
|
40 |
+
```
|
41 |
+
@misc{abdallah2023leveraging,
|
42 |
+
title={Leveraging Data Collection and Unsupervised Learning for Code-switched Tunisian Arabic Automatic Speech Recognition},
|
43 |
+
author={Ahmed Amine Ben Abdallah and Ata Kabboudi and Amir Kanoun and Salah Zaiem},
|
44 |
+
year={2023},
|
45 |
+
eprint={2309.11327},
|
46 |
+
archivePrefix={arXiv},
|
47 |
+
primaryClass={eess.AS}
|
48 |
+
}
|
49 |
+
```
|
50 |
+
|
51 |
+
|
52 |
+
## Demo
|
53 |
+
A working live demo is available [here](https://huggingface.co/spaces/SalahZa/Code-Switched-Tunisian-SpeechToText).
|
54 |
+
|
55 |
+
|
56 |
+
## Inference
|
57 |
+
|
58 |
+
Please refer to the Space demo for easy-to-use inference code.
|
59 |
+
|
60 |
+
|
61 |
+
## Contact
|
62 |
+
If you have questions, you can send an email to: zaiemsalah@gmail.com
|
63 |
```
|
64 |
|