Jzuluaga committed on
Commit
37e8783
1 Parent(s): f3223f2

Update README.md

Files changed (1)
  1. README.md +13 -7
README.md CHANGED
@@ -1,7 +1,7 @@
 ---
 language:
-- en
-thumbnail:
+- es
+thumbnail: null
 tags:
 - audio-classification
 - speechbrain
@@ -11,20 +11,24 @@ tags:
 - wav2vec2
 - XLSR
 - CommonAccent
-license: "mit"
+license: mit
 datasets:
 - CommonVoice
 metrics:
 - Accuracy
 widget:
 - example_title: Caribe-Colombia-Cuba
-  src: https://huggingface.co/Jzuluaga/accent-id-commonaccent_xlsr-spanish/resolve/main/data/caribe-cuba-colombia.wav
+  src: >-
+    https://huggingface.co/Jzuluaga/accent-id-commonaccent_xlsr-spanish/resolve/main/data/caribe-cuba-colombia.wav
 - example_title: Andino
-  src: https://huggingface.co/Jzuluaga/accent-id-commonaccent_xlsr-spanish/resolve/main/data/andino.wav
+  src: >-
+    https://huggingface.co/Jzuluaga/accent-id-commonaccent_xlsr-spanish/resolve/main/data/andino.wav
 - example_title: Mexico
-  src: https://huggingface.co/Jzuluaga/accent-id-commonaccent_xlsr-spanish/resolve/main/data/mexico.wav
+  src: >-
+    https://huggingface.co/Jzuluaga/accent-id-commonaccent_xlsr-spanish/resolve/main/data/mexico.wav
 - example_title: Spain
-  src: https://huggingface.co/Jzuluaga/accent-id-commonaccent_xlsr-spanish/resolve/main/data/spain.wav
+  src: >-
+    https://huggingface.co/Jzuluaga/accent-id-commonaccent_xlsr-spanish/resolve/main/data/spain.wav
 ---
 
 
@@ -34,6 +38,8 @@ widget:
 
 # CommonAccent: Exploring Large Acoustic Pretrained Models for Accent Classification Based on Common Voice
 
+**Spanish Accent Classifier**
+
 
 **Abstract**:
 Despite the recent advancements in Automatic Speech Recognition (ASR), the recognition of accented speech still remains a dominant problem. In order to create more inclusive ASR systems, research has shown that the integration of accent information, as part of a larger ASR framework, can lead to the mitigation of accented speech errors. We address multilingual accent classification through the ECAPA-TDNN and Wav2Vec 2.0/XLSR architectures which have been proven to perform well on a variety of speech-related downstream tasks. We introduce a simple-to-follow recipe aligned to the SpeechBrain toolkit for accent classification based on Common Voice 7.0 (English) and Common Voice 11.0 (Italian, German, and Spanish). Furthermore, we establish new state-of-the-art for English accent classification with as high as 95% accuracy. We also study the internal categorization of the Wav2Vec 2.0 embeddings through t-SNE, noting that there is a level of clustering based on phonological similarity.