Commit 5efc24d, committed by patrickvonplaten
1 Parent(s): ae15145

add files

- README.md +19 -0
- preprocessor_config.json +9 -0
- special_tokens_map.json +1 -0
- tokenizer_config.json +1 -0
- vocab.json +1 -0
README.md
ADDED
@@ -0,0 +1,19 @@
+---
+language: lt
+tags:
+- audio
+- automatic-speech-recognition
+- voxpopuli
+license: cc-by-nc-4.0
+---
+
+# Wav2Vec2-Base-VoxPopuli-Finetuned
+
+[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) large model pretrained on the 10K unlabeled subset of [VoxPopuli corpus](https://arxiv.org/abs/2101.00390) and fine-tuned on the transcribed data in lt (refer to Table 1 of paper for more information).
+
+**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation
+Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
+
+**Authors**: *Changhan Wang, Morgane Riviere, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Pino, Emmanuel Dupoux* from *Facebook AI*
+
+See the official website for more information, [here](https://github.com/facebookresearch/voxpopuli/)
preprocessor_config.json
ADDED
@@ -0,0 +1,9 @@
+{
+  "do_normalize": true,
+  "feature_extractor_type": "Wav2Vec2FeatureExtractor",
+  "feature_size": 1,
+  "padding_side": "right",
+  "padding_value": 0,
+  "return_attention_mask": false,
+  "sampling_rate": 16000
+}
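To make the settings in this config concrete, here is a minimal pure-Python sketch of what `do_normalize`, `padding_side` and `padding_value` describe: each waveform is scaled to zero mean and unit variance, and shorter waveforms are padded on the right with 0. This is not the library's implementation. The helper names, the epsilon value and the order of the two steps (normalize first, then pad) are assumptions made for illustration.

```python
import math


def zero_mean_unit_var(waveform, eps=1e-7):
    """Scale one waveform to zero mean and unit variance.

    `eps` is an assumed small constant that guards against division by zero.
    """
    n = len(waveform)
    mean = sum(waveform) / n
    var = sum((x - mean) ** 2 for x in waveform) / n
    scale = math.sqrt(var + eps)
    return [(x - mean) / scale for x in waveform]


def prepare_batch(waveforms, padding_value=0.0):
    """Normalize each waveform, then right-pad all of them to the longest one."""
    normalized = [zero_mean_unit_var(w) for w in waveforms]
    max_len = max(len(w) for w in normalized)
    # padding_side == "right": padding values are appended after the samples
    return [w + [padding_value] * (max_len - len(w)) for w in normalized]


# Two toy "recordings" of different lengths (hypothetical sample values)
batch = prepare_batch([[0.1, 0.3, -0.2, 0.4], [0.5, -0.5]])
```

Because `return_attention_mask` is `false` in this config, nothing in the output marks which positions are padding, so a model consuming `batch` cannot tell real samples from padded ones.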
special_tokens_map.json
ADDED
@@ -0,0 +1 @@
+{"bos_token": "<s>", "eos_token": "</s>", "unk_token": "<unk>", "pad_token": "<pad>"}
tokenizer_config.json
ADDED
@@ -0,0 +1 @@
+{"unk_token": "<unk>", "bos_token": "<s>", "eos_token": "</s>", "pad_token": "<pad>", "do_lower_case": false, "word_delimiter_token": "|"}
vocab.json
ADDED
@@ -0,0 +1 @@
+{"<s>": 0, "<pad>": 1, "</s>": 2, "<unk>": 3, "|": 4, "e": 5, "i": 6, "a": 7, "o": 8, "n": 9, "t": 10, "r": 11, "l": 12, "s": 13, "c": 14, "d": 15, "u": 16, "p": 17, "m": 18, "g": 19, "v": 20, "h": 21, "z": 22, "f": 23, "b": 24, "q": 25, "à": 26, "è": 27, "ù": 28, "é": 29, "ò": 30, "ì": 31, "k": 32, "y": 33, "x": 34, "w": 35, "j": 36, "ó": 37, "í": 38, "ï": 39}
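The vocabulary and tokenizer config above suggest a character-level CTC setup. As a rough sketch of how predicted ids could be turned back into text, the snippet below performs greedy CTC decoding: collapse consecutive repeats, drop the blank token, and replace the word delimiter `|` with a space. It assumes that `<pad>` serves as the CTC blank, and it uses only an excerpt of the vocabulary. The function name `ctc_greedy_decode` and the input ids are hypothetical.

```python
from itertools import groupby

# Excerpt of vocab.json from this commit (first 11 entries only)
VOCAB_EXCERPT = {
    "<s>": 0, "<pad>": 1, "</s>": 2, "<unk>": 3, "|": 4,
    "e": 5, "i": 6, "a": 7, "o": 8, "n": 9, "t": 10,
}


def ctc_greedy_decode(ids, vocab, blank="<pad>", word_delimiter="|"):
    """Decode a sequence of per-frame ids into a string.

    Assumes `blank` is the CTC blank token (an assumption, not stated in the commit).
    """
    id_to_token = {index: token for token, index in vocab.items()}
    # 1. Collapse runs of identical consecutive ids
    collapsed = [key for key, _ in groupby(ids)]
    # 2. Drop blank tokens; they only separate genuine repeats
    tokens = [id_to_token[i] for i in collapsed if id_to_token[i] != blank]
    # 3. Join characters and turn the word delimiter into spaces
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Hypothetical frame-level predictions
text = ctc_greedy_decode([9, 9, 1, 8, 8, 4, 4, 10, 1, 5, 5], VOCAB_EXCERPT)
```

A blank between two identical ids keeps both characters, so `[10, 1, 10]` decodes to `tt` rather than `t`.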