Link to "SeamlessM4T v1" paper, where the w2v-BERT 2.0 was presented for the first time.
The w2v-BERT 2.0 model was initially introduced in the "SeamlessM4T v1" paper, specifically in Section 4.1, available at https://arxiv.org/abs/2308.11596.
While the "SeamlessM4T v2" paper also discusses this model, it does not delve into the same level of detail as the v1 paper.
README.md
CHANGED
@@ -101,7 +101,7 @@ inference: false
 ---
 # W2v-BERT 2.0 speech encoder
 
-We are open-sourcing our Conformer-based [W2v-BERT 2.0 speech encoder](#w2v-bert-20-speech-encoder) as described in Section
+We are open-sourcing our Conformer-based [W2v-BERT 2.0 speech encoder](#w2v-bert-20-speech-encoder) as described in Section 4.1 of the [paper](https://arxiv.org/abs/2308.11596), which is at the core of our Seamless models.
 
 This model was pre-trained on 4.5M hours of unlabeled audio data covering more than 143 languages. It requires finetuning to be used for downstream tasks such as Automatic Speech Recognition (ASR), or Audio Classification.
 
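For context on the finetuning note in the README text above, here is a minimal sketch of loading the pre-trained encoder and extracting speech representations with 🤗 Transformers. The checkpoint name `facebook/w2v-bert-2.0` and the `Wav2Vec2BertModel` / `AutoFeatureExtractor` classes are assumptions based on the public Transformers integration, not something this PR specifies.

```python
# Minimal sketch: run the pre-trained w2v-BERT 2.0 encoder on raw audio.
# The checkpoint name below is an assumption, not taken from this PR.
import torch
from transformers import AutoFeatureExtractor, Wav2Vec2BertModel

checkpoint = "facebook/w2v-bert-2.0"
feature_extractor = AutoFeatureExtractor.from_pretrained(checkpoint)
model = Wav2Vec2BertModel.from_pretrained(checkpoint)

# One second of silence at 16 kHz as a placeholder; substitute a real waveform.
waveform = torch.zeros(16000).numpy()
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# (batch, frames, hidden_size) hidden states: the representations a downstream
# ASR or audio-classification head would be finetuned on.
print(outputs.last_hidden_state.shape)
```

As the README says, these raw encoder outputs are not directly usable for transcription; one would finetune a task-specific head (for example a CTC head such as `Wav2Vec2BertForCTC` in Transformers) on labeled data.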