Pedro Cuenca committed
Commit a77390b
Parent(s): 37ae5a5

* Update WER after measuring with final script.

README.md CHANGED
@@ -23,7 +23,7 @@ model-index:
     metrics:
     - name: Test WER
       type: wer
-      value: 10.
+      value: 10.50
 ---
 
 # Wav2Vec2-Large-XLSR-53-Spanish
@@ -179,7 +179,7 @@ print("WER: {:2f}".format(100 * chunked_wer(result["sentence"], result["pred_str
 
 ```
 
-**Test Result**: 10.
+**Test Result**: 10.50 %
 
 ## Text processing
 
@@ -198,7 +198,7 @@ For dataset handling reasons, I initially split `train`+`validation` in 10% spli
 * I trained for 30 epochs on the first split only, using similar values as the ones proposed by Patrick in his demo notebook. I used a batch_size of 24 with 2 gradient accumulation steps. This gave a WER of about 16.3% on the full test set.
 * I then trained the resulting model on the 9 remaining splits, for 3 epochs each, but with a faster warmup of 75 steps.
 * Next, I trained 3 epochs on each of the 10 splits using a smaller learning rate of `1e-4`. A warmup of 75 steps was used in this case too. The final model had a WER of about 11.7%.
-* By this time we had already figured out the reason for the initial delay in training time, and I decided to use the full dataset for training. However, in my tests I had seen that varying the learning rate seemed to work well, so I wanted to replicate that. I selected a cosine schedule with hard restarts, a reference learning rate of `3e-5` and 10 epochs. I configured the cosine schedule to have 10 cycles too, and used no warmup. This produced a WER of ~10.
+* By this time we had already figured out the reason for the initial delay in training time, and I decided to use the full dataset for training. However, in my tests I had seen that varying the learning rate seemed to work well, so I wanted to replicate that. I selected a cosine schedule with hard restarts, a reference learning rate of `3e-5` and 10 epochs. I configured the cosine schedule to have 10 cycles too, and used no warmup. This produced a WER of ~10.5%.
 
 
 ## Other things I tried
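The second hunk's header quotes an evaluation line that relies on a `chunked_wer` helper from the model card's test script, which is not part of this diff. A minimal sketch of what such a helper could look like, assuming the `jiwer` library and an arbitrary chunk size (both are assumptions, not the card's exact code):

```python
# Hypothetical sketch of a chunked WER helper, assuming jiwer; chunking keeps
# memory bounded when scoring the full test set in one pass.
import jiwer

def chunked_wer(targets, predictions, chunk_size=2000):
    """Aggregate WER over fixed-size chunks of (reference, prediction) pairs."""
    errors = 0.0
    total_words = 0
    for start in range(0, len(targets), chunk_size):
        chunk_targets = targets[start:start + chunk_size]
        chunk_predictions = predictions[start:start + chunk_size]
        # jiwer.wer returns errors / reference words for the chunk, so scale it
        # back up by the chunk's reference word count before accumulating.
        n_words = sum(len(sentence.split()) for sentence in chunk_targets)
        errors += jiwer.wer(chunk_targets, chunk_predictions) * n_words
        total_words += n_words
    return errors / total_words

# Usage with two lists of strings (references and predictions), as in the
# print(...) call quoted in the hunk header above:
# print("WER: {:2f}".format(100 * chunked_wer(references, predictions)))
```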
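The third hunk's header mentions splitting `train`+`validation` into 10% splits for dataset handling reasons. A hedged sketch of one way to do that with the `datasets` split-slicing syntax; the dataset id `common_voice` and language code `es` are assumptions inferred from the model name, not taken from this diff:

```python
# Hypothetical sketch: carve train+validation into ten 10% slices using the
# datasets split-slicing syntax. Dataset id and config are assumptions.
from datasets import load_dataset

splits = [
    load_dataset(
        "common_voice",
        "es",
        split=f"train[{start}%:{start + 10}%]+validation[{start}%:{start + 10}%]",
    )
    for start in range(0, 100, 10)
]
```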
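The last changed bullet describes the final run's learning-rate schedule: cosine with hard restarts, a reference learning rate of `3e-5`, 10 epochs, 10 cycles, and no warmup. A minimal sketch of that schedule using `transformers.get_cosine_with_hard_restarts_schedule_with_warmup`; the stand-in model and step count are placeholders, not the training script used for this model:

```python
# Sketch of the schedule from the last bullet: cosine with hard restarts,
# reference lr 3e-5, 10 cycles, no warmup. Model and step count are placeholders.
import torch
from transformers import get_cosine_with_hard_restarts_schedule_with_warmup

model = torch.nn.Linear(16, 16)  # stand-in for the fine-tuned Wav2Vec2 model
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

# Placeholder step count: in the run described above this would be roughly
# (num_examples // (24 * 2)) * 10, i.e. steps per epoch with batch_size 24 and
# 2 gradient accumulation steps, times 10 epochs.
num_training_steps = 10_000

scheduler = get_cosine_with_hard_restarts_schedule_with_warmup(
    optimizer,
    num_warmup_steps=0,          # no warmup in this final run
    num_training_steps=num_training_steps,
    num_cycles=10,               # one full cosine cycle per epoch
)
# scheduler.step() is then called once per optimizer update during training.
```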