## Fine-tuning run 2

Tried to improve the model fine-tuned during run 1.

Checkpoint used: `checkpoint-12000`

* The learning rate picked for fine-tuning in run 2 turned out to be too small:
  WER did not improve compared to run 1.
* Fine-tuning during run 2 followed the WER trajectory of the end of run 1
  (from checkpoint-8000 to checkpoint-10000).
* Stopped run 2 after 3000 steps.
* Checkpoints from this run are not uploaded.
* The training stdout logs and TensorBoard logs are uploaded.

## Advice

* For the next fine-tuning run, it's better to use a higher learning rate.
  As for the LR scheduler, it's better to:
  * either use a constant learning rate scheduler,
  * or manually instantiate a linear schedule with warmup (e.g. via `transformers.get_linear_schedule_with_warmup`) and set `num_training_steps`
    larger than the actual number of optimization steps in the run, so that the LR at the end stays much larger than 0.

* Use a `seed` different from the one used during run 1, e.g. `seed=43`.
  The seed actually used when reshuffling the train dataset depends on the epoch counter, which is advanced by
  `train_dataloader.dataset.set_epoch(train_dataloader.dataset._epoch + 1)`.
  However, when resuming training, `train_dataloader.dataset._epoch` is reset to 0,
  so with the same seed the train data would be shuffled in the same order as at the start of run 1.
  Hence a different seed has to be provided.

* Consider using the original Mozilla Common Voice dataset instead of the HuggingFace one.
  The original contains multiple voicings of the same sentence,
  so there is at least twice as much data.
  To use this "additional" data, the train, validation and test sets need to be enlarged using the `validated` set,
  which is absent from HuggingFace's CV11 dataset.

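The learning-rate scheduler advice above can be illustrated with a small, dependency-free sketch of the multiplier a linear schedule with warmup applies to the base LR (the same formula as `transformers.get_linear_schedule_with_warmup`). `BASE_LR`, `WARMUP_STEPS` and the helper names are hypothetical; only the 3000-step run length comes from these notes. It compares the LR at the last step when `num_training_steps` equals the real run length versus when it is set twice as large.

```python
# Dependency-free sketch of the learning-rate multiplier of a linear schedule
# with warmup (same formula as transformers.get_linear_schedule_with_warmup).
# BASE_LR and WARMUP_STEPS are hypothetical values chosen for illustration.

BASE_LR = 1e-5        # hypothetical peak learning rate
WARMUP_STEPS = 500    # hypothetical number of warmup steps
ACTUAL_STEPS = 3000   # optimization steps actually performed in the run


def linear_warmup_multiplier(step: int, num_warmup_steps: int, num_training_steps: int) -> float:
    """Factor applied to the base learning rate at a given optimization step."""
    if step < num_warmup_steps:
        # Warmup: ramp linearly from 0 up to 1.
        return step / max(1, num_warmup_steps)
    # Decay: fall linearly from 1 to 0 at num_training_steps, never below 0.
    return max(0.0, (num_training_steps - step) / max(1, num_training_steps - num_warmup_steps))


def lr_at(step: int, num_training_steps: int) -> float:
    """Learning rate at `step` for a given decay horizon."""
    return BASE_LR * linear_warmup_multiplier(step, WARMUP_STEPS, num_training_steps)


# Horizon equal to the real run length: the LR has decayed to 0 by the last step.
lr_end_exact = lr_at(ACTUAL_STEPS, num_training_steps=ACTUAL_STEPS)

# Horizon twice the real run length: at the last step the LR is still
# (6000 - 3000) / (6000 - 500) of BASE_LR, i.e. about 55% of it.
lr_end_inflated = lr_at(ACTUAL_STEPS, num_training_steps=2 * ACTUAL_STEPS)

print(f"LR at step {ACTUAL_STEPS}: {lr_end_exact:.2e} vs {lr_end_inflated:.2e}")
```

If training goes through the HuggingFace `Trainer`, the constant option corresponds to `TrainingArguments(lr_scheduler_type="constant")`, while the linear option can be built with `get_linear_schedule_with_warmup(optimizer, num_warmup_steps, num_training_steps)` using an enlarged `num_training_steps` and passed in via `Trainer(optimizers=(optimizer, scheduler))`.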
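The seed advice above relies on the shuffle order being a deterministic function of the seed and the epoch counter. The sketch below models this with a hypothetical `shuffled_order` helper; the way it combines `seed` and `epoch` is an assumption, not the exact formula used by the `datasets` library, but it shows why resetting the epoch to 0 while keeping the seed replays run 1's order.

```python
import random

# Simplified model (an assumption, not the exact `datasets` formula):
# the shuffle order of an epoch is fully determined by the pair (seed, epoch).


def shuffled_order(num_examples: int, seed: int, epoch: int) -> list[int]:
    """Return a deterministic permutation of example indices for one epoch."""
    # `random` hashes string seeds deterministically, so the same
    # (seed, epoch) pair always produces the same permutation.
    rng = random.Random(f"{seed}-{epoch}")
    order = list(range(num_examples))
    rng.shuffle(order)
    return order


N = 20  # hypothetical dataset size

run1_start = shuffled_order(N, seed=42, epoch=0)      # run 1, first epoch
run2_same_seed = shuffled_order(N, seed=42, epoch=0)  # resumed run: epoch reset to 0
run2_new_seed = shuffled_order(N, seed=43, epoch=0)   # resumed run with seed=43

# The resumed run with the old seed sees the same examples in the same order.
print("same seed replays run 1 order:", run2_same_seed == run1_start)
print("new seed replays run 1 order:", run2_new_seed == run1_start)
```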
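The Common Voice suggestion above could be implemented roughly as follows. This is a sketch assuming the standard release layout, with `train.tsv`, `dev.tsv`, `test.tsv` and `validated.tsv` files that share `path` and `sentence` columns; `read_tsv` and `enlarge_splits` are hypothetical helpers, and assigning each extra voicing to the split that already owns its sentence is a design choice made here, not something these notes prescribe.

```python
import csv


def read_tsv(path: str) -> list[dict]:
    """Read a Common Voice-style TSV file into a list of row dicts."""
    with open(path, encoding="utf-8", newline="") as f:
        # QUOTE_NONE: sentences may contain quote characters.
        return list(csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE))


def enlarge_splits(splits: dict[str, list[dict]], validated: list[dict],
                   unassigned_split: str = "train") -> dict[str, list[dict]]:
    """Add extra voicings from `validated` to the split that already owns their sentence.

    Sentences stay disjoint across splits; clips whose sentence belongs to no
    split go to `unassigned_split` (an assumption, adjust as needed).
    """
    owner = {}  # sentence -> name of the split that contains it
    for name, rows in splits.items():
        for row in rows:
            owner.setdefault(row["sentence"], name)

    enlarged = {name: list(rows) for name, rows in splits.items()}
    seen_paths = {row["path"] for rows in splits.values() for row in rows}
    for row in validated:
        if row["path"] in seen_paths:
            continue  # clip is already in one of the original splits
        name = owner.setdefault(row["sentence"], unassigned_split)
        enlarged.setdefault(name, []).append(row)
        seen_paths.add(row["path"])
    return enlarged


# Toy in-memory example; real data would come from read_tsv("train.tsv") etc.
splits = {
    "train": [{"path": "a1.mp3", "sentence": "hello"}],
    "dev": [{"path": "b1.mp3", "sentence": "good morning"}],
    "test": [{"path": "c1.mp3", "sentence": "thank you"}],
}
validated = [
    {"path": "a1.mp3", "sentence": "hello"},         # already in train
    {"path": "a2.mp3", "sentence": "hello"},         # extra voicing of a train sentence
    {"path": "b2.mp3", "sentence": "good morning"},  # extra voicing of a dev sentence
    {"path": "d1.mp3", "sentence": "see you"},       # sentence not in any split
]
result = enlarge_splits(splits, validated)
print({name: [r["path"] for r in rows] for name, rows in result.items()})
```

Keeping each sentence in a single split means the extra voicings enlarge the sets without letting a test sentence also appear in training; sentence matching here is exact string comparison.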