training script
#17 by sugintama
In the example training script, the parameters are lr=0.001, betas=[0.9,0.999], weight_decay=0.0001, but in the released NeMo file they are lr=1.0e-05, betas=[0.9,0.98], weight_decay=0.001, sched=CosineAnnealing. Which set should I use?
We trained this model in two stages as mentioned here: https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3#training
First stage: trained for 150000 steps using lr=0.001 with the CosineAnnealing scheduler and a warmup of 15000.
Second stage: trained for 5000 steps using lr=1e-5 with the CosineAnnealing scheduler and a warmup of 0.
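To make the two stages concrete, here is a rough sketch of the learning-rate curve they imply. It is plain Python, not NeMo's scheduler code. It assumes a linear warmup and a minimum learning rate of 0, neither of which is stated above, and the names `warmup_cosine_lr`, `STAGES`, and `lr_at` are made up for illustration.

```python
import math

def warmup_cosine_lr(step, base_lr, max_steps, warmup_steps, min_lr=0.0):
    """Learning rate at `step`: linear warmup, then cosine decay to `min_lr`.

    Assumption: the warmup is linear and min_lr defaults to 0; the thread
    does not say which warmup shape or floor was actually used.
    """
    if warmup_steps > 0 and step < warmup_steps:
        # Ramp linearly from 0 up to base_lr over the warmup period.
        return base_lr * step / warmup_steps
    # Fraction of the post-warmup schedule that has elapsed, capped at 1.
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    progress = min(1.0, progress)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Values taken from the two stages described above.
STAGES = {
    "stage1": dict(base_lr=1e-3, max_steps=150_000, warmup_steps=15_000),
    "stage2": dict(base_lr=1e-5, max_steps=5_000, warmup_steps=0),
}

def lr_at(stage, step):
    """Look up the learning rate for a step within the given stage."""
    return warmup_cosine_lr(step, **STAGES[stage])

if __name__ == "__main__":
    for stage, cfg in STAGES.items():
        for step in (0, cfg["warmup_steps"], cfg["max_steps"] // 2, cfg["max_steps"]):
            print(f"{stage} step={step:>6} lr={lr_at(stage, step):.3e}")
```

Under these assumptions, stage 1 peaks at lr=0.001 once the warmup ends, while stage 2 starts directly at lr=1e-5 because its warmup is 0.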