training script

#17
by sugintama - opened

In the example training script, the parameters are lr=0.001, betas=[0.9,0.999], weight_decay=0.0001. But in the released NeMo file, they are lr=1.0e-05, betas=[0.9,0.98], weight_decay=0.001, sched=CosineAnnealing. Which set should I use?

NVIDIA org

We trained this model in two stages as mentioned here: https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3#training

First stage: Trained for 150000 steps using lr=0.001 with CosineAnnealing scheduler and warmup of 15000
Second stage: Trained for 5000 steps using lr=1e-5 with CosineAnnealing scheduler and warmup of 0
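
To make the difference between the two stages concrete, here is a rough sketch of what the learning rate would look like in each one. It is plain Python, not NeMo's actual CosineAnnealing implementation: the function name `cosine_lr`, the linear warmup shape, and `min_lr=0.0` are assumptions made for illustration. Only the step counts, peak learning rates, and warmup values come from the reply above.

```python
import math

def cosine_lr(step, max_steps, peak_lr, warmup_steps=0, min_lr=0.0):
    """Illustrative schedule: linear warmup to peak_lr, then cosine decay to min_lr.

    Hypothetical helper, not the NeMo scheduler. min_lr=0.0 is an assumption.
    """
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp that reaches peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    decay_steps = max(1, max_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Values taken from the two stages described above.
STAGES = {
    "stage1": dict(max_steps=150_000, peak_lr=1e-3, warmup_steps=15_000),
    "stage2": dict(max_steps=5_000, peak_lr=1e-5, warmup_steps=0),
}

for name, cfg in STAGES.items():
    # Sample the start, middle, and end of each stage.
    for step in (0, cfg["max_steps"] // 2, cfg["max_steps"] - 1):
        print(name, step, cosine_lr(step, **cfg))
```

Under these assumptions, stage 1 ramps up to 0.001 over the first 15000 steps and then decays, while stage 2 starts directly at 1e-5 and decays from there. That would explain why the example script and the released NeMo file show different values: they appear to correspond to different stages.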

See: https://arxiv.org/abs/2509.14128
