Hello,
May I ask what hyperparameters you used to train this model? Specifically: learning rate, number of epochs, gradient accumulation steps, and how large the dataset was.
Thank you!