The Stickiness Problem
#5 opened by deleted
https://huggingface.co/ehartford/WizardLM-7B-Uncensored/discussions/10
@reeducator Just linking this thread from Henk about overtraining. Some extra reading on model stickiness, in case you haven't already been messing with the dials and knobs around these things.
Thanks @gozfarb. Our hyperparams are still the defaults from the vicuna branch. The learning rate is 2e-5 with cosine scheduling, i.e. the LR drops by orders of magnitude towards the end of training. We might keep it as is for now, unless someone points out that the model we have here suffers from similar issues. For Vicuna I can't lower it much, since training already takes quite some time, but if necessary there's still room for longer training runs.
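For reference, plain cosine decay behaves as described above: starting from a base LR of 2e-5, the schedule falls to near zero by the last steps. This is a minimal sketch (no warmup, illustrative step counts; the function name is my own, not from the vicuna training scripts):

```python
import math

def cosine_lr(step, total_steps, base_lr=2e-5, min_lr=0.0):
    """Cosine-decayed learning rate: base_lr at step 0, min_lr at the end."""
    progress = step / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

# At the start of training the LR equals the base value of 2e-5;
# near the end it is orders of magnitude lower, as noted above.
start = cosine_lr(0, 1000)    # 2e-5
mid = cosine_lr(500, 1000)    # 1e-5 (halfway through the cosine)
late = cosine_lr(990, 1000)   # well below 1e-7
```

In practice the actual schedule (e.g. the one in the Vicuna/transformers training code) also includes a linear warmup phase before the cosine decay kicks in, but the tail behavior is the same.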
deleted changed discussion title from "The Stickiness Promblem" to "The Stickiness Problem"