This works, but training does not work at all
I tried to fine-tune with your script. It does run on the GPU, but it does not change the output of the main model, so something seems to be wrong there.
https://github.com/tloen/alpaca-lora
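For reference, this is roughly how I'm comparing the two models (a minimal sketch, not my exact setup; the base checkpoint and adapter path here are placeholders):

```python
# Sketch: generate from the base model and from the LoRA-adapted model with the
# same prompt and compare. Names below are example values, not the exact ones
# from this issue.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

base_name = "decapoda-research/llama-7b-hf"   # placeholder base checkpoint
adapter = "path/to/lora-adapter"              # placeholder adapter directory

tokenizer = LlamaTokenizer.from_pretrained(base_name)
base = LlamaForCausalLM.from_pretrained(base_name, torch_dtype=torch.float16)

inputs = tokenizer("Tell me about alpacas.", return_tensors="pt")
base_out = tokenizer.decode(base.generate(**inputs, max_new_tokens=64)[0])

# Wrap the same base model with the trained LoRA adapter and generate again.
tuned = PeftModel.from_pretrained(base, adapter)
tuned_out = tokenizer.decode(tuned.generate(**inputs, max_new_tokens=64)[0])

print("Identical output:", base_out == tuned_out)  # True matches the symptom reported here
```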
Hi, did you figure out the reason or a solution? I'm running into a similar problem.
Yeah, I needed to train on a larger dataset and for more epochs: over 3k examples and 3-5 epochs before it overfits.
Thanks for your timely reply! I guess my problem is due to a learning rate that is too small. I set the lr to 1e-5, and the losses are always 0 during training. I am trying a larger lr.
The authors trained with 1e-4, I think on T5-3B in the main example.
So try 1e-4 to 3e-4. But I did not notice much of a difference from the learning rate; it finished learning pretty quickly in any case.
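As a rough sketch of where those numbers would go (the argument names follow the Hugging Face `TrainingArguments` API, not necessarily this repo's script; the values are just the ones discussed above):

```python
# Sketch: hyperparameters in the range discussed in this thread. These get passed
# to a transformers Trainer; adapt to however your fine-tuning script builds its args.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="lora-out",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    num_train_epochs=3,        # 3-5 epochs before it starts to overfit
    learning_rate=3e-4,        # 1e-5 was too small; 1e-4 to 3e-4 worked better
    logging_steps=10,          # watch the loss actually move during training
)
```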
Did you train alpaca or something else?
Not Alpaca, but I'm attempting to tune bloom-7b and it doesn't seem to work (the output of the tuned model is exactly the same as the original one). Hopefully it works with a larger lr, or there might be some bugs in my code; I changed their code a little bit to support DeepSpeed.
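One sanity check I'm planning to run (a minimal sketch; the adapter path is a placeholder and `bigscience/bloom-7b1` is assumed as the checkpoint): LoRA's B matrices are initialized to zero, so if they are still all zeros after training, the adapter never learned anything, which would fit the loss being stuck at 0.

```python
# Sketch: inspect the trained LoRA adapter's B matrices. Non-zero values mean the
# adapter actually received gradient updates; all zeros mean training had no effect.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("bigscience/bloom-7b1", torch_dtype=torch.float16)
tuned = PeftModel.from_pretrained(base, "path/to/lora-adapter")  # placeholder path

for name, param in tuned.named_parameters():
    if "lora_B" in name:
        print(name, param.abs().max().item())
```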
@TingchenFu @zokica Any update on either of your issues? I am facing something similar as well.