Update README.md
Browse files
README.md
CHANGED
@@ -25,7 +25,6 @@ The training parameters are there not to ruin it - not make it better, so you do
|
|
25 |
13b can go only THAT far. There is no way you can create 100% solid finetuning on 13b. You will get close - but like with a child, sometimes it will spill a cup of milk in your lap. 33b is the way. Sadly training 33b on home hardware with 24GB is basically useless because you really have to tone down the parameters - to what I said before - basically ruining it. 48GB at least for 33b so you can crank it up.
|
26 |
|
27 |
IMHO gradient accumulation will LOWER the quality if you can do more than a few batches. There may be sweet spot somewehere, but IDK. Sure batch 1 and GA 32 will be better than batch 1 and GA 1, but that's not the point, that's a bandaid
|
28 |
-
|
29 |
Edit: It could prevent overfitting though and hence help with generalization. It depends what is the goal and how diverse the dataset is.
|
30 |
|
31 |
size of dataset matters when you are finetuning on base, but matters less when finetuning on well finetuned model. - in fact sometimes less is better in that case or you may be ruining a good previous finetuning.
|
|
|
25 |
13b can go only THAT far. There is no way you can create 100% solid finetuning on 13b. You will get close - but like with a child, sometimes it will spill a cup of milk in your lap. 33b is the way. Sadly training 33b on home hardware with 24GB is basically useless because you really have to tone down the parameters - to what I said before - basically ruining it. 48GB at least for 33b so you can crank it up.
|
26 |
|
27 |
IMHO gradient accumulation will LOWER the quality if you can do more than a few batches. There may be sweet spot somewehere, but IDK. Sure batch 1 and GA 32 will be better than batch 1 and GA 1, but that's not the point, that's a bandaid
|
|
|
28 |
Edit: It could prevent overfitting though and hence help with generalization. It depends what is the goal and how diverse the dataset is.
|
29 |
|
30 |
size of dataset matters when you are finetuning on base, but matters less when finetuning on well finetuned model. - in fact sometimes less is better in that case or you may be ruining a good previous finetuning.
|