Actually the errors cancelled each other out
This is so funny: the error was that non-1 batch sizes weren't registering correctly in my data loading script, but I had accidentally set the batch size to 1 for this model anyway while training. Meanwhile, the gradient accumulation steps were being added to themselves (doubled) because of another bug. I meant for bs=2, accumulation=32 and got bs=1, accumulation=64, so everything kinda worked out in the end.
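To spell out why the two bugs cancel: the effective batch size is the per-device batch size times the gradient accumulation steps, and both the intended and the accidental settings come out to 64 examples per optimizer step. A quick sanity check (the variable names here are just for illustration, not taken from the actual training script):

```python
# Effective batch size = per-device batch size * gradient accumulation steps
intended_bs, intended_accum = 2, 32   # what the config was meant to be
actual_bs, actual_accum = 1, 64       # what the two bugs actually produced

# Both settings see the same number of examples per optimizer update.
assert intended_bs * intended_accum == actual_bs * actual_accum == 64
```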
README.md
CHANGED
@@ -11,8 +11,6 @@ tags:
 
 ## FLAN-OPT-6.7b-LoRA
 
-<h2 style="color: green">This model is not fully trained, it needs to be retrained due to a hyperparameter error, use at your own risk (it might perform very poorly)</h2>
-
 OPT was first introduced in [Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) and first released in [metaseq's repository](https://github.com/facebookresearch/metaseq) on May 3rd 2022 by Meta AI.
 
 This model is [facebook/opt-6.7b](https://hf.co/facebook/opt-6.7b) finetuned with low-rank adapters (https://arxiv.org/abs/2106.09685) on the FLAN datasets (https://arxiv.org/pdf/2210.11416.pdf).
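Since the card describes LoRA adapters rather than a merged checkpoint, the weights would typically be attached to the base model with the peft library. A minimal sketch, assuming the adapter lives in a repo named `FLAN-OPT-6.7b-LoRA` under the author's namespace (the repo id below is illustrative, not confirmed by this diff):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model the card says was fine-tuned.
base = AutoModelForCausalLM.from_pretrained("facebook/opt-6.7b")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-6.7b")

# Attach the LoRA adapter weights (repo id is an assumption for illustration).
model = PeftModel.from_pretrained(base, "your-namespace/FLAN-OPT-6.7b-LoRA")

prompt = "Answer the following question: what is instruction tuning?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```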