Actually the errors cancelled each other out
This is so funny: the error was that non-1 batch sizes weren't registering correctly in my data loading script, but I had accidentally set the batch size to 1 for this model anyway while training. Meanwhile, the gradient accumulation steps were being added to themselves (doubled) because of another bug. I meant for bs=2, accumulation=32 and got bs=1, accumulation=64, so everything kinda worked out in the end.
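To spell out why the two bugs cancel: the effective batch size is the per-device batch size times the gradient accumulation steps, and both the intended and the accidental settings come out to 64 examples per optimizer step. A quick sanity check (the variable names here are just for illustration, not taken from the actual training script):

```python
# Effective batch size = per-device batch size * gradient accumulation steps
intended_bs, intended_accum = 2, 32   # what the config was meant to be
actual_bs, actual_accum = 1, 64       # what the two bugs actually produced

# Both settings see the same number of examples per optimizer update.
assert intended_bs * intended_accum == actual_bs * actual_accum == 64
```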
README.md
CHANGED
@@ -11,8 +11,6 @@ tags:
 
 ## FLAN-OPT-6.7b-LoRA
 
-<h2 style="color: green">This model is not fully trained, it needs to be retrained due to a hyperparameter error, use at your own risk (it might perform very poorly)</h2>
-
 OPT was first introduced in [Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) and first released in [metaseq's repository](https://github.com/facebookresearch/metaseq) on May 3rd 2022 by Meta AI.
 
 This model is [facebook/opt-6.7b](https://hf.co/facebook/opt-6.7b) finetuned with low-rank adapters (https://arxiv.org/abs/2106.09685) on the FLAN datasets (https://arxiv.org/pdf/2210.11416.pdf).
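Since the card describes LoRA adapters rather than a merged checkpoint, the weights would typically be attached to the base model with the peft library. A minimal sketch, assuming the adapter lives in a repo named `FLAN-OPT-6.7b-LoRA` under the author's namespace (the repo id below is illustrative, not confirmed by this diff):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model the card says was fine-tuned.
base = AutoModelForCausalLM.from_pretrained("facebook/opt-6.7b")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-6.7b")

# Attach the LoRA adapter weights (repo id is an assumption for illustration).
model = PeftModel.from_pretrained(base, "your-namespace/FLAN-OPT-6.7b-LoRA")

prompt = "Answer the following question: what is instruction tuning?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```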