crumb committed on
Commit 0d62136
1 Parent(s): 72e905c

Actually the errors cancelled each other out

This is so funny: the error was that batch sizes other than 1 weren't registering correctly in my data loading script, but I had accidentally set the batch size to 1 on this model anyway while training. On top of that, another bug added the gradient accumulation steps to themselves, doubling them (I meant bs=2 with accumulation=32, got bs=1 with accumulation=64). The effective batch size was 64 either way, so everything kinda worked out in the end.
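For clarity, here is a minimal sketch of the batch-size arithmetic described above (the variable names are illustrative, not taken from the actual training script):

```python
# Intended configuration vs. what the two bugs actually produced.
intended_batch_size = 2
intended_grad_accum = 32

actual_batch_size = 1        # data-loading bug: only batch size 1 registered correctly
actual_grad_accum = 32 + 32  # second bug: accumulation steps added to themselves -> 64

# Effective batch size = per-step batch size * gradient accumulation steps,
# so the two mistakes cancel out and the effective batch size is unchanged.
assert intended_batch_size * intended_grad_accum == actual_batch_size * actual_grad_accum  # 64 == 64
```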

Files changed (1)
  1. README.md +0 -2
README.md CHANGED
@@ -11,8 +11,6 @@ tags:
 
 ## FLAN-OPT-6.7b-LoRA
 
-<h2 style="color: green">This model is not fully trained, it needs to be retrained due to a hyperparameter error, use at your own risk (it might perform very poorly)</h2>
-
 OPT was first introduced in [Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) and first released in [metaseq's repository](https://github.com/facebookresearch/metaseq) on May 3rd 2022 by Meta AI.
 
 This model is [facebook/opt-6.7b](https://hf.co/facebook/opt-6.7b) finetuned with low-rank adapters (https://arxiv.org/abs/2106.09685) on the FLAN datasets (https://arxiv.org/pdf/2210.11416.pdf).
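For context, here is a minimal, hypothetical sketch of how a LoRA adapter like this one is typically attached to its base model with the peft library; the adapter repo id below is a placeholder, and the prompt and generation settings are purely illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "facebook/opt-6.7b"
adapter_id = "your-namespace/FLAN-OPT-6.7b-LoRA"  # placeholder, not the actual repo id

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.float16)

# Load the low-rank adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)

inputs = tokenizer("Answer the following question: what is the capital of France?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```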