Warning that this generator is not scaled for pretraining with google/electra-small-discriminator?
#1 by nthngdy, opened
Hello!
I have been trying to reproduce the results for some time, and the loss collapsed in every experiment. I was not careful about the size of the generator, because I trusted this version to be scaled properly for the corresponding google/electra-small-discriminator model.
It turns out this model is actually scaled like the ELECTRA-Small++ model, which explains my collapse issue.
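For anyone hitting the same issue, a quick sanity check is to compare the generator's width with the discriminator's before pretraining. The ELECTRA paper reports that generators 1/4 to 1/2 the size of the discriminator work best. The sketch below assumes config dicts whose keys follow `transformers`' `ElectraConfig` (`hidden_size`); the helper names and the toy sizes are hypothetical, for illustration only.

```python
# Hypothetical helpers: compare generator vs discriminator width using
# config dicts keyed like transformers' ElectraConfig ("hidden_size").


def generator_size_ratio(disc_cfg: dict, gen_cfg: dict) -> float:
    """Return generator hidden_size divided by discriminator hidden_size."""
    return gen_cfg["hidden_size"] / disc_cfg["hidden_size"]


def is_generator_scaled(disc_cfg: dict, gen_cfg: dict,
                        low: float = 0.25, high: float = 0.5) -> bool:
    """True if the width ratio falls in [low, high].

    The default 1/4 to 1/2 range is taken from the ELECTRA paper's finding
    on generator size; treat it as a heuristic, not a hard rule.
    """
    ratio = generator_size_ratio(disc_cfg, gen_cfg)
    return low <= ratio <= high


if __name__ == "__main__":
    # Toy sizes for illustration only, not the real model configs.
    disc = {"hidden_size": 512}
    gen_same = {"hidden_size": 512}     # generator as wide as the discriminator
    gen_quarter = {"hidden_size": 128}  # generator 1/4 the width
    print(generator_size_ratio(disc, gen_same), is_generator_scaled(disc, gen_same))
    print(generator_size_ratio(disc, gen_quarter), is_generator_scaled(disc, gen_quarter))
```

To check the real checkpoints, one could pass `AutoConfig.from_pretrained(name).to_dict()` for each model instead of the toy dicts.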
I think it would benefit everyone, and be less misleading, if the model card included a warning explaining this. Could you please consider adding such a disclaimer?
Thank you again for sharing the weights here,
Nathan
To be honest, I am still confused about why this generator is not scaled properly. Do you have any idea?