Warning that this generator is not scaled for pretraining with google/electra-small-discriminator?

#1
by nthngdy - opened

Hello!

I have been trying to reproduce the results for some time, and the loss collapsed in every experiment. I did not check the size of the generator, as I trusted this version to be scaled properly for the corresponding google/electra-small-discriminator model.

It turns out this model is actually scaled like the ELECTRA-Small++ model, which explains my collapse issue.
I think it would benefit everyone, and be less misleading, if there were a warning of some sort explaining this in the model card. Could you please consider adding such a disclaimer?
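
A quick way to catch this kind of mismatch before launching a run is to compare the widths of the two configs. The following is a minimal sketch, assuming only that both configs expose a `hidden_size` attribute (as the `transformers` ELECTRA configs do). The `SizeInfo` class, the helper names, the example widths, and the 0.5 threshold are hypothetical placeholders chosen for illustration, not values read from the actual checkpoints.

```python
from dataclasses import dataclass


@dataclass
class SizeInfo:
    """Hypothetical stand-in for a model config; only the width is needed."""
    hidden_size: int


def width_ratio(generator, discriminator) -> float:
    """Return generator width divided by discriminator width."""
    return generator.hidden_size / discriminator.hidden_size


def looks_downscaled(generator, discriminator, max_ratio: float = 0.5) -> bool:
    """True if the generator is at most `max_ratio` times as wide.

    The 0.5 default is an assumption for this sketch, not a documented rule.
    """
    return width_ratio(generator, discriminator) <= max_ratio


# Illustrative widths only; they are NOT taken from the published checkpoints.
small_gen = SizeInfo(hidden_size=64)
same_size_gen = SizeInfo(hidden_size=256)
disc = SizeInfo(hidden_size=256)

print(width_ratio(small_gen, disc), looks_downscaled(small_gen, disc))
print(width_ratio(same_size_gen, disc), looks_downscaled(same_size_gen, disc))
```

With `transformers` installed, the two arguments could instead be the objects returned by `AutoConfig.from_pretrained(...)` for the generator and the discriminator, since the helpers only read `hidden_size`.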

Thank you again for sharing the weights here,
Nathan

Google org

Hey @nthngdy, thanks for your feedback! Feel free to open a PR with the wording you think would make the model card less misleading, and I'd be happy to merge it.

To be honest, I am still confused about why this generator is not scaled properly. Do you have any idea?
