Very high loss compared to Keras
When I use the H5 weights, the loss is stable and satisfactory. However, when I switch to the HuggingFace model weights with the HuggingFace Trainer, the loss starts at 50 and never drops below 5. I suspect the weights need a proper conversion, as the current ones seem ineffective.
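One way to check whether the conversion is the culprit is to run the same input through both checkpoints and compare the logits; if the converted weights are correct, the outputs should agree to within numerical tolerance. Below is a minimal sketch of just the comparison step (the model-loading calls are environment-specific, so only dummy arrays are shown here):

```python
import numpy as np

def logits_match(keras_logits, hf_logits, atol=1e-3):
    """Compare logits from the two checkpoints on the same input.

    Returns the max absolute difference and whether it is within `atol`.
    A large difference points at a weight-conversion problem rather than
    a training issue.
    """
    keras_logits = np.asarray(keras_logits, dtype=np.float64)
    hf_logits = np.asarray(hf_logits, dtype=np.float64)
    diff = float(np.max(np.abs(keras_logits - hf_logits)))
    return diff, diff <= atol

# Dummy arrays standing in for the real model outputs:
keras_out = np.array([[1.0, 2.0, 3.0]])
hf_out = np.array([[1.0, 2.0, 3.0001]])
diff, ok = logits_match(keras_out, hf_out)
print(f"max abs diff: {diff:.6f}, match: {ok}")
```

In practice you would replace the dummy arrays with `model.predict(x)` from the Keras model and the `.logits` tensor from the HuggingFace model on the identical input batch.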
I ran into the same problem. I had always suspected it was something on my end...
@osanseviero Any idea if this has to do with recent fixes made about model inconsistencies?
The RoPE fix should already have resolved some of them; we are looking into the loss.
I'm also experiencing the same.
Can you try the latest transformers? We recently fixed some training-stability issues, and the fixes have been pushed in a patch release.
Hi @tanimazsin130 , Could you try with the latest transformers release (v4.42.3) and let us know if it fixes your problem?