Initialization Refactor #1
opened by MaxiBoether
Thank you so much for providing the Mistral implementation for Nanotron here! Unfortunately, on the latest commits of Nanotron there seems to have been a refactor of how the weights are initialized: the Trainer now calls `init_model_randomly` with a config object instead of an initialization method. Do you have any plans to update the implementation here to the latest Nanotron commit, or should we not expect that soon?
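For context, here is a minimal sketch of how one might bridge both call conventions while the implementation catches up. The keyword names `config` and `init_method` are assumptions based on the change described above, not taken from the Nanotron source, so treat this purely as an illustration.

```python
import inspect


def call_init_model_randomly(init_model_randomly, model, config=None, init_method=None):
    """Dispatch to whichever init_model_randomly signature the installed
    Nanotron version exposes. The keyword names checked here ("config",
    "init_method") are assumptions based on the refactor described above.
    """
    params = inspect.signature(init_model_randomly).parameters
    if "config" in params:
        # Newer Nanotron: initialization settings are taken from the config object.
        return init_model_randomly(config=config, model=model)
    # Older Nanotron: an explicit initialization method is passed in.
    return init_model_randomly(model=model, init_method=init_method)
```

Something like this would let a custom model implementation keep working against both commits until a proper update lands.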
Thank you so much for the info!