Initialization Refactor #1
opened by MaxiBoether
Thank you so much for providing the Mistral implementation for Nanotron here! Unfortunately, on the latest commits of Nanotron there seems to have been a refactor of how the weights are initialized: the Trainer now calls `init_model_randomly` with a config object instead of an initialization method. Do you have any plans to update the implementation here to the latest Nanotron commit, or should we not expect that soon?
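For context, here is a minimal sketch of how one might bridge both call conventions while the implementation catches up. The keyword names `config` and `init_method` are assumptions based on the change described above, not taken from the Nanotron source, so treat this purely as an illustration.

```python
import inspect


def call_init_model_randomly(init_model_randomly, model, config=None, init_method=None):
    """Dispatch to whichever init_model_randomly signature the installed
    Nanotron version exposes. The keyword names checked here ("config",
    "init_method") are assumptions based on the refactor described above.
    """
    params = inspect.signature(init_model_randomly).parameters
    if "config" in params:
        # Newer Nanotron: initialization settings are taken from the config object.
        return init_model_randomly(config=config, model=model)
    # Older Nanotron: an explicit initialization method is passed in.
    return init_model_randomly(model=model, init_method=init_method)
```

Something like this would let a custom model implementation keep working against both commits until a proper update lands.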
Thank you so much for the info!