MLP weights in each layer seem to be different
#4
by Jeethu - opened
Thanks for releasing the MobiLlama models! IIUC, the paper states that the FFN blocks are shared across all transformer blocks. I've verified that this is indeed the case with the 0.5B model, where the MLP weights are identical in every layer, but it doesn't seem to hold for this model. Is there a reason for this discrepancy?
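
For reference, here is a minimal sketch of the kind of per-layer comparison described above. The repo id (`MBZUAI/MobiLlama-05B`), the `"mlp"` substring in parameter names, and the use of `trust_remote_code` are assumptions; adjust them to the checkpoint and architecture actually being inspected.

```python
# Sketch: compare MLP/FFN parameters across transformer layers to see
# whether they are equal in value and/or share the same underlying storage.
import re
from collections import defaultdict

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "MBZUAI/MobiLlama-05B",  # assumed repo id; swap in the model under discussion
    trust_remote_code=True,
)

# Group MLP parameters by layer index, e.g. "...layers.3.mlp.up_proj.weight".
# remove_duplicate=False keeps tied (shared) parameters visible under each layer's name.
mlp_params = defaultdict(dict)
for name, param in model.named_parameters(remove_duplicate=False):
    if "mlp" not in name:
        continue
    match = re.search(r"\.(\d+)\.", name)
    if match is None:
        continue
    layer_idx = int(match.group(1))
    suffix = name.split(f".{match.group(1)}.", 1)[1]
    mlp_params[layer_idx][suffix] = param

layers = sorted(mlp_params)
reference = mlp_params[layers[0]]
for layer_idx in layers[1:]:
    for suffix, param in mlp_params[layer_idx].items():
        if suffix not in reference:
            continue
        same_values = torch.equal(param, reference[suffix])
        same_storage = param.data_ptr() == reference[suffix].data_ptr()
        print(f"layer {layer_idx} {suffix}: equal={same_values}, shared_storage={same_storage}")
```

If the FFN is truly shared, the tensors should report both equal values and shared storage; equal values with different storage would only mean identical copies rather than a single shared block.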