MLP weights in each layer seem to be different
#4
by Jeethu - opened
Thanks for releasing the MobiLlama models! IIUC, the paper states that the FFN blocks are shared across all transformer blocks. I've verified that this is indeed the case with the 0.5B model, where the MLP weights are identical in every layer, but it doesn't seem to hold for this model. Is there a reason for this discrepancy?
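
For reference, here is a minimal sketch of the kind of per-layer comparison described above. The repo id (`MBZUAI/MobiLlama-05B`), the `"mlp"` substring in parameter names, and the use of `trust_remote_code` are assumptions; adjust them to the checkpoint and architecture actually being inspected.

```python
# Sketch: compare MLP/FFN parameters across transformer layers to see
# whether they are equal in value and/or share the same underlying storage.
import re
from collections import defaultdict

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "MBZUAI/MobiLlama-05B",  # assumed repo id; swap in the model under discussion
    trust_remote_code=True,
)

# Group MLP parameters by layer index, e.g. "...layers.3.mlp.up_proj.weight".
# remove_duplicate=False keeps tied (shared) parameters visible under each layer's name.
mlp_params = defaultdict(dict)
for name, param in model.named_parameters(remove_duplicate=False):
    if "mlp" not in name:
        continue
    match = re.search(r"\.(\d+)\.", name)
    if match is None:
        continue
    layer_idx = int(match.group(1))
    suffix = name.split(f".{match.group(1)}.", 1)[1]
    mlp_params[layer_idx][suffix] = param

layers = sorted(mlp_params)
reference = mlp_params[layers[0]]
for layer_idx in layers[1:]:
    for suffix, param in mlp_params[layer_idx].items():
        if suffix not in reference:
            continue
        same_values = torch.equal(param, reference[suffix])
        same_storage = param.data_ptr() == reference[suffix].data_ptr()
        print(f"layer {layer_idx} {suffix}: equal={same_values}, shared_storage={same_storage}")
```

If the FFN is truly shared, the tensors should report both equal values and shared storage; equal values with different storage would only mean identical copies rather than a single shared block.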